From: Stephen Hemminger <stephen@networkplumber.org>
To: Chengwen Feng <fengchengwen@huawei.com>
Cc: <thomas@monjalon.net>, <ferruh.yigit@amd.com>,
	<konstantin.ananyev@huawei.com>, <ajit.khaparde@broadcom.com>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
	Somnath Kotur <somnath.kotur@broadcom.com>,
	Kalesh AP <kalesh-anakkur.purayil@broadcom.com>, <dev@dpdk.org>,
	<Honnappa.Nagarahalli@arm.com>
Subject: Re: [PATCH v4 1/7] ethdev: fix race-condition of proactive error handling mode
Date: Wed, 9 Oct 2024 17:46:57 -0700
Message-ID: <20241009174657.59491f20@hermes.local>
In-Reply-To: <20240905092504.10725-2-fengchengwen@huawei.com>

On Thu, 5 Sep 2024 09:24:58 +0000
Chengwen Feng <fengchengwen@huawei.com> wrote:

> In the proactive error handling mode, the PMD will set the data path
> pointers to dummy functions and then try recovery. During this period the
> application may still be invoking the data path API. This introduces a
> race-condition with the data path which may lead to a crash [1].
> 
> Although the PMD added a delay after setting the data path pointers to
> cover the above race-condition, this only reduces the probability; it
> doesn't solve the problem.
> 
> To solve the race-condition problem fundamentally, the following
> requirements are added:
> 1. The PMD should set the data path pointers to dummy functions after
>    reporting the RTE_ETH_EVENT_ERR_RECOVERING event.
> 2. The application should stop data path API invocation when processing
>    the RTE_ETH_EVENT_ERR_RECOVERING event.
> 3. The PMD should set the data path pointers to valid functions before
>    reporting the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> 4. The application should enable data path API invocation when processing
>    the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> 
> Also, this patch introduces a driver internal function,
> rte_eth_fp_ops_setup, which is used as a helper function for PMDs.
> 
> [1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> 
> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> Cc: stable@dpdk.org

This is not material for a stable release, because of its impact on PMDs etc.

> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> Acked-by: Huisong Li <lihuisong@huawei.com>

...

> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 548fada1c7..0aec5588e5 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -4041,25 +4041,28 @@ enum rte_eth_event_type {
>  	 */
>  	RTE_ETH_EVENT_RX_AVAIL_THRESH,
>  	/** Port recovering from a hardware or firmware error.
> -	 * If PMD supports proactive error recovery,
> -	 * it should trigger this event to notify application
> -	 * that it detected an error and the recovery is being started.
> -	 * Upon receiving the event, the application should not invoke any control path API
> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
> -	 * The PMD will set the data path pointers to dummy functions,
> -	 * and re-set the data path pointers to non-dummy functions
> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> -	 * It means that the application cannot send or receive any packets
> -	 * during this period.
> +	 *
> +	 * If PMD supports proactive error recovery, it should trigger this
> +	 * event to notify application that it detected an error and the
> +	 * recovery is about to start.
> +	 *
> +	 * Upon receiving the event, the application should not invoke any
> +	 * control and data path API until receiving
> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> +	 * event.
> +	 *
> +	 * Once this event is reported, the PMD will set the data path pointers
> +	 * to dummy functions, and re-set the data path pointers to valid
> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> +	 *

Please follow the IETF conventions (RFC 2119) for requirement wording here.
Use "should" only when the behavior is optional. In these cases the word
"must" is required.

	* If PMD supports proactive error recovery, it must trigger this
...


>  	 * @note Before the PMD reports the recovery result,
>  	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
>  	 * because a larger error may occur during the recovery.
>  	 */
>  	RTE_ETH_EVENT_ERR_RECOVERING,
>  	/** Port recovers successfully from the error.
> -	 * The PMD already re-configured the port,
> -	 * and the effect is the same as a restart operation.
> +	 *
> +	 * The PMD already re-configured the port:
>  	 * a) The following operation will be retained: (alphabetically)
>  	 *    - DCB configuration
>  	 *    - FEC configuration
> @@ -4086,6 +4089,9 @@ enum rte_eth_event_type {
>  	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>  	 * c) Any other configuration will not be stored
>  	 *    and will need to be re-configured.
> +	 *
> +	 * The application should restore some additional configuration
> +	 * (see above case b/c), and then enable data path API invocation.
>  	 */
>  	RTE_ETH_EVENT_RECOVERY_SUCCESS,
>  	/** Port recovery failed.
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 1669055ca5..da592b63bc 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -346,6 +346,7 @@ INTERNAL {
>  	rte_eth_devices;
>  	rte_eth_dma_zone_free;
>  	rte_eth_dma_zone_reserve;
> +	rte_eth_fp_ops_setup;
>  	rte_eth_hairpin_queue_peer_bind;
>  	rte_eth_hairpin_queue_peer_unbind;
>  	rte_eth_hairpin_queue_peer_update;

My other concern is that changing fp_ops on a running port is not safe.
No part of eth_dev_fp_ops_setup() is atomic.
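
To make the concern concrete: if the fp_ops update itself cannot be made
atomic, the only safe point to swap the pointers is when no lcore is inside a
burst call. Below is a rough sketch of how an application could gate its own
data path around the two events, using plain C11 atomics. Everything prefixed
sketch_ is hypothetical and is not part of ethdev or of this patch; a real
application would call into it from its registered event callback and would
wrap its rx/tx burst calls with the gate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three recovery events named in the patch. */
enum sketch_event {
	SKETCH_EVENT_ERR_RECOVERING,
	SKETCH_EVENT_RECOVERY_SUCCESS,
	SKETCH_EVENT_RECOVERY_FAILED,
};

/* Hypothetical per-port gate owned by the application. */
struct sketch_port_gate {
	atomic_bool paused;   /* set while the port is recovering */
	atomic_uint in_burst; /* lcores currently inside a burst call */
};

/* Assumed shape of a burst function, for illustration only. */
typedef uint16_t (*sketch_burst_fn)(void *queue, void **pkts, uint16_t n);

static void sketch_gate_init(struct sketch_port_gate *g)
{
	atomic_init(&g->paused, false);
	atomic_init(&g->in_burst, 0);
}

/* Worker side: announce entry first, then check the pause flag. */
static uint16_t sketch_gated_burst(struct sketch_port_gate *g,
		sketch_burst_fn fn, void *queue, void **pkts, uint16_t n)
{
	uint16_t ret = 0;

	assert(fn != NULL);
	atomic_fetch_add(&g->in_burst, 1);
	if (!atomic_load(&g->paused))
		ret = fn(queue, pkts, n);
	atomic_fetch_sub(&g->in_burst, 1);
	return ret;
}

/* Event side: returns only once no worker is inside a burst call. */
static void sketch_on_event(struct sketch_port_gate *g, enum sketch_event ev)
{
	switch (ev) {
	case SKETCH_EVENT_ERR_RECOVERING:
		atomic_store(&g->paused, true);
		while (atomic_load(&g->in_burst) != 0)
			; /* spin until in-flight bursts drain */
		break;
	case SKETCH_EVENT_RECOVERY_SUCCESS:
		/* The application restores configuration (cases b/c) first. */
		atomic_store(&g->paused, false);
		break;
	case SKETCH_EVENT_RECOVERY_FAILED:
		/* Leave the data path paused. */
		break;
	}
}

/* Fake burst used only to exercise the gate: claims all n packets. */
static uint16_t sketch_fake_burst(void *queue, void **pkts, uint16_t n)
{
	(void)queue;
	(void)pkts;
	return n;
}
```

The default sequentially consistent atomics matter here: a worker increments
in_burst before reading paused, and the event handler sets paused before
reading in_burst, so at least one side always observes the other. This only
helps if the PMD swaps the pointers after the event callback has returned,
which is the ordering the commit message asks for.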



