All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Monjalon <thomas@monjalon.net>
To: Rongwei Liu <rongweil@nvidia.com>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Cc: dev@dpdk.org, matan@nvidia.com, viacheslavo@nvidia.com,
	orika@nvidia.com,  jerinjacobk@gmail.com,
	stephen@networkplumber.org, rasland@nvidia.com,
	Ferruh Yigit <ferruh.yigit@amd.com>,
	jerinj@marvell.com
Subject: Re: [PATCH v4 2/3] ethdev: add standby state for live migration
Date: Wed, 01 Feb 2023 09:27:49 +0100	[thread overview]
Message-ID: <2738874.BEx9A2HvPv@thomas> (raw)
In-Reply-To: <798721cd-92b8-3ee1-0aaa-20a63bb1cd67@oktetlabs.ru>

01/02/2023 08:52, Andrew Rybchenko:
> On 1/18/23 18:44, Rongwei Liu wrote:
> > +* **Added process state in ethdev to improve live migration.**
> > +
> > +  Hot upgrade of an application may be accelerated by configuring
> > +  the new application in standby state while the old one is still active.
> > +  Such double ethdev configuration of the same device is possible
> > +  with the added process state API.
> > +
> 
> Will we have the same API for other device classes? Any ideas
> how to avoid duplication? (I'm afraid we don't have generic
> place to put such functionality).

It could be needed to have such feature in other classes, yes.
That's a device feature, so it must be in each class,
with different flags/options per class.

> > +	/* Check if all devices support process role. */
> > +	RTE_ETH_FOREACH_DEV(port_id) {
> > +		dev = &rte_eth_devices[port_id];
> > +		if (*dev->dev_ops->process_set_role != NULL &&
> > +			*dev->dev_ops->dev_infos_get != NULL &&
> > +			(*dev->dev_ops->dev_infos_get)(dev, &dev_info) == 0 &&
> > +			(dev_info.dev_capa & RTE_ETH_DEV_CAPA_PROCESS_ROLE) != 0)
> 
> Why do we need both non-NULL callback and dev_capa flag?
> Isn't just non-NULL callback sufficient?

We could have a callback returning ENOTSUP.
Anyway we need the capability for application query.

> > +			continue;
> > +		rte_errno = ENOTSUP;
> > +		return -rte_errno;
> > +	}
> > +	/* Call the driver callbacks. */
> > +	RTE_ETH_FOREACH_DEV(port_id) {
> > +		dev = &rte_eth_devices[port_id];
> > +		if ((*dev->dev_ops->process_set_role)(dev, standby, flags) < 0)
> 
> Please, add error logging on failure.

good idea

> 
> > +			goto failure;
> 
> IMHO it would be useful to have info logs on success.

When setting the role, we could have debug log, yes.

> > +		ret++;
> 
> ret variable name is very confusing, please, be
> more specific in variable naming.
> 
> > +	}
> > +	return ret;
> > +
> > +failure:
> > +	/* Rollback all changed devices in case one failed. */
> > +	if (ret) {
> 
> Compare vs 0
> 
> > +		RTE_ETH_FOREACH_DEV(port_id) {
> > +			dev = &rte_eth_devices[port_id];
> > +			(*dev->dev_ops->process_set_role)(dev, !standby, flags);
> 
> IMHO, it would be useful to have error logs on rollback
> failure and info logs on success.
> 
> > +			if (--ret == 0)
> > +				break;
> > +		}
> > +	}
> > +	rte_errno = EPERM;
> 
> Why is it always EPERM? Isn't it better to forward
> failure code from driver?
> 
> > +	return -rte_errno;
> > +}

Thanks for the good detailed review Andrew.

> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1606,6 +1606,8 @@ struct rte_eth_conf {
> >   #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
> >   /** Device supports keeping shared flow objects across restart. */
> >   #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
> > +/** Device supports process role changing. @see rte_eth_process_set_active */
> 
> There is no rte_eth_process_set_active()
> 
> > +#define RTE_ETH_DEV_CAPA_PROCESS_ROLE           RTE_BIT64(5)
> >   /**@}*/
> >   
> >   /*
> > @@ -2204,6 +2206,60 @@ int rte_eth_dev_owner_delete(const uint64_t owner_id);
> >   int rte_eth_dev_owner_get(const uint16_t port_id,
> >   		struct rte_eth_dev_owner *owner);
> >   
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Set the role of the process to active or standby,
> > + * affecting network traffic handling.
> > + *
> > + * If one device does not support this operation or fails,
> > + * the whole operation is failed and rolled back.
> > + *
> > + * It is forbidden to have multiple processes with the same role
> > + * unless only one of them is configured to handle the traffic.
> 
> Why is it not allowed to have many processes in standby?
> 
> > + *
> > + * The application is active by default.
> > + * The configuration from the active process is effective immediately
> > + * while the configuration from the standby process is queued by hardware.
> 
> I'm afraid error handling could be tricky in this case.
> I think it makes sense to highlight that configuration
> propagation failures will be returned on attempt to set
> the process active.
> 
> > + * When configuring the device from a standby process,
> > + * it has no effect except for below situations:
> > + *   - traffic not handled by the active process configuration
> > + *   - no active process
> > + *
> > + * When a process is changed from a standby to an active role,
> > + * all preceding configurations that are queued by hardware
> > + * should become effective immediately.
> > + * Before role transition, all the traffic handling configurations
> > + * set by the active process should be flushed first.
> > + *
> > + * In summary, the operations are expected to happen in this order
> > + * in "old" and "new" applications:
> > + *   device: already configured by the old application
> > + *   new:    start as active
> > + *   new:    probe the same device
> > + *   new:    set as standby
> > + *   new:    configure the device
> > + *   device: has configurations from old and new applications
> > + *   old:    clear its device configuration
> > + *   device: has only 1 configuration from new application
> > + *   new:    set as active
> > + *   device: downtime for connecting all to the new application
> > + *   old:    shutdown
> > + *
> > + * @param standby
> > + *   Role active if false, standby if true.
> 
> Typically API with boolean parameters is bad. May be in this
> particular case it is better to have two functions:
> rte_eth_process_set_active() and rte_eth_process_set_standby().

Why?
It could be an enum as well.

> > + * @param flags
> > + *   Role specific flags.
> 
> Please, define namespace for flags.
> 
> > + * @return
> > + *   Positive value on success, -rte_errno value on error:
> > + *   - (> 0) Number of switched devices.
> 
> Isn't is success if there is no devices to switch?

It should be >= 0 I guess.




  reply	other threads:[~2023-02-01  8:27 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-01  8:20 [RFC 0/2] add API to set process to primary or secondary Rongwei Liu
2022-12-01  8:20 ` [RFC 1/2] ethdev: add group description Rongwei Liu
2022-12-01  8:20 ` [RFC 2/2] ethdev: add API to set process to primary or secondary Rongwei Liu
2022-12-01 15:10   ` Stephen Hemminger
2022-12-02  3:27     ` Rongwei Liu
2022-12-05 16:08       ` Stephen Hemminger
2022-12-06  3:47         ` Rongwei Liu
2022-12-06  5:54           ` Stephen Hemminger
2022-12-21  9:00             ` [RFC v3 0/2] add API to set process to active or standby Rongwei Liu
2022-12-21  9:00               ` [RFC v3 1/2] ethdev: add group description Rongwei Liu
2023-01-18 15:44                 ` [PATCH v4 0/3] add API for live migration Rongwei Liu
2023-01-18 15:44                   ` [PATCH v4 1/3] ethdev: add flow rule group description Rongwei Liu
2023-01-31 11:53                     ` Ori Kam
2023-02-06 12:15                       ` Rongwei Liu
2023-02-07  2:57                     ` [PATCH v5] " Rongwei Liu
2023-02-08 20:28                       ` Ferruh Yigit
2023-02-09  2:06                         ` Rongwei Liu
2023-02-09  7:32                         ` [PATCH v6] " Rongwei Liu
2023-02-09  8:01                           ` Ori Kam
2023-02-09 11:26                             ` Ferruh Yigit
2023-01-18 15:44                   ` [PATCH v4 2/3] ethdev: add standby state for live migration Rongwei Liu
2023-01-31 13:50                     ` Ori Kam
2023-01-31 18:14                     ` Jerin Jacob
2023-01-31 22:55                       ` Thomas Monjalon
2023-02-01  7:32                         ` Andrew Rybchenko
2023-02-01  8:31                           ` Thomas Monjalon
2023-02-01  8:40                           ` Jerin Jacob
2023-02-01  8:46                             ` Thomas Monjalon
2023-02-02 10:23                               ` Rongwei Liu
2023-02-01  7:52                     ` Andrew Rybchenko
2023-02-01  8:27                       ` Thomas Monjalon [this message]
2023-02-01  8:40                         ` Andrew Rybchenko
2023-01-18 15:44                   ` [PATCH v4 3/3] ethdev: add standby flags " Rongwei Liu
2023-01-23 13:20                     ` Jerin Jacob
2023-01-30  2:47                       ` Rongwei Liu
2023-01-30 17:10                         ` Jerin Jacob
2023-01-31  2:53                           ` Rongwei Liu
2023-01-31  8:45                             ` Jerin Jacob
2023-01-31  9:01                               ` Rongwei Liu
2023-01-31 14:37                                 ` Jerin Jacob
2023-01-31 14:45                                   ` Ori Kam
2023-01-31 17:50                                     ` Thomas Monjalon
2023-01-31 18:10                                       ` Jerin Jacob
2022-12-21  9:00               ` [RFC v3 2/2] ethdev: add API to set process to active or standby Rongwei Liu
2022-12-21  9:12                 ` Jerin Jacob
2022-12-21  9:32                   ` Rongwei Liu
2022-12-21 10:59                     ` Jerin Jacob
2022-12-21 12:05                       ` Rongwei Liu
2022-12-21 12:44                         ` Jerin Jacob
2022-12-21 12:50                           ` Rongwei Liu
2022-12-21 13:12                             ` Jerin Jacob
2022-12-21 14:33                               ` Rongwei Liu
2022-12-26 16:44                                 ` Ori Kam
2023-01-15 22:46                                   ` Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2738874.BEx9A2HvPv@thomas \
    --to=thomas@monjalon.net \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@amd.com \
    --cc=jerinj@marvell.com \
    --cc=jerinjacobk@gmail.com \
    --cc=matan@nvidia.com \
    --cc=orika@nvidia.com \
    --cc=rasland@nvidia.com \
    --cc=rongweil@nvidia.com \
    --cc=stephen@networkplumber.org \
    --cc=viacheslavo@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.