All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlad Buslov <vladbu@mellanox.com>
To: John Hurley <john.hurley@netronome.com>
Cc: Vlad Buslov <vladbu@mellanox.com>, Jiri Pirko <jiri@mellanox.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"simon.horman@netronome.com" <simon.horman@netronome.com>,
	"jakub.kicinski@netronome.com" <jakub.kicinski@netronome.com>,
	"oss-drivers@netronome.com" <oss-drivers@netronome.com>
Subject: Re: [RFC net-next 0/2] prevent sync issues with hw offload of flower
Date: Thu, 3 Oct 2019 16:26:28 +0000	[thread overview]
Message-ID: <vbfk19lokwe.fsf@mellanox.com> (raw)
In-Reply-To: <1570058072-12004-1-git-send-email-john.hurley@netronome.com>


On Thu 03 Oct 2019 at 02:14, John Hurley <john.hurley@netronome.com> wrote:
> Hi,
>
> Putting this out an RFC built on net-next. It fixes some issues
> discovered in testing when using the TC API of OvS to generate flower
> rules and subsequently offloading them to HW. Rules seen contain the same
> match fields or may be rule modifications run as a delete plus an add.
> We're seeing race conditions whereby the rules present in kernel flower
> are out of sync with those offloaded. Note that there are some issues
> that will need fixed in the RFC before it becomes a patch such as
> potential races between releasing locks and re-taking them. However, I'm
> putting this out for comments or potential alternative solutions.
>
> The main cause of the races seem to be in the chain table of cls_api. If
> a tcf_proto is destroyed then it is removed from its chain. If a new
> filter is then added to the same chain with the same priority and protocol
> a new tcf_proto will be created - this may happen before the first is
> fully removed and the hw offload message sent to the driver. In cls_flower
> this means that the fl_ht_insert_unique() function can pass as its
> hashtable is associated with the tcf_proto. We are then in a position
> where the 'delete' and the 'add' are in a race to get offloaded. We also
> noticed that doing an offload add, then checking if a tcf_proto is
> concurrently deleting, then remove the offload if it is, can extend the
> out of order messages. Drivers do not expect to get duplicate rules.
> However, the kernel TC datapath they are not duplicates so we can get out
> of sync here.
>
> The RFC fixes this by adding a pre_destroy hook to cls_api that is called
> when a tcf_proto is signaled to be destroyed but before it is removed from
> its chain (which is essentially the lock for allowing duplicates in
> flower). Flower then uses this new hook to send the hw delete messages
> from tcf_proto destroys, preventing them racing with duplicate adds. It
> also moves the check for 'deleting' to before the sending the hw add
> message.
>
> John Hurley (2):
>   net: sched: add tp_op for pre_destroy
>   net: sched: fix tp destroy race conditions in flower
>
>  include/net/sch_generic.h |  3 +++
>  net/sched/cls_api.c       | 29 ++++++++++++++++++++++++-
>  net/sched/cls_flower.c    | 55 ++++++++++++++++++++++++++---------------------
>  3 files changed, 61 insertions(+), 26 deletions(-)

Hi John,

Thanks for working on this!

Are there any other sources for race conditions described in this
letter? When you describe tcf_proto deletion you say "main cause" but
don't provide any others. If tcf_proto is the only problematic part,
then it might be worth to look into alternative ways to force concurrent
users to wait for proto deletion/destruction to be properly finished.
Maybe having some table that maps chain id + prio to completion would be
simpler approach? With such infra tcf_proto_create() can wait for
previous proto with same prio and chain to be fully destroyed (including
offloads) before creating a new one.

Regards,
Vlad

  parent reply	other threads:[~2019-10-03 16:26 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-02 23:14 [RFC net-next 0/2] prevent sync issues with hw offload of flower John Hurley
2019-10-02 23:14 ` [RFC net-next 1/2] net: sched: add tp_op for pre_destroy John Hurley
2019-10-02 23:14 ` [RFC net-next 2/2] net: sched: fix tp destroy race conditions in flower John Hurley
2019-10-03 16:18   ` Vlad Buslov
2019-10-03 16:39     ` John Hurley
2019-10-03 16:26 ` Vlad Buslov [this message]
2019-10-03 16:59   ` [RFC net-next 0/2] prevent sync issues with hw offload of flower John Hurley
2019-10-03 17:19     ` Vlad Buslov
2019-10-04 15:39       ` John Hurley
2019-10-04 15:58         ` Vlad Buslov
2019-10-04 16:06           ` John Hurley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=vbfk19lokwe.fsf@mellanox.com \
    --to=vladbu@mellanox.com \
    --cc=jakub.kicinski@netronome.com \
    --cc=jiri@mellanox.com \
    --cc=john.hurley@netronome.com \
    --cc=netdev@vger.kernel.org \
    --cc=oss-drivers@netronome.com \
    --cc=simon.horman@netronome.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.