netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jamal Hadi Salim <jhs@mojatatu.com>
To: Vlad Buslov <vladbu@nvidia.com>
Cc: netdev@vger.kernel.org, deb.chatterjee@intel.com,
	anjali.singhai@intel.com,  namrata.limaye@intel.com,
	tom@sipanda.io, mleitner@redhat.com,  Mahesh.Shirshyad@amd.com,
	tomasz.osinski@intel.com, jiri@resnulli.us,
	 xiyou.wangcong@gmail.com, davem@davemloft.net,
	edumazet@google.com,  kuba@kernel.org, pabeni@redhat.com,
	horms@kernel.org, kernel@mojatatu.com,  khalidm@nvidia.com,
	toke@redhat.com, mattyk@nvidia.com
Subject: Re: [PATCH RFC v6 net-next 13/17] p4tc: add table entry create, update, get, delete, flush and dump
Date: Mon, 9 Oct 2023 11:39:04 -0400	[thread overview]
Message-ID: <CAM0EoMnz-vFOizxqwgRocqiQtzwJ_6uk1fDvGHZce0yDpnRAGA@mail.gmail.com> (raw)
In-Reply-To: <8734yjzvsq.fsf@nvidia.com>

On Mon, Oct 9, 2023 at 10:59 AM Vlad Buslov <vladbu@nvidia.com> wrote:
>
>
> On Mon 09 Oct 2023 at 10:02, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > Hi Vlad,
> >
> > On Sun, Oct 8, 2023 at 12:36 PM Vlad Buslov <vladbu@nvidia.com> wrote:
> >>
> >> On Sat 30 Sep 2023 at 10:35, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > [..trimmed...]
> >
> >> > +/* Invoked from both control and data path  */
> >> > +static int __p4tc_table_entry_update(struct p4tc_pipeline *pipeline,
> >> > +                                  struct p4tc_table *table,
> >> > +                                  struct p4tc_table_entry *entry,
> >> > +                                  struct p4tc_table_entry_mask *mask,
> >> > +                                  u16 whodunnit, bool from_control)
> >> > +__must_hold(RCU)

Mark B - comment further down.

> >> > +{
> >> > +     struct p4tc_table_entry_mask *mask_found = NULL;
> >> > +     struct p4tc_table_entry_work *entry_work;
> >> > +     struct p4tc_table_entry_value *value_old;
> >> > +     struct p4tc_table_entry_value *value;
> >> > +     struct p4tc_table_entry *entry_old;
> >> > +     struct p4tc_table_entry_tm *tm_old;
> >> > +     struct p4tc_table_entry_tm *tm;
> >> > +     int ret;
> >> > +
> >> > +     value = p4tc_table_entry_value(entry);
> >> > +     /* We set it to zero on create an update to avoid having entry
> >> > +      * deletion in parallel before we report to user space.
> >> > +      */
> >> > +     refcount_set(&value->entries_ref, 0);
> >>
> >> TBH I already commented on one of the previous versions of this series
> >> that it is very hard to understand and review tons of different atomic
> >> reference counters, especially when they are modified with functions
> >> like refcount_dec_not_one() or unconditional set like in this place.
> >>
> >> I chose specifically this function because __must_hold(RCU) makes it
> >> look like it can be accessed concurrently from datapath, which is not
> >> obvious on multiple previous usages of reference counters in the series.
> >
> > True, tables can be manipulated from control plane/user space,
> > datapath as well as timers (mostly for delete).
> > Would using wrappers around these incr/decr help? i mean meaningful
> > inlines that will provide clarity as to what an incr/decr is?
>
> Considering existing wrappers I don't believe adding more would help.
> Looking at wrappers for entries_ref:
>
> static inline int p4tc_tbl_entry_get(struct p4tc_table_entry_value *value)
> {
>         return refcount_inc_not_zero(&value->entries_ref);
> }
>
> static inline bool p4tc_tbl_entry_put(struct p4tc_table_entry_value *value)
> {
>         return refcount_dec_if_one(&value->entries_ref);
> }
>
> static inline bool p4tc_tbl_entry_put_ref(struct p4tc_table_entry_value *value)
> {
>         return refcount_dec_not_one(&value->entries_ref);
> }
>
> it isn't obvious to me why 'put' is dec_if_one and put_ref is dec_not
> one.
>

Ok, so maybe answers to your questions below will help to clarify.
Also: Would it help if we also put comments in the data structs to say
what each refcnt, atomic and ref is for (they are clear to us but
we've been staring at this for too long).

> >> So what happens here if entries_ref was 0 to begin with? Or is possible
> >> for this function to be executed concurrently by multiple tasks, in
> >> which case all of them set entries_ref to 0, but first one that finishes
> >> resets the counter back to 1 at which point I assume it can be deleted
> >> in parallel by control path while some concurrent
> >> __p4tc_table_entry_update() are still running (at least that is what the
> >> comment here indicates)?
> >
> > It's rtnl-lockless, so you can imagine what would happen there ;->
> > Multiple concurent user space, kernel, timers all contending for this.
> > Exactly what you said: its zero in this case because some entity could
> > delete it in parallel.
> > See comment further down which says "In case of parallel update, the
> > thread that arrives here first will..."
>
> See below.
>
> > Consider it a poor man's lock. Does that help? Perhaps we could have
> > more discussion at the monthly tc meetup..
> > We have been testing this code a lot for concurrency and wrote some
> > user space tooling to catch such issues.
>
> I'm not arguing that it is somehow broken or not well-tested, just
> saying I read this and previous versions several times and still have no
> idea what is going on here.

Fair enough - hopefully we can rectify this.

> >
> >>
> >> > +
> >> > +     if (table->tbl_type != P4TC_TABLE_TYPE_EXACT) {
> >> > +             mask_found = p4tc_table_entry_mask_add(table, entry, mask);
> >> > +             if (IS_ERR(mask_found)) {
> >> > +                     ret = PTR_ERR(mask_found);
> >> > +                     goto out;
> >> > +             }
> >> > +     }
> >> > +
>
> Mark A, see next comment.
>
> >> > +     p4tc_table_entry_build_key(table, &entry->key, mask_found);
> >> > +
> >> > +     entry_old = p4tc_entry_lookup(table, &entry->key, value->prio);
> >> > +     if (!entry_old) {
> >> > +             ret = -ENOENT;
> >> > +             goto rm_masks_idr;
> >> > +     }
> >> > +
> >> > +     /* In case of parallel update, the thread that arrives here first will
> >> > +      * get the right to update.
> >> > +      *
> >> > +      * In case of a parallel get/update, whoever is second will fail appropriately
> >> > +      */
> >> > +     value_old = p4tc_table_entry_value(entry_old);
> >> > +     if (!p4tc_tbl_entry_put(value_old)) {
> >> > +             ret = -EAGAIN;
>
> I get that this prevents you from updating the old entry in parallel,
> but I was asking what would happen if concurrent update already finished
> by this time, set value->entries_ref to 1 and concurrent delete
> deallocated the new entry? Isn't use after free possible when calling
> p4tc_table_entry_build_key() and p4tc_entry_lookup() if all those events
> happen before mark A? (for example, if task spent a lot of time
> allocating new mask in p4tc_table_entry_mask_add())


The table entry is RCU and this function can only be called with RCU
read lock(See "Mark B" above), so a use-after-free scenario shouldn't
be possible.
I think we actually have a test case for the scenario you are
describing - but sometimes with these things it is based on luck to
hit the exact scenario; we could add delays at strategic code paths to
make it more deterministic to hit the scenario. Code review always
helps as you are doing right now (if you cant find issues in this
area, noone else could ;->).

cheers,
jamal

> >> > +             goto rm_masks_idr;
> >> > +     }
> >> > +
> >> > +     if (from_control) {
> >> > +             if (!p4tc_ctrl_update_ok(value_old->permissions)) {
> >> > +                     ret = -EPERM;
> >> > +                     goto set_entries_refcount;
> >> > +             }
> >> > +     } else {
> >> > +             if (!p4tc_data_update_ok(value_old->permissions)) {
> >> > +                     ret = -EPERM;
> >> > +                     goto set_entries_refcount;
> >> > +             }
> >> > +     }
> >> > +
> >> > +     tm = kzalloc(sizeof(*tm), GFP_ATOMIC);
> >> > +     if (unlikely(!tm)) {
> >> > +             ret = -ENOMEM;
> >> > +             goto set_entries_refcount;
> >> > +     }
> >> > +
> >> > +     tm_old = rcu_dereference_protected(value_old->tm, 1);
> >> > +     *tm = *tm_old;
> >> > +
> >> > +     tm->lastused = jiffies;
> >> > +     tm->who_updated = whodunnit;
> >> > +
> >> > +     if (value->permissions == P4TC_PERMISSIONS_UNINIT)
> >> > +             value->permissions = value_old->permissions;
> >> > +
> >> > +     rcu_assign_pointer(value->tm, tm);
> >> > +
> >> > +     entry_work = kzalloc(sizeof(*(entry_work)), GFP_ATOMIC);
> >> > +     if (unlikely(!entry_work)) {
> >> > +             ret = -ENOMEM;
> >> > +             goto free_tm;
> >> > +     }
> >> > +
> >> > +     entry_work->pipeline = pipeline;
> >> > +     entry_work->table = table;
> >> > +     entry_work->entry = entry;
> >> > +     value->entry_work = entry_work;
> >> > +     if (!value->is_dyn)
> >> > +             value->is_dyn = value_old->is_dyn;
> >> > +
> >> > +     if (value->is_dyn) {
> >> > +             /* Only use old entry value if user didn't specify new one */
> >> > +             value->aging_ms = value->aging_ms ?: value_old->aging_ms;
> >> > +
> >> > +             hrtimer_init(&value->entry_timer, CLOCK_MONOTONIC,
> >> > +                          HRTIMER_MODE_REL);
> >> > +             value->entry_timer.function = &entry_timer_handle;
> >> > +
> >> > +             hrtimer_start(&value->entry_timer, ms_to_ktime(value->aging_ms),
> >> > +                           HRTIMER_MODE_REL);
> >> > +     }
> >> > +
> >> > +     INIT_WORK(&entry_work->work, p4tc_table_entry_del_work);
> >> > +
> >> > +     if (rhltable_insert(&table->tbl_entries, &entry->ht_node,
> >> > +                         entry_hlt_params) < 0) {
> >> > +             ret = -EEXIST;
> >> > +             goto free_entry_work;
> >> > +     }
> >> > +
> >> > +     p4tc_table_entry_destroy_noida(table, entry_old);
> >> > +
> >> > +     if (!from_control)
> >> > +             p4tc_tbl_entry_emit_event(entry_work, RTM_P4TC_UPDATE,
> >> > +                                       GFP_ATOMIC);
> >> > +
> >> > +     return 0;
> >> > +
> >> > +free_entry_work:
> >> > +     kfree(entry_work);
> >> > +
> >> > +free_tm:
> >> > +     kfree(tm);
> >> > +
> >> > +set_entries_refcount:
> >> > +     refcount_set(&value_old->entries_ref, 1);
> >> > +
> >> > +rm_masks_idr:
> >> > +     if (table->tbl_type != P4TC_TABLE_TYPE_EXACT)
> >> > +             p4tc_table_entry_mask_del(table, entry);
> >> > +
> >> > +out:
> >> > +     return ret;
> >> > +}
> >> > +
> >
> > [..trimmed..]
> >
> >  cheers,
> > jamal
>

  reply	other threads:[~2023-10-09 15:39 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-30 14:35 [PATCH RFC v6 net-next 00/17] Introducing P4TC Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 01/17] net: sched: act_api: Introduce dynamic actions list Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 02/17] net/sched: act_api: increase action kind string length Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 03/17] net/sched: act_api: Update tc_action_ops to account for dynamic actions Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 04/17] net/sched: act_api: add struct p4tc_action_ops as a parameter to lookup callback Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 05/17] net: sched: act_api: Add support for preallocated dynamic action instances Jamal Hadi Salim
2023-10-08 15:26   ` Vlad Buslov
2023-09-30 14:35 ` [PATCH RFC v6 net-next 06/17] net: introduce rcu_replace_pointer_rtnl Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 07/17] rtnl: add helper to check if group has listeners Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 08/17] p4tc: add P4 data types Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 09/17] p4tc: add pipeline create, get, update, delete Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 10/17] p4tc: add header field create, get, delete, flush and dump Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 11/17] p4tc: add action template create, update, delete, get, " Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 12/17] p4tc: add table " Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 13/17] p4tc: add table entry create, update, get, delete, " Jamal Hadi Salim
2023-10-08 16:26   ` Vlad Buslov
2023-10-08 16:53     ` Andrew Lunn
2023-10-09  6:29       ` Vlad Buslov
2023-10-09 14:02     ` Jamal Hadi Salim
2023-10-09 14:27       ` Vlad Buslov
2023-10-09 15:39         ` Jamal Hadi Salim [this message]
2023-09-30 14:35 ` [PATCH RFC v6 net-next 14/17] p4tc: add set of P4TC table kfuncs Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 15/17] p4tc: add P4 classifier Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 16/17] p4tc: Add P4 extern interface Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 17/17] MAINTAINERS: add p4tc entry Jamal Hadi Salim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAM0EoMnz-vFOizxqwgRocqiQtzwJ_6uk1fDvGHZce0yDpnRAGA@mail.gmail.com \
    --to=jhs@mojatatu.com \
    --cc=Mahesh.Shirshyad@amd.com \
    --cc=anjali.singhai@intel.com \
    --cc=davem@davemloft.net \
    --cc=deb.chatterjee@intel.com \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jiri@resnulli.us \
    --cc=kernel@mojatatu.com \
    --cc=khalidm@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=mattyk@nvidia.com \
    --cc=mleitner@redhat.com \
    --cc=namrata.limaye@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=toke@redhat.com \
    --cc=tom@sipanda.io \
    --cc=tomasz.osinski@intel.com \
    --cc=vladbu@nvidia.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).