From: Jamal Hadi Salim <jhs@mojatatu.com>
To: Vlad Buslov <vladbu@nvidia.com>
Cc: netdev@vger.kernel.org, deb.chatterjee@intel.com,
anjali.singhai@intel.com, namrata.limaye@intel.com,
tom@sipanda.io, mleitner@redhat.com, Mahesh.Shirshyad@amd.com,
tomasz.osinski@intel.com, jiri@resnulli.us,
xiyou.wangcong@gmail.com, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
horms@kernel.org, kernel@mojatatu.com, khalidm@nvidia.com,
toke@redhat.com, mattyk@nvidia.com
Subject: Re: [PATCH RFC v6 net-next 13/17] p4tc: add table entry create, update, get, delete, flush and dump
Date: Mon, 9 Oct 2023 10:02:11 -0400 [thread overview]
Message-ID: <CAM0EoM=2ObA1yrasNWRFoSzB+JZ0su2TKrXH-D0k+Pth=aOUxg@mail.gmail.com> (raw)
In-Reply-To: <87edi5ysun.fsf@nvidia.com>
Hi Vlad,
On Sun, Oct 8, 2023 at 12:36 PM Vlad Buslov <vladbu@nvidia.com> wrote:
>
> On Sat 30 Sep 2023 at 10:35, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
[..trimmed...]
> > +/* Invoked from both control and data path */
> > +static int __p4tc_table_entry_update(struct p4tc_pipeline *pipeline,
> > + struct p4tc_table *table,
> > + struct p4tc_table_entry *entry,
> > + struct p4tc_table_entry_mask *mask,
> > + u16 whodunnit, bool from_control)
> > +__must_hold(RCU)
> > +{
> > + struct p4tc_table_entry_mask *mask_found = NULL;
> > + struct p4tc_table_entry_work *entry_work;
> > + struct p4tc_table_entry_value *value_old;
> > + struct p4tc_table_entry_value *value;
> > + struct p4tc_table_entry *entry_old;
> > + struct p4tc_table_entry_tm *tm_old;
> > + struct p4tc_table_entry_tm *tm;
> > + int ret;
> > +
> > + value = p4tc_table_entry_value(entry);
> > + /* We set it to zero on create an update to avoid having entry
> > + * deletion in parallel before we report to user space.
> > + */
> > + refcount_set(&value->entries_ref, 0);
>
> TBH I already commented on one of the previous versions of this series
> that it is very hard to understand and review tons of different atomic
> reference counters, especially when they are modified with functions
> like refcount_dec_not_one() or unconditional set like in this place.
>
> I chose specifically this function because __must_hold(RCU) makes it
> look like it can be accessed concurrently from datapath, which is not
> obvious on multiple previous usages of reference counters in the series.
True, tables can be manipulated from control plane/user space,
datapath as well as timers (mostly for delete).
Would using wrappers around these incr/decr help? i mean meaningful
inlines that will provide clarity as to what an incr/decr is?
> So what happens here if entries_ref was 0 to begin with? Or is possible
> for this function to be executed concurrently by multiple tasks, in
> which case all of them set entries_ref to 0, but first one that finishes
> resets the counter back to 1 at which point I assume it can be deleted
> in parallel by control path while some concurrent
> __p4tc_table_entry_update() are still running (at least that is what the
> comment here indicates)?
It's rtnl-lockless, so you can imagine what would happen there ;->
Multiple concurent user space, kernel, timers all contending for this.
Exactly what you said: its zero in this case because some entity could
delete it in parallel.
See comment further down which says "In case of parallel update, the
thread that arrives here first will..."
Consider it a poor man's lock. Does that help? Perhaps we could have
more discussion at the monthly tc meetup..
We have been testing this code a lot for concurrency and wrote some
user space tooling to catch such issues.
>
> > +
> > + if (table->tbl_type != P4TC_TABLE_TYPE_EXACT) {
> > + mask_found = p4tc_table_entry_mask_add(table, entry, mask);
> > + if (IS_ERR(mask_found)) {
> > + ret = PTR_ERR(mask_found);
> > + goto out;
> > + }
> > + }
> > +
> > + p4tc_table_entry_build_key(table, &entry->key, mask_found);
> > +
> > + entry_old = p4tc_entry_lookup(table, &entry->key, value->prio);
> > + if (!entry_old) {
> > + ret = -ENOENT;
> > + goto rm_masks_idr;
> > + }
> > +
> > + /* In case of parallel update, the thread that arrives here first will
> > + * get the right to update.
> > + *
> > + * In case of a parallel get/update, whoever is second will fail appropriately
> > + */
> > + value_old = p4tc_table_entry_value(entry_old);
> > + if (!p4tc_tbl_entry_put(value_old)) {
> > + ret = -EAGAIN;
> > + goto rm_masks_idr;
> > + }
> > +
> > + if (from_control) {
> > + if (!p4tc_ctrl_update_ok(value_old->permissions)) {
> > + ret = -EPERM;
> > + goto set_entries_refcount;
> > + }
> > + } else {
> > + if (!p4tc_data_update_ok(value_old->permissions)) {
> > + ret = -EPERM;
> > + goto set_entries_refcount;
> > + }
> > + }
> > +
> > + tm = kzalloc(sizeof(*tm), GFP_ATOMIC);
> > + if (unlikely(!tm)) {
> > + ret = -ENOMEM;
> > + goto set_entries_refcount;
> > + }
> > +
> > + tm_old = rcu_dereference_protected(value_old->tm, 1);
> > + *tm = *tm_old;
> > +
> > + tm->lastused = jiffies;
> > + tm->who_updated = whodunnit;
> > +
> > + if (value->permissions == P4TC_PERMISSIONS_UNINIT)
> > + value->permissions = value_old->permissions;
> > +
> > + rcu_assign_pointer(value->tm, tm);
> > +
> > + entry_work = kzalloc(sizeof(*(entry_work)), GFP_ATOMIC);
> > + if (unlikely(!entry_work)) {
> > + ret = -ENOMEM;
> > + goto free_tm;
> > + }
> > +
> > + entry_work->pipeline = pipeline;
> > + entry_work->table = table;
> > + entry_work->entry = entry;
> > + value->entry_work = entry_work;
> > + if (!value->is_dyn)
> > + value->is_dyn = value_old->is_dyn;
> > +
> > + if (value->is_dyn) {
> > + /* Only use old entry value if user didn't specify new one */
> > + value->aging_ms = value->aging_ms ?: value_old->aging_ms;
> > +
> > + hrtimer_init(&value->entry_timer, CLOCK_MONOTONIC,
> > + HRTIMER_MODE_REL);
> > + value->entry_timer.function = &entry_timer_handle;
> > +
> > + hrtimer_start(&value->entry_timer, ms_to_ktime(value->aging_ms),
> > + HRTIMER_MODE_REL);
> > + }
> > +
> > + INIT_WORK(&entry_work->work, p4tc_table_entry_del_work);
> > +
> > + if (rhltable_insert(&table->tbl_entries, &entry->ht_node,
> > + entry_hlt_params) < 0) {
> > + ret = -EEXIST;
> > + goto free_entry_work;
> > + }
> > +
> > + p4tc_table_entry_destroy_noida(table, entry_old);
> > +
> > + if (!from_control)
> > + p4tc_tbl_entry_emit_event(entry_work, RTM_P4TC_UPDATE,
> > + GFP_ATOMIC);
> > +
> > + return 0;
> > +
> > +free_entry_work:
> > + kfree(entry_work);
> > +
> > +free_tm:
> > + kfree(tm);
> > +
> > +set_entries_refcount:
> > + refcount_set(&value_old->entries_ref, 1);
> > +
> > +rm_masks_idr:
> > + if (table->tbl_type != P4TC_TABLE_TYPE_EXACT)
> > + p4tc_table_entry_mask_del(table, entry);
> > +
> > +out:
> > + return ret;
> > +}
> > +
[..trimmed..]
cheers,
jamal
next prev parent reply other threads:[~2023-10-09 14:02 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-30 14:35 [PATCH RFC v6 net-next 00/17] Introducing P4TC Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 01/17] net: sched: act_api: Introduce dynamic actions list Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 02/17] net/sched: act_api: increase action kind string length Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 03/17] net/sched: act_api: Update tc_action_ops to account for dynamic actions Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 04/17] net/sched: act_api: add struct p4tc_action_ops as a parameter to lookup callback Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 05/17] net: sched: act_api: Add support for preallocated dynamic action instances Jamal Hadi Salim
2023-10-08 15:26 ` Vlad Buslov
2023-09-30 14:35 ` [PATCH RFC v6 net-next 06/17] net: introduce rcu_replace_pointer_rtnl Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 07/17] rtnl: add helper to check if group has listeners Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 08/17] p4tc: add P4 data types Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 09/17] p4tc: add pipeline create, get, update, delete Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 10/17] p4tc: add header field create, get, delete, flush and dump Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 11/17] p4tc: add action template create, update, delete, get, " Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 12/17] p4tc: add table " Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 13/17] p4tc: add table entry create, update, get, delete, " Jamal Hadi Salim
2023-10-08 16:26 ` Vlad Buslov
2023-10-08 16:53 ` Andrew Lunn
2023-10-09 6:29 ` Vlad Buslov
2023-10-09 14:02 ` Jamal Hadi Salim [this message]
2023-10-09 14:27 ` Vlad Buslov
2023-10-09 15:39 ` Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 14/17] p4tc: add set of P4TC table kfuncs Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 15/17] p4tc: add P4 classifier Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 16/17] p4tc: Add P4 extern interface Jamal Hadi Salim
2023-09-30 14:35 ` [PATCH RFC v6 net-next 17/17] MAINTAINERS: add p4tc entry Jamal Hadi Salim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAM0EoM=2ObA1yrasNWRFoSzB+JZ0su2TKrXH-D0k+Pth=aOUxg@mail.gmail.com' \
--to=jhs@mojatatu.com \
--cc=Mahesh.Shirshyad@amd.com \
--cc=anjali.singhai@intel.com \
--cc=davem@davemloft.net \
--cc=deb.chatterjee@intel.com \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=jiri@resnulli.us \
--cc=kernel@mojatatu.com \
--cc=khalidm@nvidia.com \
--cc=kuba@kernel.org \
--cc=mattyk@nvidia.com \
--cc=mleitner@redhat.com \
--cc=namrata.limaye@intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=toke@redhat.com \
--cc=tom@sipanda.io \
--cc=tomasz.osinski@intel.com \
--cc=vladbu@nvidia.com \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).