From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A580AC4360F for ; Wed, 3 Apr 2019 16:44:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 34D332082C for ; Wed, 3 Apr 2019 16:44:02 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=Mellanox.com header.i=@Mellanox.com header.b="hnQX6j97" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728756AbfDCQnv (ORCPT ); Wed, 3 Apr 2019 12:43:51 -0400 Received: from mail-eopbgr10047.outbound.protection.outlook.com ([40.107.1.47]:41028 "EHLO EUR02-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726755AbfDCQm1 (ORCPT ); Wed, 3 Apr 2019 12:42:27 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=86LMvF3UOZglgHPDtaD3JZ4cDkodPj/89vWOk9EV/1U=; b=hnQX6j97LqyiPgT3rqfoiIq1WjS5NHB/eaZlF785ZKDQa+HpTdnNR07IDAyU9woU6BFK/UVt1HgxkXyO0ejFDSYgLD7WfkZbSlDpJnC+LYFNa/Jv+ZieBWY9pSPO4VG5pEceWjhZh/j2eJ0UR3Nw1tsl7IJWcb+M/lEEKo/ynGM= Received: from HE1PR0502MB3641.eurprd05.prod.outlook.com (10.167.127.11) by HE1PR0502MB2987.eurprd05.prod.outlook.com (10.175.30.144) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1750.17; Wed, 3 Apr 2019 16:42:17 +0000 Received: from HE1PR0502MB3641.eurprd05.prod.outlook.com ([fe80::50cb:2948:19aa:a023]) by HE1PR0502MB3641.eurprd05.prod.outlook.com ([fe80::50cb:2948:19aa:a023%5]) with mapi id 15.20.1750.017; Wed, 3 Apr 2019 16:42:17 +0000 From: Vlad Buslov To: John Hurley CC: Jiri Pirko , "davem@davemloft.net" , "xiyou.wangcong@gmail.com" , "netdev@vger.kernel.org" , Vlad Buslov , "oss-drivers@netronome.com" Subject: Re: [RFC net-next 1/1] net: sched: fix hw filter offload in tc flower Thread-Topic: [RFC net-next 1/1] net: sched: fix hw filter offload in tc flower Thread-Index: AQHU6hn3Krn9mbEBN0SQhXl33thouKYqpCyA Date: Wed, 3 Apr 2019 16:42:17 +0000 Message-ID: References: <1554295025-11056-1-git-send-email-john.hurley@netronome.com> In-Reply-To: <1554295025-11056-1-git-send-email-john.hurley@netronome.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-clientproxiedby: LO2P265CA0364.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:a3::16) To HE1PR0502MB3641.eurprd05.prod.outlook.com (2603:10a6:7:85::11) authentication-results: spf=none (sender IP is ) smtp.mailfrom=vladbu@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-originating-ip: [37.142.13.130] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: e66d4ef5-1fd0-473a-5699-08d6b8535574 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0;PCL:0;RULEID:(2390118)(7020095)(4652040)(8989299)(5600139)(711020)(4605104)(4618075)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(2017052603328)(7193020);SRVR:HE1PR0502MB2987; x-ms-traffictypediagnostic: HE1PR0502MB2987: x-microsoft-antispam-prvs: x-forefront-prvs: 0996D1900D x-forefront-antispam-report: SFV:NSPM;SFS:(10009020)(376002)(396003)(366004)(39860400002)(136003)(346002)(199004)(189003)(3846002)(52116002)(6246003)(54906003)(26005)(316002)(97736004)(2906002)(6512007)(76176011)(53936002)(6116002)(486006)(476003)(6436002)(256004)(5660300002)(25786009)(446003)(4326008)(36756003)(68736007)(8676002)(478600001)(2616005)(6916009)(81156014)(14454004)(105586002)(14444005)(81166006)(11346002)(71200400001)(71190400001)(305945005)(229853002)(7736002)(386003)(186003)(102836004)(6506007)(106356001)(8936002)(99286004)(66066001)(86362001)(6486002);DIR:OUT;SFP:1101;SCL:1;SRVR:HE1PR0502MB2987;H:HE1PR0502MB3641.eurprd05.prod.outlook.com;FPR:;SPF:None;LANG:en;PTR:InfoNoRecords;A:1;MX:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: PXTeN6vZE7BpNZntUqGfjonlcKvI1ifa0z8Ydx5iBeSGU9AS7z0LFQ9wTdN1HxpoPi1OeLG+auNXt+ipRGyxSH01QKkKmKWzXJyx++ZcfjKO/5+qFr5hslNtxFMBjqp4TJeC+NeIpTWSbFlhaneSqzUkGvYi42IK0lrXdlTjwG3Qt/lYpd3V8C9aVo4jjXakZq0RUjhSPI1qv3Opl12mBs7shkXdQuXGEFUJeDUrdMh+U/YcSPsclSWDweaDkY4tN63l7RiR6fFEy6IRmpnc7lm6UwrmatuK60cGubT/ig9fAe0uygbzX43uxVn5g/66gu8fwP5S7dT/UuT+LzTouc8oxdd2wu0sK+94+c3U1D6R+ZGS4UGrTm9qpNyClsBa+NbmCb8mzK3j+7xQglVwz3sdmou6ggKcsBPlarvTljk= Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: e66d4ef5-1fd0-473a-5699-08d6b8535574 X-MS-Exchange-CrossTenant-originalarrivaltime: 03 Apr 2019 16:42:17.7649 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1PR0502MB2987 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Wed 03 Apr 2019 at 15:37, John Hurley wrote: > Recent refactoring of fl_change aims to use the classifier spinlock to > avoid the need for rtnl lock. In doing so, the fl_hw_replace_filer() > function was moved to before the lock is taken. This can create problems > for drivers if duplicate filters are created (commmon in ovs tc offload > due to filters being triggered by user-space matches). > > Drivers registered for such filters will now receive multiple copies of > the same rule, each with a different cookie value. This means that the > drivers would need to do a full match field lookup to determine > duplicates, repeating work that will happen in flower __fl_lookup(). > Currently, drivers do not expect to receive duplicate filters. > > Fix this by moving the hw_replace_function to after the software filter > processing. This way, offload messages are only triggered after they are > verified. Add a new flag to each filter that indicates that it is being > sent to hw. This prevents the flow from being deleted before processing i= s > finished, even if the tp lock is released. > > NOTE: > There may still be a race condition here with the reoffload function. Whe= n > the __SENDING_TO_HW bit is set we do not know if the filter has actually > been sent to HW or not at time of reoffload. This means we could > potentially not offload a valid flow here (or not delete one). > > Fixes: 620da4860827 ("net: sched: flower: refactor fl_change") > Signed-off-by: John Hurley > Reviewed-by: Simon Horman > Acked-by: Jakub Kicinski > --- > net/sched/cls_flower.c | 110 ++++++++++++++++++++++++++++++++++++++++---= ------ > 1 file changed, 90 insertions(+), 20 deletions(-) > > diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c > index 0638f17..e2c144f 100644 > --- a/net/sched/cls_flower.c > +++ b/net/sched/cls_flower.c > @@ -94,6 +94,10 @@ struct cls_fl_head { > struct idr handle_idr; > }; > > +enum cls_fl_filter_state_t { > + __SENDING_TO_HW, > +}; > + > struct cls_fl_filter { > struct fl_flow_mask *mask; > struct rhash_head ht_node; > @@ -113,6 +117,7 @@ struct cls_fl_filter { > */ > refcount_t refcnt; > bool deleted; > + unsigned long atomic_flags; > }; > > static const struct rhashtable_params mask_ht_params =3D { > @@ -542,12 +547,18 @@ static int __fl_delete(struct tcf_proto *tp, struct= cls_fl_filter *f, > > *last =3D false; > > +replay: > spin_lock(&tp->lock); > if (f->deleted) { > spin_unlock(&tp->lock); > return -ENOENT; > } > > + if (test_bit(__SENDING_TO_HW, &f->atomic_flags)) { > + spin_unlock(&tp->lock); > + goto replay; > + } > + > f->deleted =3D true; > rhashtable_remove_fast(&f->mask->ht, &f->ht_node, > f->mask->filter_ht_params); > @@ -1528,15 +1539,6 @@ static int fl_change(struct net *net, struct sk_bu= ff *in_skb, > if (err) > goto errout; > > - if (!tc_skip_hw(fnew->flags)) { > - err =3D fl_hw_replace_filter(tp, fnew, rtnl_held, extack); > - if (err) > - goto errout_mask; > - } > - > - if (!tc_in_hw(fnew->flags)) > - fnew->flags |=3D TCA_CLS_FLAGS_NOT_IN_HW; > - > spin_lock(&tp->lock); > > /* tp was deleted concurrently. -EAGAIN will cause caller to lookup > @@ -1544,15 +1546,18 @@ static int fl_change(struct net *net, struct sk_b= uff *in_skb, > */ > if (tp->deleting) { > err =3D -EAGAIN; > - goto errout_hw; > + goto errout_mask; > } > > refcount_inc(&fnew->refcnt); > if (fold) { > - /* Fold filter was deleted concurrently. Retry lookup. */ > - if (fold->deleted) { > + /* Fold filter was deleted or is being added to hw concurrently. > + * Retry lookup. > + */ > + if (fold->deleted || > + test_bit(__SENDING_TO_HW, &fold->atomic_flags)) { > err =3D -EAGAIN; > - goto errout_hw; > + goto errout_mask; > } > > fnew->handle =3D handle; > @@ -1560,7 +1565,7 @@ static int fl_change(struct net *net, struct sk_buf= f *in_skb, > err =3D rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node, > fnew->mask->filter_ht_params); > if (err) > - goto errout_hw; > + goto errout_mask; > > rhashtable_remove_fast(&fold->mask->ht, > &fold->ht_node, > @@ -1568,9 +1573,23 @@ static int fl_change(struct net *net, struct sk_bu= ff *in_skb, > idr_replace(&head->handle_idr, fnew, fnew->handle); > list_replace_rcu(&fold->list, &fnew->list); > fold->deleted =3D true; > + if (!tc_skip_hw(fnew->flags)) > + set_bit(__SENDING_TO_HW, &fnew->atomic_flags); > > spin_unlock(&tp->lock); > > + if (!tc_skip_hw(fnew->flags)) { > + err =3D fl_hw_replace_filter(tp, fnew, rtnl_held, extack); > + if (err) { > + spin_lock(&tp->lock); > + goto errout_hw_replace; > + } > + } > + > + if (!tc_in_hw(fnew->flags)) > + fnew->flags |=3D TCA_CLS_FLAGS_NOT_IN_HW; > + > + clear_bit(__SENDING_TO_HW, &fnew->atomic_flags); > fl_mask_put(head, fold->mask, true); > if (!tc_skip_hw(fold->flags)) > fl_hw_destroy_filter(tp, fold, rtnl_held, NULL); > @@ -1584,7 +1603,7 @@ static int fl_change(struct net *net, struct sk_buf= f *in_skb, > } else { > if (__fl_lookup(fnew->mask, &fnew->mkey)) { > err =3D -EEXIST; > - goto errout_hw; > + goto errout_mask; > } > > if (handle) { > @@ -1606,7 +1625,7 @@ static int fl_change(struct net *net, struct sk_buf= f *in_skb, > INT_MAX, GFP_ATOMIC); > } > if (err) > - goto errout_hw; > + goto errout_mask; > > fnew->handle =3D handle; > > @@ -1616,7 +1635,22 @@ static int fl_change(struct net *net, struct sk_bu= ff *in_skb, > goto errout_idr; > > list_add_tail_rcu(&fnew->list, &fnew->mask->filters); > + if (!tc_skip_hw(fnew->flags)) > + set_bit(__SENDING_TO_HW, &fnew->atomic_flags); > + > spin_unlock(&tp->lock); > + > + if (!tc_skip_hw(fnew->flags)) { > + err =3D fl_hw_replace_filter(tp, fnew, rtnl_held, extack); > + if (err) { > + spin_lock(&tp->lock); > + goto errout_rhash; > + } > + } > + > + if (!tc_in_hw(fnew->flags)) > + fnew->flags |=3D TCA_CLS_FLAGS_NOT_IN_HW; > + clear_bit(__SENDING_TO_HW, &fnew->atomic_flags); > } > > *arg =3D fnew; > @@ -1625,13 +1659,37 @@ static int fl_change(struct net *net, struct sk_b= uff *in_skb, > kfree(mask); > return 0; > > +errout_hw_replace: > + if (rhashtable_insert_fast(&fold->mask->ht, &fold->ht_node, > + fold->mask->filter_ht_params)) { > + NL_SET_ERR_MSG(extack, "Filter replace failed and could not revert."); > + fl_mask_put(head, fold->mask, true); > + if (!tc_skip_hw(fold->flags)) > + fl_hw_destroy_filter(tp, fold, rtnl_held, NULL); > + tcf_unbind_filter(tp, &fold->res); > + tcf_exts_get_net(&fold->exts); > + refcount_dec(&fold->refcnt); > + __fl_put(fold); > + } else { > + idr_replace(&head->handle_idr, fold, fold->handle); > + list_replace_rcu(&fnew->list, &fold->list); > + fold->deleted =3D false; > + } > +errout_rhash: > + if (fnew->deleted) { > + spin_unlock(&tp->lock); > + kfree(tb); > + kfree(mask); > + return err; > + } > + list_del_rcu(&fnew->list); > + rhashtable_remove_fast(&fnew->mask->ht, &fnew->ht_node, > + fnew->mask->filter_ht_params); > + fnew->deleted =3D true; > errout_idr: > idr_remove(&head->handle_idr, fnew->handle); > -errout_hw: > - spin_unlock(&tp->lock); > - if (!tc_skip_hw(fnew->flags)) > - fl_hw_destroy_filter(tp, fnew, rtnl_held, NULL); > errout_mask: > + spin_unlock(&tp->lock); > fl_mask_put(head, fnew->mask, true); > errout: > tcf_exts_destroy(&fnew->exts); > @@ -1669,6 +1727,12 @@ static void fl_walk(struct tcf_proto *tp, struct t= cf_walker *arg, > arg->count =3D arg->skip; > > while ((f =3D fl_get_next_filter(tp, &arg->cookie)) !=3D NULL) { > + if (test_bit(__SENDING_TO_HW, &f->atomic_flags)) { > + __fl_put(f); > + arg->cookie++; > + continue; > + } > + > if (arg->fn(tp, f, arg) < 0) { > __fl_put(f); > arg->stop =3D 1; > @@ -1695,6 +1759,9 @@ static int fl_reoffload(struct tcf_proto *tp, bool = add, tc_setup_cb_t *cb, > if (tc_skip_hw(f->flags)) > continue; > > + if (test_bit(__SENDING_TO_HW, &f->atomic_flags)) > + continue; > + > cls_flower.rule =3D > flow_rule_alloc(tcf_exts_num_actions(&f->exts)); > if (!cls_flower.rule) > @@ -2273,6 +2340,9 @@ static int fl_dump(struct net *net, struct tcf_prot= o *tp, void *fh, > if (!f) > return skb->len; > > + if (test_bit(__SENDING_TO_HW, &f->atomic_flags)) > + return skb->len; > + > t->tcm_handle =3D f->handle; > > nest =3D nla_nest_start(skb, TCA_OPTIONS); Hi John, I understand problem that you are trying to fix, but I intentionally made fl_change() to offload filters before making them visible to concurrent tasks through flower data structures (hashtable, idr, linked list) because it removes the headache that you are dealing with by means of __SENDING_TO_HW flag. Maybe we can come up with something simpler? For example, we can check for duplicates and insert the filter before calling hw offloads to hash table only, and mark it with something like your __SENDING_TO_HW flag. Hashtable is only used for fast path lookup and for checking for duplicates in fl_chage(), which means that flow is not visible to concurrent accesses through fl_walk() and fl_get(). With this, we can modify fl_lookup() to return NULL for all flows marked with __SENDING_TO_HW flag in order for the flow to be ignored by fast path. I might be missing something, but such implementation would be much simpler and less error prone, and wouldn't race with reoffload. You need not be fixing my bug instead of me, so please let me know if you prefer to work on it yourself or let me do it. Thanks, Vlad