From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35D97C10F0E for ; Tue, 9 Apr 2019 17:10:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id EBE4020850 for ; Tue, 9 Apr 2019 17:10:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=netronome-com.20150623.gappssmtp.com header.i=@netronome-com.20150623.gappssmtp.com header.b="PRzuo3a1" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726562AbfDIRK2 (ORCPT ); Tue, 9 Apr 2019 13:10:28 -0400 Received: from mail-qt1-f196.google.com ([209.85.160.196]:33485 "EHLO mail-qt1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726456AbfDIRK2 (ORCPT ); Tue, 9 Apr 2019 13:10:28 -0400 Received: by mail-qt1-f196.google.com with SMTP id k14so20741714qtb.0 for ; Tue, 09 Apr 2019 10:10:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=netronome-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:in-reply-to:references :organization:mime-version:content-transfer-encoding; bh=u23oaY4DORFjVsk9hOUPu1fNYTQ4eBDbmQA5ar0vEAc=; b=PRzuo3a1jlcvAmdku8gdvcntB88dKZx8UfRfsgvmN/hLKz93aB9BzRa0Ubku9YL12j rVYk8CUMSg6QYwvnvkiaFydIK5i7AdnZbPu/TzL8xt4ghIBUB/6JLGdkxPkzbYCnC5fP 9si7mF2UuK/B8m25wOiHjBr2fHApXxas5or6LwrDUtiydTOfatcmU+oKB0l0GZ4IFIYT iKwLQqYMO0fN73lJRdHaRrNJ5vIynEax3YyVa2srRLTYy6oVUz1cL4i6w9L3ze1vc97w w44c+k9iogmoa0kIsZwVpvo7iGeJoYt6jk00ZBwnCBlRMTPTO6Byh4QAk3cgnpRwC3Cu Fz8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:organization:mime-version:content-transfer-encoding; bh=u23oaY4DORFjVsk9hOUPu1fNYTQ4eBDbmQA5ar0vEAc=; b=rWDtxOsOLUgZI3W8ncBLHHOXwuoVFYBlfTm/mvv9MZOJgNSm8A8K4mXeehgO5PBFBh bc5pXkYsQEv/hhMph5dWF8IiiEqu/ExehVpEI8xVH7atF4QD5MrhWsTMVXENgFwoT8pT SuidnucNsCUWjvz3nMWTaUUOqNCniNZnUdnIXbKFiFTL7NYJRS2wFCMsar0e519LdntT u5basUHlPyVjFtSiSKyG0+7WlVTG7iOhL4RI9ciGUCG8O9Eyb2pbygQ6i6+/tQrpxMmt TSaJvJVwIwFncSXNzupKA5Pyj0FL2mZQVIgS1RSRZfWKbXwG3PbrHVzdyUrQFmhQ8wT8 DnHg== X-Gm-Message-State: APjAAAUKQM0hthOrnTN5H7DuJ4VKpUwhj25XLfxLpmGT6E9QNWeI2X8H 3adc/NlUa1j0iI7Cc8OYLZ+VG2zD8P0= X-Google-Smtp-Source: APXvYqyP9ecfc6BKoR+XHP5s+TCRZezbd91f5ZKyrKzOoeW4hdaWyPVXAahk8nwwQQG4a3DU6Qq7GA== X-Received: by 2002:a0c:b05c:: with SMTP id l28mr30469498qvc.95.1554829827495; Tue, 09 Apr 2019 10:10:27 -0700 (PDT) Received: from cakuba.netronome.com ([66.60.152.14]) by smtp.gmail.com with ESMTPSA id t30sm4636849qkm.25.2019.04.09.10.10.26 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 09 Apr 2019 10:10:27 -0700 (PDT) Date: Tue, 9 Apr 2019 10:10:23 -0700 From: Jakub Kicinski To: Vlad Buslov Cc: "netdev@vger.kernel.org" , "jhs@mojatatu.com" , "xiyou.wangcong@gmail.com" , "jiri@resnulli.us" , "davem@davemloft.net" , "john.hurley@netronome.com" Subject: Re: [PATCH net-next] net: sched: flower: insert filter to ht before offloading it to hw Message-ID: <20190409101023.4fe0ad04@cakuba.netronome.com> In-Reply-To: References: <20190405175626.4123-1-vladbu@mellanox.com> <20190408152655.1891ee77@cakuba.netronome.com> Organization: Netronome Systems, Ltd. MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Tue, 9 Apr 2019 08:23:40 +0000, Vlad Buslov wrote: > On Tue 09 Apr 2019 at 01:26, Jakub Kicinski wrote: > > On Fri, 5 Apr 2019 20:56:26 +0300, Vlad Buslov wrote: > >> John reports: > >> > >> Recent refactoring of fl_change aims to use the classifier spinlock to > >> avoid the need for rtnl lock. In doing so, the fl_hw_replace_filer() > >> function was moved to before the lock is taken. This can create problems > >> for drivers if duplicate filters are created (commmon in ovs tc offload > >> due to filters being triggered by user-space matches). > >> > >> Drivers registered for such filters will now receive multiple copies of > >> the same rule, each with a different cookie value. This means that the > >> drivers would need to do a full match field lookup to determine > >> duplicates, repeating work that will happen in flower __fl_lookup(). > >> Currently, drivers do not expect to receive duplicate filters. > >> > >> To fix this, verify that filter with same key is not present in flower > >> classifier hash table and insert the new filter to the flower hash table > >> before offloading it to hardware. Implement helper function > >> fl_ht_insert_unique() to atomically verify/insert a filter. > >> > >> This change makes filter visible to fast path at the beginning of > >> fl_change() function, which means it can no longer be freed directly in > >> case of error. Refactor fl_change() error handling code to deallocate the > >> filter with rcu timeout. > >> > >> Fixes: 620da4860827 ("net: sched: flower: refactor fl_change") > >> Reported-by: John Hurley > >> Signed-off-by: Vlad Buslov > > > > How is re-offload consistency guaranteed? IIUC the code is: > > > > insert into HT > > offload > > insert into IDR > > > > What guarantees re-offload consistency if new callback is added just > > after offload is requested but before rules ends up in IDR? > > Hi Jakub, > > At the moment cls hardware offloads API is always called with rtnl lock, > so rule can't be offloaded while reoffload is in progress. Does that somehow imply atomicity of offloading vs inserting into IDR? Doesn't seem so from a cursory look. Or do you mean rtnl_held is always true? > For my next patch set that unlocks the offloads API I implemented the > algorithm to track reoffload count for each tp that works like this: > > 1. struct tcf_proto is extended with reoffload_count counter that > incremented each time reoffload is called on particular tp instance. > Counter is protected by tp->lock. > > 2. struct cls_fl_filter is also extended with reoffload_count counter. > Its value is set to current tp->reoffload_count when offloading the > filter. > > 3. After offloading the filter, but before inserting it to idr, > f->reoffload_count is compared with tp->reoffload_count. If values > don't match, filter is deleted and -EAGAIN is returned. Cls API > retries filter insertion on -EAGAIN. Sounds good for add. Does this solve delete case as well? CPU 0 CPU 1 __fl_delete IDR remove cb unregister hw delete all flows <- doesn't see the remove in progress hw delete <- doesn't see the removed cb