Message-ID: <2989e566-a1d2-2288-8ef3-759f20aa0c2e@nbd.name>
Date: Tue, 12 Apr 2022 19:51:52 +0200
To: Andrew Lunn
Cc: netdev@vger.kernel.org, John Crispin,
Sean Wang, Mark Lee, "David S. Miller", Jakub Kicinski, Paolo Abeni,
    Matthias Brugger, linux-arm-kernel@lists.infradead.org,
    linux-mediatek@lists.infradead.org, linux-kernel@vger.kernel.org,
    Jiri Pirko, Ido Schimmel, Florian Fainelli, Vladimir Oltean
References: <20220405195755.10817-1-nbd@nbd.name>
    <20220405195755.10817-15-nbd@nbd.name>
    <29cecc87-8689-6a73-a5ef-43eb2b8f33cd@nbd.name>
From: Felix Fietkau
Subject: Re: [PATCH v2 14/14] net: ethernet: mtk_eth_soc: support creating
    mac address based offload entries
X-Mailing-List: netdev@vger.kernel.org

On 12.04.22 19:37, Andrew Lunn wrote:
>> It basically has to keep track of all possible destination ports, their STP
>> state, all their fdb entries, member VLANs of all ports. It has to quickly
>> react to changes in any of these.
>
> switchdev gives you all of those i think. DSA does not make use of
> them all, in particularly the fdb entries, because of the low
> bandwidth management link to the switch. But look at the Mellanox
> switch, it keeps its hardware fdb entries in sync with the software
> fdb.
>
> And you get every quick access to these, sometimes too quick in that
> it is holding a spinlock when it calls the switchdev functions, and
> you need to defer the handling in your driver if you want to use a
> mutex, perform blocking IO etc.
>
>> In order to implement this properly, I would also need to make more changes
>> to mac80211. Right now, mac80211 drivers do not have access to the
>> net_device pointer of virtual interfaces. So mac80211 itself would likely
>> need to implement the switchdev ops and handle some of this.
>
> So this again sounds like something which would be shared by IPA, and
> any other hardware which can accelerate forwarding between WiFi and
> some other sort of interface.

I would really like to see an example of how this should be done.
Is there a work-in-progress tree for IPA with offloading? Because the
code that I see upstream doesn't seem to have any of that - or did I
look in the wrong place?

>> There are also some other issues where I don't know how this is supposed to
>> be solved properly:
>> On MT7622 most of the bridge ports are connected to a MT7531 switch using
>> DSA. Offloading (lan->wlan bridging or L3/L4 NAT/routing) is not handled by
>> the switch itself, it is handled by a packet processing engine in the SoC,
>> which knows how to handle the DSA tags of the MT7531 switch.
>>
>> So if I were to handle this through switchdev implemented on the wlan and
>> ethernet devices, it would technically not be part of the same switch, since
>> it's a behind a different component with a different driver.
>
> What is important here is the user experience. The user is not
> expected to know there is an accelerate being used. You setup the
> bridge just as normal, using iproute2. You add routes in the normal
> way, either by iproute2, or frr can add routes from OSPF, BGP, RIP or
> whatever, via zebra. I'm not sure anybody has yet accelerated NAT, but
> the same principle should be used, using iptables in the normal way,
> and the accelerate is then informed and should accelerate it if
> possible.

Accelerated NAT on MT7622 has already been present in the upstream code
for a while. It's there for ethernet, and with my patches it also works
for ethernet -> wlan.

> switchdev gives you notification of when anything changes. You can
> have multiple receivers of these notifications, so the packet
> processor can act on them as well as the DSA switch.
>
>> Also, is switchdev able to handle the situation where only parts of the
>> traffic is offloaded and the rest (e.g. multicast) is handled through the
>> regular software path?
>
> Yes, that is not a problem. I deliberately use the term
> accelerator. We accelerate what Linux can already do.
> If the accelerator hardware is not capable of something, Linux still is, so
> just pass it the frames and it will do the right thing. Multicast is a
> good example of this, many of the DSA switch drivers don't accelerate
> it.

Don't get me wrong, I'm not against switchdev support at all. I just
don't know how to do it yet, and the code that I put in place is useful
for non-switchdev use cases as well.

>> In my opinion, handling it through the TC offload has a number of
>> advantages:
>> - It's a lot simpler
>> - It uses the same kind of offloading rules that my software fastpath
>> already uses
>> - It allows more fine grained control over which traffic should be offloaded
>> (src mac -> destination MAC tuple)
>>
>> I also plan on extending my software fast path code to support emulating
>> bridging of WiFi client mode interfaces. This involves doing some MAC
>> address translation with some IP address tracking. I want that to support
>> hardware offload as well.
>>
>> I really don't think that desire for supporting switchdev based offload
>> should be a blocker for accepting this code now, especially since my
>> implementation relies on existing Linux network APIs without inventing any
>> new ones, and there are valid use cases for using it, even with switchdev
>> support in place.
>
> What we need to avoid is fragmentation of the way we do things. It has
> been decided that switchdev is how we use accelerators, and the user
> should not really know anything about the accelerator. No other in
> kernel network accelerator needs a user space component listening to
> netlink notifications and programming the accelerator from user space.
> Do we really want two ways to do this?

There's always some overlap in what the APIs can do. And when it comes
to the "client mode bridge" use case that I mentioned, I would also
need exactly the same API that I put in place here. And this is not
something that can (or even should) be done using switchdev.
mac80211 prevents adding client mode interfaces to bridges for a reason.

- Felix