From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B5A8C10F11 for ; Wed, 24 Apr 2019 15:03:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 279AC2084F for ; Wed, 24 Apr 2019 15:03:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730695AbfDXPDp (ORCPT ); Wed, 24 Apr 2019 11:03:45 -0400 Received: from dispatch1-us1.ppe-hosted.com ([67.231.154.164]:37056 "EHLO dispatch1-us1.ppe-hosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730588AbfDXPDp (ORCPT ); Wed, 24 Apr 2019 11:03:45 -0400 X-Virus-Scanned: Proofpoint Essentials engine Received: from webmail.solarflare.com (webmail.solarflare.com [12.187.104.26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mx1-us3.ppe-hosted.com (Proofpoint Essentials ESMTP Server) with ESMTPS id E5FC7B800D3; Wed, 24 Apr 2019 15:03:42 +0000 (UTC) Received: from [10.17.20.203] (10.17.20.203) by ocex03.SolarFlarecom.com (10.20.40.36) with Microsoft SMTP Server (TLS) id 15.0.1395.4; Wed, 24 Apr 2019 08:03:39 -0700 Subject: Re: TC stats / hw offload question To: Pablo Neira Ayuso CC: Jamal Hadi Salim , netdev , "Jiri Pirko" , Cong Wang References: <26f0cfc9-3bef-8579-72cc-aa6c5ccecd43@solarflare.com> <4cb765dd-453f-3139-bce6-6e0b31167aec@mojatatu.com> <20190424141139.5c5vhihie5mryxlt@salvia> From: Edward Cree Message-ID: <26afcaaf-abdf-42ad-1715-5af9c6f3c2ef@solarflare.com> Date: Wed, 24 Apr 2019 16:03:37 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.0 MIME-Version: 1.0 In-Reply-To: <20190424141139.5c5vhihie5mryxlt@salvia> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit Content-Language: en-GB X-Originating-IP: [10.17.20.203] X-TM-AS-Product-Ver: SMEX-12.5.0.1300-8.5.1010-24570.005 X-TM-AS-Result: No-11.403900-4.000000-10 X-TMASE-MatchedRID: Kx0w2sAofbsbF9xF7zzuNSa1MaKuob8PC/ExpXrHizydCqKtxM6bh4Md LBQl5wHaooZvDjMmKWCC2+nAZJSfpiV0r9GRzgZajoyKzEmtrEe4vBuE2X0HlSJ8zskw0dbrVMX qnzGYjNEBr35gzwsy0YAEXFy6OTH9llhxSlteu7hOBLd+2M4V1OTWKSbLQnNIir0sB2fc8VT8/L 2o6LnOl+i8wPg8Gcd12cUr6gabECwEqR0OrR3XX6H20WQcgZgOX5TqQagR07d+YesuCgkiXLhRB UHVcXW44jC9Jt8Nx0OglZoIjy9eZS9FtW7XfHueV6iWWmDPLEDjmgMQ17h5699RlPzeVuQQ5O0D d1w86cymAlkvzKcnVj14W1LSttBnuUeK9BOG8ee5PNqdvNvYV99kHLH9StneHFSQk97VYGq0P6h mZ6xHlCzsWdFzpkEwBPwACG8gzOFuzw/aNxRy78pQKjU7fBXVf6/Md8Lb2l9X14Hy+eYp73JS81 mqB724QdSa/xeIHE4FfXI7LH9oyZPhC4EdGR/2dXu122+iJtq2McZY43zJ4/HFoBcOsKezlwWf7 /4SyDve+yj2N5XPjl+24nCsUSFNjaPj0W1qn0TKayT/BQTiGgwWxr7XDKH86ROqR1NBspPY+E3b HLg1EtRtrv0hPLYspMbUsFJDQ3Skb4CkMZGxEw== X-TM-AS-User-Approved-Sender: No X-TM-AS-User-Blocked-Sender: No X-TMASE-Result: 10--11.403900-4.000000 X-TMASE-Version: SMEX-12.5.0.1300-8.5.1010-24570.005 X-MDID: 1556118223-gTofCO9w4rh2 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On 24/04/2019 15:11, Pablo Neira Ayuso wrote: > On Wed, Apr 24, 2019 at 03:05:05PM +0100, Edward Cree wrote: >> So, I had this working for a while, by calling tcf_action_stats_update() >>  directly on the correct element of tc->exts->actions[], instead of using >>  tcf_exts_stats_update().  And this was fine, until I tried to port my >>  code to the 5.1 kernel, with Pablo's flow action infrastructure.  On that >>  it's not possible, because there is only flow_stats_update(), which takes >>  a single bytes value for _all_ the actions in the rule. > Then, we need to extend the API to allow to update counters per > action, no driver in the tree has been supporting this so far. May I > have a look at your driver code to infer what we might need? > > Thanks. Somewhat abridged and out-of-context, because this is for stuff that's a  long way away from being ready to upstream (mumble unreleased hardware),  here's the counter-handling part of it. struct efx_tc_action_set {     u16 vlan_push:2;     u16 vlan_pop:2;     /* other action flag bits */     __be16 vlan_tci[2]; /* TCIs for vlan_push */     __be16 vlan_proto[2]; /* Ethertypes for vlan_push */     struct efx_tc_counter_index *count;     int count_action_idx; /* Which tc_action uses the counter */     int count_size_delta; /* bytes per packet change */     /* other action metadata and gubbins */ }; static void calculate_count_delta(struct efx_tc_action_set *act) {     act->count_size_delta = 0;     if (act->vlan_pop & 2)         act->count_size_delta -= 4;     if (act->vlan_pop & 1)         act->count_size_delta -= 4;     if (act->vlan_push & 1)         act->count_size_delta += 4;     if (act->vlan_push & 2)         act->count_size_delta += 4; } static int efx_tc_flower_replace(struct efx_nic *efx,                                  struct net_device *net_dev,                                  struct tc_cls_flower_offload *tc) {     struct efx_tc_action_set *act;     /* parse the match */     tcf_exts_for_each_action(i, a, tc->exts) {         if (a->ops && a->ops->stats_update) {             /* act is the hw action we're building */             act->count = allocate_a_counter();             act->count_action_idx = i;             calculate_count_delta(act);         }         /* handle the actual action */         /* in the case of 'mirror' we'll stick 'act' on a list and          * allocate a new one that starts out the same, to represent          * the piped packet.          */     }     /* insert the rule in hw*/     return ...; } static int efx_tc_flower_stats(struct efx_nic *efx, struct net_device *net_dev,                                struct tc_cls_flower_offload *tc) {     struct efx_tc_action_set *act;     struct efx_tc_counter *cnt;     u64 packets, bytes;     rule = /* lookup the rule */;     list_for_each_entry(act, &/*list of rule actions*/, list)         if (act->count) {             struct tc_action *a;             u64 d_packets, d_bytes;             cnt = /* get the counter */;             /* Report only new pkts/bytes since last time TC asked */             packets = cnt->packets;             bytes = cnt->bytes;             d_packets = packets - cnt->old_packets;             /* Apply corrections for any VLAN pops/pushes before              * our count action (but after the HW counter)              */             d_bytes = bytes - cnt->old_bytes +                       act->count_size_delta * d_packets;             a = tc->exts->actions[act->count_action_idx];             tcf_action_stats_update(a, d_bytes, d_packets,                                     cnt->touched, true);             cnt->old_packets = packets;             cnt->old_bytes = bytes;         }     return 0; } With this code, if for instance the rule is tc match using flower blah \ action mirred egress mirror eth1 \ action vlan push blah \ action mirred egress redirect eth2 and a packed of length 100 bytes is matched, then tc stats show will report: action mirred egress mirror eth1:     1 packets, 100 bytes action vlan push blah     0 packets, 0 bytes /* Doesn't have an ops->stats_update */ action mirred egress redirect eth2     1 packets, 104 bytes which AIUI was the correct semantics, although no drivers upstream did it. TC does have all the information it needs to make those corrections itself  (it knows that the mirred eth2 comes after a vlan push), but currently it  doesn't do so. So one possible solution would be for flow_stats_update() to take bytes 'as  matched', and make the corrections as it goes along.  Unfortunately, this  doesn't quite work for decap (tunnel_key unset), because the length of the  tunnel header is not statically known.  (For this reason, our hw counts  bytes _after_ decap but _before_ other actions like VLAN push/pop.)  Since  from TC's perspective tunnel_key is metadata rather than packet data, this  works if we are willing to define the counters as not including tunnel  headers — that's what I've done currently.  Then VLAN push/pop are the  only TC actions that alter packet length.  It is however a bit unintuitive  that in the case of     tc filter add dev vxlan0 ... flower blah \     action mirred egress mirror eth1     action tunnel_key unset     action mirred egress redirect eth2  both mirreds will show the same (decapped) bytes count, yet eth1 will (at  least as I've interpreted it for offload) receive the packet with the  encap header still present. So to implement this we could do something like (include/net/pkt_cls.h,  written in mail client & not at all tested...) static inline void tcf_exts_stats_update(const struct tcf_exts *exts,                       u64 bytes, u64 packets, u64 lastuse) { #ifdef CONFIG_NET_CLS_ACT     int i, delta = 0;     preempt_disable();     for (i = 0; i < exts->nr_actions; i++) {         struct tc_action *a = exts->actions[i];         tcf_action_stats_update(a, bytes + packets * delta, packets,                                 lastuse, true);         if (is_tcf_vlan(a))             switch (tcf_vlan_action(a)) {             case TCA_VLAN_ACT_POP:                 delta -= 4;                 break;             case TCA_VLAN_ACT_PUSH:                 delta += 4;                 break;             case TCA_VLAN_ACT_MODIFY:             default:                 break;             }     }     preempt_enable(); #endif } Does that seem reasonable? -Ed