Subject: Re: [PATCH net-next] ipv6: Move ipv6 stubs to a separate header file
To: Alexei Starovoitov
Cc: David Miller, netdev@vger.kernel.org, edumazet@google.com
References: <20190322130609.11655-1-dsahern@kernel.org>
 <20190323.214023.610983922857554034.davem@davemloft.net>
 <20190324035550.b4qjyl5ccfvc3tzi@ast-mbp>
 <20190325032641.5xyav65phoeadgye@ast-mbp>
From: David Ahern
Message-ID: <61520dad-939f-46ff-626b-dea91b845aa3@gmail.com>
Date: Mon, 25 Mar 2019 11:02:07 -0600
In-Reply-To: <20190325032641.5xyav65phoeadgye@ast-mbp>
X-Mailing-List: netdev@vger.kernel.org

On 3/24/19 9:26 PM, Alexei Starovoitov wrote:
> On Sun, Mar 24, 2019 at 06:56:42AM -0600, David Ahern wrote:
>>
>> This change also enables many other key features:
>> 1. IPv4 multipath routes are not evicted just because 1 hop goes down.
>> 2. IPv6 multipath routes with device only nexthops (e.g., tunnels).
>> 3. IPv6 nexthop with IPv4 route (aka, RFC 5549) which enables a more
>> natural BGP unnumbered.
>> 4. Lower memory consumption for IPv6 FIB entries which has no sharing at
>> all like IPv4 does.
>> 5. Allows atomic update of nexthop definitions with a single replace
>> command as opposed to replacing the N-routes using it.
>
> Does kernel work as data plane or control plane in any of the above
> features ?
> Sadly the patches allow it to do both, but cumulus doesn't use it
> for data path. The kernel on control plane cpu is merely a database.
> And today it doesn't scale when used as a database.
> The kernel has to be fast as a dataplane but these extra features
> will slow down the routing by making kernel-as-database scale a bit better.
> Hence my suggestion in the previous email: use proper database
> to store routes, nexthops and whatever else necessary to program the asic.
> The kernel doesn't need to hold this information.
>

The first 40 patches align fib_nh and fib6_nh providing more consistency
and alignment between IPv4 and IPv6 and allowing more code re-use
between the protocols. The end result is the ability to have IPv6
gateways with IPv4 routes, a much needed control plane feature other
companies have been harassing me about as well as the internal need for
Cumulus. In the refactoring I have been very careful about changes to
data structure layout and cacheline hits as well as adverse changes to
memory use. I believe at the end of this change set there is no impact
to existing performance - control plane or data plane.
That is followed by refactoring IPv6 again in a direction that makes
IPv4 and IPv6 more consistent and enables changes (outside of the
nexthop sets) that will improve IPv6 for a number of cases by removing
the need to always generate a dst_entry.

After that are a few patches exporting functions for use by nexthop code
and then diving into the refactoring enabling separate nexthop objects.
Again, impacts to performance have been top of mind, and I have done
what I can to minimize any overhead in the datapath - to the point of a
few 'if (nh)' checks wrapped in an unlikely(). And with the nexthop code
in place it gives users an alternative to a broken IPv6 multipath API as
one example.

As far as scalability goes, I can already inject a million routes into
the kernel FIB. This allows me to do it more efficiently and to manage
the FIBs more efficiently in the face of changes such as a link going
down as we move to higher end systems - such as spectrum2.

As for routes in the kernel, they need to be there for any control plane
processes to properly function. One example is ping and traceroute to
troubleshoot data path problems, and another is for bgp (or any other
service) to connect to a peer through the data plane (do not assume a
peer is on a directly connected route). Further, the routes need to go
through the kernel to get to the switchdev driver.

The routes need to be there for XDP forwarding and routing on the host.
Pawel has already expressed interest in using XDP for fast path
forwarding with FRR managing the route table.

You keep trying to make this about Cumulus. This is about bringing next
level features to Linux and in the process bringing more consistency and
code sharing between IPv4 and IPv6. This is about 1-API for the data
center be it servers, hosts, switches or routers regardless of datapath
(hardware offload, XDP, or kernel forwarding), and maintaining
consistency in configuring, monitoring and troubleshooting across those
systems.
That is the common theme of both the netdev talk last summer and the
talk at LPC in November.

Again, I have tried to be very careful with the intrusion of checks into
the datapath with the goal of no measurable impact to performance. I am
invested in seeing that through and will continue looking for ways to
improve it for all use cases.
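
For readers unfamiliar with the pattern, the 'if (nh)' check wrapped in
an unlikely mentioned above might look roughly like the user-space
sketch below. This is an illustration only, not code from the patch set:
struct nexthop_obj, struct route_entry and route_egress_ifindex() are
hypothetical names, and unlikely() is redefined here on top of the
GCC/Clang __builtin_expect builtin as a stand-in for the kernel macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel's unlikely() branch hint
 * (assumes a GCC- or Clang-compatible compiler). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical standalone nexthop object that many routes can share. */
struct nexthop_obj {
	uint32_t id;
	int oif;	/* egress interface index */
};

/* Hypothetical route entry: it either carries its own egress interface
 * (the existing, common case) or points at a shared nexthop object. */
struct route_entry {
	int oif;			/* used when nh == NULL */
	const struct nexthop_obj *nh;	/* NULL for legacy routes */
};

/* Return the egress interface for a route. Legacy routes take the
 * fall-through path; the nexthop-object branch is hinted as cold. */
static int route_egress_ifindex(const struct route_entry *rt)
{
	assert(rt != NULL);

	if (unlikely(rt->nh))
		return rt->nh->oif;

	return rt->oif;
}
```

In this sketch a legacy route never dereferences the nexthop pointer, so
its only added cost is one hinted branch. Routes that share a
nexthop_obj all observe a change to that object at once, which is the
idea behind updating one nexthop definition instead of N routes.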