From: Oliver Upton
To: David Matlack
Cc: kvm list, Marc Zyngier, Ben Gardon, Peter Shier, Paolo Bonzini, KVMARM, linux-arm-kernel@lists.infradead.org
Date: Sat, 16 Apr 2022 00:04:33 +0000
Subject: Re: [RFC PATCH 00/17] KVM: arm64: Parallelize stage 2 fault handling
References: <20220415215901.1737897-1-oupton@google.com>

On Fri, Apr 15, 2022 at 04:35:24PM -0700, David Matlack wrote:
> On Fri, Apr 15, 2022 at 2:59 PM Oliver Upton wrote:
> >
> > Presently KVM only takes a read lock for stage 2 faults if it believes
> > the fault can be fixed by relaxing permissions on a PTE (write unprotect
> > for dirty logging). Otherwise, stage 2 faults grab the write lock, which
> > predictably can pile up all the vCPUs in a sufficiently large VM.
> >
> > The x86 port of KVM has what it calls the TDP MMU. Basically, it is an
> > MMU protected by the combination of a read-write lock and RCU, allowing
> > page walkers to traverse in parallel.
> >
> > This series is strongly inspired by the mechanics of the TDP MMU,
> > making use of RCU to protect parallel walks. Note that the TLB
> > invalidation mechanics are a bit different between x86 and ARM, so we
> > need to use the 'break-before-make' sequence to split/collapse a
> > block/table mapping, respectively.
>
> An alternative (or perhaps "v2" [1]) is to make x86's TDP MMU
> arch-neutral and port it to support ARM's stage-2 MMU. This is based
> on a few observations:
>
> - The problems that motivated the development of the TDP MMU are not
>   x86-specific (e.g. parallelizing faults during the post-copy phase of
>   Live Migration).
> - The synchronization in the TDP MMU (read/write lock, RCU for PT
>   freeing, atomic compare-exchanges for modifying PTEs) is complex, but
>   would be equivalent across architectures.
> - Eventually RISC-V is going to want similar performance (my
>   understanding is RISC-V MMU is already a copy-paste of the ARM MMU),
>   and it'd be a shame to re-implement TDP MMU synchronization a third
>   time.
> - The TDP MMU includes support for various performance features that
>   would benefit other architectures, such as eager page splitting,
>   deferred zapping, lockless write-protection resolution, and (coming
>   soon) in-place huge page promotion.
> - And then there's the obvious wins from less code duplication in KVM
>   (e.g. get rid of the RISC-V MMU copy, increased code test coverage,
>   ...).

I definitely agree with the observation -- we're all trying to solve the
same set of issues. And I completely agree that a good long-term goal
would be to create some common parts for all architectures. Less work
for us ARM folks it would seem ;-)

What's top of mind is how we paper over the differences between the
architectures, especially when we need to do entirely different things
because of the arch.

For example, I whine about break-before-make a lot throughout this
series, which is somewhat unique to ARM. I don't think we can do eager
page splitting on the base architecture w/o doing the TLBI for every
block. Not only that, we can't do a direct valid->valid change without
first making an invalid PTE visible to hardware. Things get even more
exciting when hardware revisions relax break-before-make requirements.
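As a rough illustration of the valid->invalid->valid ordering described
above (this is not code from the series), the sketch below models it in
userspace with C11 atomics. The PTE encoding, pte_replace_bbm() and
tlb_invalidate() are all made up for the example. A real stage 2
implementation would issue the TLBI/DSB instructions where the stub sits,
and would likely need a dedicated "locked" marker so that parallel
walkers can tell an in-progress update apart from an unmapped entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE encoding for this sketch only: bit 0 marks a valid entry. */
#define PTE_VALID   ((uint64_t)1)
#define PTE_INVALID ((uint64_t)0)

typedef _Atomic uint64_t pte_t;

/* Counts invalidations so the ordering can be observed; stands in for TLBI + DSB. */
static unsigned int tlb_invalidations;

static void tlb_invalidate(void)
{
	tlb_invalidations++;
}

/*
 * Replace one valid PTE with another using break-before-make.
 * Returns false if another walker changed the entry first, in which
 * case the caller would be expected to retry the fault.
 */
static bool pte_replace_bbm(pte_t *ptep, uint64_t old_pte, uint64_t new_pte)
{
	uint64_t expected = old_pte;

	/* Break: only one of the racing walkers wins the compare-exchange. */
	if (!atomic_compare_exchange_strong(ptep, &expected, PTE_INVALID))
		return false;

	/* Drop stale translations while the entry is invalid. */
	tlb_invalidate();

	/* Make: publish the new valid entry. */
	atomic_store(ptep, new_pte);
	return true;
}
```

The compare-exchange is what lets this run under a read lock: a walker
that loses the race sees a changed entry and backs off rather than
overwriting the winner's update.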

There are also significant architectural differences between KVM on x86
and KVM for ARM. Our paging code runs both in the host kernel and the
hyp/lowvisor, and does:

 - VM two dimensional paging (stage 2 MMU)
 - Hyp's own MMU (stage 1 MMU)
 - Host kernel isolation (stage 2 MMU)

each with its own quirks. The 'not exactly in the kernel' part will make
instrumentation a bit of a hassle too.

None of this is meant to disagree with you in the slightest. I firmly
agree we need to share as many parts between the architectures as
possible. I'm just trying to call out a few of the ARM-specific things
that will make this annoying, so that whoever embarks on the adventure
will see them.

> The side of this I haven't really looked into yet is ARM's stage-2
> MMU, and how amenable it would be to being managed by the TDP MMU. But
> I assume it's a conventional page table structure mapping GPAs to
> HPAs, which is the most important overlap.
>
> That all being said, an arch-neutral TDP MMU would be a larger, more
> complex code change than something like this series (hence my "v2"
> caveat above). But I wanted to get this idea out there since the
> rubber is starting to hit the road on improving ARM MMU scalability.

All for it.
I cc'ed you on the series for this exact reason, I wanted to grab your
attention to spark the conversation :)

--
Thanks,
Oliver