From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mm01.cs.columbia.edu (mm01.cs.columbia.edu [128.59.11.253]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9BE61C433EF for ; Tue, 3 May 2022 14:17:34 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by mm01.cs.columbia.edu (Postfix) with ESMTP id F12B24B1BC; Tue, 3 May 2022 10:17:33 -0400 (EDT) X-Virus-Scanned: at lists.cs.columbia.edu Authentication-Results: mm01.cs.columbia.edu (amavisd-new); dkim=softfail (fail, message has been altered) header.i=@google.com Received: from mm01.cs.columbia.edu ([127.0.0.1]) by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Szw5ovxHi0vh; Tue, 3 May 2022 10:17:32 -0400 (EDT) Received: from mm01.cs.columbia.edu (localhost [127.0.0.1]) by mm01.cs.columbia.edu (Postfix) with ESMTP id B9D784B152; Tue, 3 May 2022 10:17:32 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by mm01.cs.columbia.edu (Postfix) with ESMTP id 7DA334B13E for ; Tue, 3 May 2022 10:17:31 -0400 (EDT) X-Virus-Scanned: at lists.cs.columbia.edu Received: from mm01.cs.columbia.edu ([127.0.0.1]) by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id UwuImrgjLi6f for ; Tue, 3 May 2022 10:17:30 -0400 (EDT) Received: from mail-ej1-f49.google.com (mail-ej1-f49.google.com [209.85.218.49]) by mm01.cs.columbia.edu (Postfix) with ESMTPS id 35C7C49EC1 for ; Tue, 3 May 2022 10:17:30 -0400 (EDT) Received: by mail-ej1-f49.google.com with SMTP id dk23so33732349ejb.8 for ; Tue, 03 May 2022 07:17:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=fbgMapne1rH6/kKlRXWbC6sN+d3viTII31UcYeBu5g4=; b=p5StzTBTAohsMwlL6jHQBWtTQobIcqESt3kLVZnkNHJSG2DdBFZF7Tgs5QaZmPnlCr 
F1l8QZ0lnzdHlhRVHnwsSRNzlNALEMzZuSnQCxowMh6uXeXOg3ZbmSEabk7olar1pdpp 9pZRvS+REXlThgWE/quskEMYIGVZw1x55jioiJE011eju4WdbP9ZextM5/dSqGnzJ/OA Ip3n3G3Q5wiRXN3pJWPKqD+r0Wk199FnUkUPdXWP8Vhe6xFQIDSvsecKYYscAjJp/an6 Yukn9oKICYgAyT+VGWoiAnlgJf7HCdoInuF1DHTluTYZYb7iUxSiRZ05KmZ0heDnRw16 aUzw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=fbgMapne1rH6/kKlRXWbC6sN+d3viTII31UcYeBu5g4=; b=hJLpYgl90nfh1oZKaBm+ovajc+4TZF24Rrh1JFnnuaPZkwIObM+0oFLHvYp+KQu+OR cTZWqWjN4u/5KHKdoNwynboHcTbZyaChjAvl6A0qBn2FxiKLZFPdnvLX6G2EefKUYZ9Q Ft5HV/ORYEc7frNlZzS3NgVPuvtg4oFCJHNefHNaoDkYgT2xujWQHy/ku+8WterbLOya oCQNEF12jEAs4z0Cs1iCgV5BVg/wVomU5kI033GyTyrMJYyxFbWEOU6fooY5ZITcf7Hb QUolrFOPVTdZbCWGivmvRu/paglWVA7ooQ10Nb1Odkg4pJ6k8asPR2HHQ66Ogr/tOMV3 h5Ow== X-Gm-Message-State: AOAM533IGsRXj1YdNjL9SSITqjvXoRvBDPxWuhVyCILK2nrg8E1ovmZP YlLOfuotAoQxC8UIGfQPxmOZYg== X-Google-Smtp-Source: ABdhPJzSpQ9MzxynGtzoIUDLeJ/DfizCDd1t2kS5FNmZLRZhA9C+R6Uh6yCEZ5JtfHxxtU8soK6nAQ== X-Received: by 2002:a17:906:d555:b0:6da:ac8c:f66b with SMTP id cr21-20020a170906d55500b006daac8cf66bmr15765063ejc.107.1651587448896; Tue, 03 May 2022 07:17:28 -0700 (PDT) Received: from google.com (30.171.91.34.bc.googleusercontent.com. 
[34.91.171.30]) by smtp.gmail.com with ESMTPSA id hg13-20020a1709072ccd00b006f3ef214df3sm4657306ejc.89.2022.05.03.07.17.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 03 May 2022 07:17:28 -0700 (PDT)
Date: Tue, 3 May 2022 14:17:25 +0000
From: Quentin Perret
To: Oliver Upton
Subject: Re: [RFC PATCH 09/17] KVM: arm64: Tear down unlinked page tables in parallel walk
Message-ID:
References: <20220415215901.1737897-1-oupton@google.com> <20220415215901.1737897-10-oupton@google.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:
Cc: kvm@vger.kernel.org, Marc Zyngier, Peter Shier, Ben Gardon, David Matlack, Paolo Bonzini, kvmarm@lists.cs.columbia.edu, linux-arm-kernel@lists.infradead.org
X-BeenThere: kvmarm@lists.cs.columbia.edu
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Where KVM/ARM decisions are made
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: kvmarm-bounces@lists.cs.columbia.edu
Sender: kvmarm-bounces@lists.cs.columbia.edu

On Friday 22 Apr 2022 at 20:41:47 (+0000), Oliver Upton wrote:
> On Fri, Apr 22, 2022 at 04:00:45PM +0000, Quentin Perret wrote:
> > On Thursday 21 Apr 2022 at 16:40:56 (+0000), Oliver Upton wrote:
> > > The other option would be to not touch the subtree at all until the rcu
> > > callback, as at that point software will not tweak the tables any more.
> > > No need for atomics/spinning and can just do a boring traversal.
> >
> > Right that is sort of what I had in mind. Note that I'm still trying to
> > make my mind about the overall approach -- I can see how RCU protection
> > provides a rather elegant solution to this problem, but this makes the
> > whole thing inaccessible to e.g. pKVM where RCU is a non-starter.
>
> Heh, figuring out how to do this for pKVM seemed hard hence my lazy
> attempt :)
>
> > A
> > possible alternative that comes to mind would be to have all walkers
> > take references on the pages as they walk down, and release them on
> > their way back, but I'm still not sure how to make this race-safe. I'll
> > have a think ...
>
> Does pKVM ever collapse tables into blocks? That is the only reason any
> of this mess ever gets roped in. If not I think it is possible to get
> away with a rwlock with unmap on the write side and everything else on
> the read side, right?
>
> As far as regular KVM goes we get in this business when disabling dirty
> logging on a memslot. Guest faults will lazily collapse the tables back
> into blocks. An equally valid implementation would be just to unmap the
> whole memslot and have the guest build out the tables again, which could
> work with the aforementioned rwlock.

Apologies for the delay on this one, I was away for a while.

Yup, that all makes sense. FWIW the pKVM use-case I have in mind is
slightly different. Specifically, in the pKVM world the hypervisor
maintains a stage-2 for the host that is all identity mapped. So we use
nice big block mappings as much as we can. But when a protected guest
starts, the hypervisor needs to break down the host stage-2 blocks to
unmap the 4K guest pages from the host (which is where the protection
comes from in pKVM). And when the guest is torn down, the host can
reclaim its pages, hence putting us in a position to coalesce its
stage-2 into nice big blocks again.

Note that none of this coalescing is currently implemented even in our
pKVM prototype, so it's a bit unfair to ask you to deal with this stuff
now, but clearly it'd be cool if there was a way we could make these
things coexist and even ideally share some code...
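To make the split/coalesce cycle above a bit more concrete, here is a
rough toy model in Python. It is only a sketch and not kernel or pKVM
code: the names (Block, Table, split_block, unmap_page, reclaim_page,
try_coalesce) are all made up for illustration, and it assumes a 4K
granule with a single 2M block. It deliberately ignores everything that
makes this hard in practice (break-before-make, TLB invalidation, and
synchronisation with concurrent walkers).

```python
from dataclasses import dataclass
from typing import List, Optional, Union

PAGE_SIZE = 4096
PTRS_PER_TABLE = 512                      # entries per table with a 4K granule
BLOCK_SIZE = PAGE_SIZE * PTRS_PER_TABLE   # 2 MiB block


@dataclass
class Block:
    """Hypothetical leaf block entry covering BLOCK_SIZE bytes."""
    pa: int


@dataclass
class Table:
    """Hypothetical last-level table: one slot per 4K page, None = unmapped."""
    pages: List[Optional[int]]


Entry = Union[Block, Table]


def split_block(block: Block) -> Table:
    """Replace a block with a table of 4K pages mapping the same range."""
    return Table([block.pa + i * PAGE_SIZE for i in range(PTRS_PER_TABLE)])


def try_coalesce(entry: Entry, base: int) -> Entry:
    """Fold a table back into a block if every slot is identity mapped."""
    if isinstance(entry, Table) and all(
        pa == base + i * PAGE_SIZE for i, pa in enumerate(entry.pages)
    ):
        return Block(base)
    return entry


def unmap_page(entry: Entry, base: int, ipa: int) -> Entry:
    """Remove one 4K page from the host mapping, splitting a block if needed."""
    if isinstance(entry, Block):
        entry = split_block(entry)
    entry.pages[(ipa - base) // PAGE_SIZE] = None
    return entry


def reclaim_page(entry: Entry, base: int, ipa: int) -> Entry:
    """Give one 4K page back to the host (identity map), then try to coalesce."""
    if isinstance(entry, Table):
        entry.pages[(ipa - base) // PAGE_SIZE] = ipa
    return try_coalesce(entry, base)


# Guest start: carve one page out of the host's 2M block.
BASE = 0x40000000
GUEST_PAGE = BASE + 3 * PAGE_SIZE
host_entry = unmap_page(Block(BASE), BASE, GUEST_PAGE)

# Guest teardown: the page comes back and the table folds into a block.
host_entry = reclaim_page(host_entry, BASE, GUEST_PAGE)
```

In this model the coalescing check is a full scan of the table on every
reclaim; a real implementation would more likely track this with
per-table reference counts, which is roughly where the race-safety
question from earlier in the thread comes back in.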
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm