Date: Thu, 30 Oct 2025 16:05:05 +0000
X-Mailing-List: linux-doc@vger.kernel.org
Mime-Version: 1.0
References: <20250924151101.2225820-4-patrick.roy@campus.lmu.de>
 <20250924152214.7292-1-roypat@amazon.co.uk>
 <20250924152214.7292-3-roypat@amazon.co.uk>
X-Mailer: aerc 0.21.0
Subject: Re: [PATCH v7 06/12] KVM: guest_memfd: add module param for disabling TLB flushing
From: Brendan Jackman
To: Dave Hansen, "Roy, Patrick"
Cc: "pbonzini@redhat.com", "corbet@lwn.net", "maz@kernel.org",
 "oliver.upton@linux.dev", "joey.gouly@arm.com", "suzuki.poulose@arm.com",
 "yuzenghui@huawei.com", "catalin.marinas@arm.com", "will@kernel.org",
 "tglx@linutronix.de", "mingo@redhat.com", "bp@alien8.de",
 "dave.hansen@linux.intel.com", "x86@kernel.org", "hpa@zytor.com",
 "luto@kernel.org", "peterz@infradead.org", "willy@infradead.org",
 "akpm@linux-foundation.org", "david@redhat.com",
 "lorenzo.stoakes@oracle.com", "Liam.Howlett@oracle.com", "vbabka@suse.cz",
 "rppt@kernel.org", "surenb@google.com", "mhocko@suse.com",
 "song@kernel.org", "jolsa@kernel.org", "ast@kernel.org",
 "daniel@iogearbox.net", "andrii@kernel.org", "martin.lau@linux.dev",
 "eddyz87@gmail.com", "yonghong.song@linux.dev", "john.fastabend@gmail.com",
 "kpsingh@kernel.org", "sdf@fomichev.me", "haoluo@google.com",
 "jgg@ziepe.ca", "jhubbard@nvidia.com", "peterx@redhat.com",
 "jannh@google.com", "pfalcato@suse.de", "shuah@kernel.org",
 "seanjc@google.com", "kvm@vger.kernel.org", "linux-doc@vger.kernel.org",
 "linux-kernel@vger.kernel.org", "linux-arm-kernel@lists.infradead.org",
 "kvmarm@lists.linux.dev", "linux-fsdevel@vger.kernel.org",
 "linux-mm@kvack.org", "bpf@vger.kernel.org",
 "linux-kselftest@vger.kernel.org", "Cali, Marco", "Kalyazin, Nikita",
 "Thomson, Jack", "derekmn@amazon.co.uk", "tabba@google.com",
 "ackerleytng@google.com"
Content-Type: text/plain; charset="UTF-8"

On Thu Sep 25, 2025 at 6:27 PM UTC, Dave Hansen wrote:
> On 9/24/25 08:22, Roy, Patrick wrote:
>> Add an option to not perform TLB flushes after direct map manipulations.
>
> I'd really prefer this be left out for now. It's a massive can of worms.
> Let's agree on something that works and has well-defined behavior before
> we go breaking it on purpose.

As David pointed out in the MM Alignment Session yesterday, I might be able to help here. In [0] I've proposed a way to break up the direct map by ASI's "sensitivity" concept, which is weaker than the "totally absent from the direct map" being proposed here, but it has kinda similar implementation challenges.

Basically it introduces a thing called a "freetype" that extends the idea of migratetype. Like the existing idea of migratetype, it's used to physically group pages when allocating, and you can index free pages by it, i.e. each freetype gets its own freelist. But it can also encode other information than mobility (and the other stuff that's encoded in migratetype...).

Could it make sense to use that logic to just have entire pageblocks that are absent from the direct map? Then when allocating memory for the guest_memfd we get it from one of those pageblocks. Then we only have to flush the TLB if there's no memory left in pageblocks of this freetype (so the allocator has to flip another pageblock over to the "no direct map" freetype, after removing it from the direct map).

I haven't yet investigated this properly, I'll start doing that now. But I thought I'd immediately drop this note in case anyone can immediately see a reason why this doesn't work.

[0] https://lore.kernel.org/all/20250924-b4-asi-page-alloc-v1-0-2d861768041f@google.com/T/#t

BTW, I think if the skip-flush flag is the only thing blocking this patchset, it would be great to merge it without it. Even if that means it's no use for Firecracker usecases that doesn't mean the underlying feature isn't valuable for _someone_. Then we can figure out how to make it work for Firecracker afterwards, one way or another.

(Just to be transparent: my nefarious ulterior motive is that it would give me an angle to start merging code that will eventually support ASI. But, I'm serious that there are probably users who would like this feature even if it's slow!)