Date: Wed, 29 May 2024 15:58:02 -0700
References: <20240529180510.2295118-1-jthoughton@google.com> <20240529180510.2295118-3-jthoughton@google.com>
Subject: Re: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
From: Sean Christopherson
To: Yu Zhao
Cc: James Houghton, Andrew Morton, Paolo Bonzini, Albert Ou, Ankit Agrawal,
    Anup Patel, Atish Patra, Axel Rasmussen, Bibo Mao, Catalin Marinas,
    David Matlack, David Rientjes, Huacai Chen, James Morse, Jonathan Corbet,
    Marc Zyngier, Michael Ellerman, Nicholas Piggin, Oliver Upton,
    Palmer Dabbelt, Paul Walmsley, Raghavendra Rao Ananta, Ryan Roberts,
    Shaoqin Huang, Shuah Khan, Suzuki K Poulose, Tianrui Zhao, Will Deacon,
    Zenghui Yu, kvm-riscv@lists.infradead.org, kvm@vger.kernel.org,
    kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org,
    linux-mm@kvack.org, linux-riscv@lists.infradead.org,
    linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev

On Wed, May 29, 2024, Yu Zhao wrote:
> On Wed, May 29, 2024 at 3:59 PM Sean Christopherson wrote:
> >
> > On Wed, May 29, 2024, Yu Zhao wrote:
> > > On Wed, May 29, 2024 at 12:05 PM James Houghton wrote:
> > > >
> > > > Secondary MMUs are currently consulted for access/age information at
> > > > eviction time, but before then, we don't get accurate age information.
> > > > That is, pages that are mostly accessed through a secondary MMU (like
> > > > guest memory, used by KVM) will always just proceed down to the oldest
> > > > generation, and then at eviction time, if KVM reports the page to be
> > > > young, the page will be activated/promoted back to the youngest
> > > > generation.
> > > >
> > > Correct, and as I explained offline, this is the only reasonable
> > > behavior if we can't locklessly walk secondary MMUs.
> > >
> > > Just for the record, the (crude) analogy I used was:
> > > Imagine a large room with many bills ($1, $5, $10, ...) on the floor,
> > > but you are only allowed to pick up 10 of them (and put them in your
> > > pocket). A smart move would be to survey the room *first and then*
> > > pick up the largest ones. But if you are carrying a 500 lbs backpack,
> > > you would just want to pick up whichever that's in front of you rather
> > > than walk the entire room.
> > >
> > > MGLRU should only scan (or lookaround) secondary MMUs if it can be
> > > done lockless. Otherwise, it should just fall back to the existing
> > > approach, which existed in previous versions but is removed in this
> > > version.
> >
> > IIUC, by "existing approach" you mean completely ignore secondary MMUs that
> > don't implement a lockless walk?
>
> No, the existing approach only checks secondary MMUs for LRU folios,
> i.e., those at the end of the LRU list. It might not find the best
> candidates (the coldest ones) on the entire list, but it doesn't pay
> as much for the locking. MGLRU can *optionally* scan MMUs (secondary
> included) to find the best candidates, but it can only be a win if the
> scanning incurs a relatively low overhead, e.g., done locklessly for
> the secondary MMU. IOW, this is a balance between the cost of
> reclaiming not-so-cold (warm) folios and that of finding the coldest
> folios.

Gotcha.

I tend to agree with Yu, driving the behavior via a Kconfig may generate simpler
_code_, but I think it increases the overall system complexity. E.g. distros
will likely enable the Kconfig, and in my experience people using KVM with a
distro kernel usually aren't kernel experts, i.e. likely won't know that there's
even a decision to be made, let alone be able to make an informed decision.

Having an mmu_notifier hook that is conditionally implemented doesn't seem overly
complex, e.g. even if there's a runtime aspect at play, it'd be easy enough for
KVM to nullify its mmu_notifier hook during initialization. The hardest part is
likely going to be figuring out the threshold for how much overhead is too much.
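
To make the "conditionally implemented hook" idea concrete, here is a rough
userspace sketch of the shape I have in mind. Everything in it is made up for
illustration: struct sketch_notifier_ops, fast_test_young, sketch_init() and
sketch_age_range() are hypothetical names, not the real mmu_notifier API, and
the "young" check is a toy. It only shows an optional hook that gets nullified
at init time and a caller that falls back to eviction-time checks when the
hook is absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a secondary MMU's notifier ops.
 * This is NOT the kernel's struct mmu_notifier_ops.
 */
struct sketch_notifier_ops {
	/* Optional cheap (e.g. lockless) young test; NULL if unsupported. */
	bool (*fast_test_young)(unsigned long start, unsigned long end);
};

/* Toy walker: pretend only [0x1000, 0x2000) was recently accessed. */
static bool sketch_fast_test_young(unsigned long start, unsigned long end)
{
	return start < 0x2000UL && end > 0x1000UL;
}

/*
 * Init-time decision: if the walk can't be done cheaply, nullify the hook
 * so that the aging path never calls into the secondary MMU.
 */
static void sketch_init(struct sketch_notifier_ops *ops, bool lockless_walk)
{
	assert(ops != NULL);
	if (!lockless_walk)
		ops->fast_test_young = NULL;
}

enum sketch_age_result {
	SKETCH_NOT_SCANNED,	/* hook absent: defer to eviction-time check */
	SKETCH_OLD,
	SKETCH_YOUNG,
};

/* Aging-side caller: consult the secondary MMU only if the hook exists. */
static enum sketch_age_result
sketch_age_range(const struct sketch_notifier_ops *ops,
		 unsigned long start, unsigned long end)
{
	assert(ops != NULL);
	if (!ops->fast_test_young)
		return SKETCH_NOT_SCANNED;
	return ops->fast_test_young(start, end) ? SKETCH_YOUNG : SKETCH_OLD;
}
```

With the hook populated and sketch_init(&ops, true), aging gets a young/old
answer for the range; after sketch_init(&ops, false) the same call returns
SKETCH_NOT_SCANNED and the caller sticks to the existing eviction-time
behavior. No Kconfig is involved, the decision is made once at init. The
sketch says nothing about where the overhead threshold should sit.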