Date: Thu, 25 Jan 2024 10:55:19 +0000
Message-ID: <861qa58yy0.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Raghavendra Rao Ananta <rananta@google.com>,
	Jing Zhang <jingzhangos@google.com>
Subject: Re: [PATCH 12/15] KVM: arm64: vgic-its: Pick cache victim based on usage count
In-Reply-To: <20240124204909.105952-13-oliver.upton@linux.dev>
References: <20240124204909.105952-1-oliver.upton@linux.dev>
	<20240124204909.105952-13-oliver.upton@linux.dev>
On Wed, 24 Jan 2024 20:49:06 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> To date the translation cache LRU policy relies on the ordering of the
> linked-list to pick the victim, as entries are moved to the head of the
> list on every cache hit. These sorts of transformations are incompatible
> with an rculist, necessitating a different strategy for recording usage
> in-place.
>
> Count the number of cache hits since the last translation cache miss for
> every entry. The preferences for selecting a victim are as follows:
>
>  - Invalid entries over valid entries
>
>  - Valid entry with the lowest usage count
>
>  - In the case of a tie, pick the entry closest to the tail (oldest)
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/kvm/vgic/vgic-its.c | 42 ++++++++++++++++++++++++++--------
>  1 file changed, 32 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> index aec82d9a1b3c..ed0c6c333a6c 100644
> --- a/arch/arm64/kvm/vgic/vgic-its.c
> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> @@ -154,6 +154,7 @@ struct vgic_translation_cache_entry {
>  	u32			devid;
>  	u32			eventid;
>  	struct vgic_irq		*irq;
> +	atomic64_t		usage_count;
>  };
>
>  /**
> @@ -577,13 +578,7 @@ static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist,
>  		    cte->eventid != eventid)
>  			continue;
>
> -		/*
> -		 * Move this entry to the head, as it is the most
> -		 * recently used.
> -		 */
> -		if (!list_is_first(&cte->entry, &dist->lpi_translation_cache))
> -			list_move(&cte->entry, &dist->lpi_translation_cache);
> -
> +		atomic64_inc(&cte->usage_count);
>  		return cte->irq;
>  	}
>
> @@ -616,6 +611,30 @@ static unsigned int vgic_its_max_cache_size(struct kvm *kvm)
>  	return atomic_read(&kvm->online_vcpus) * LPI_DEFAULT_PCPU_CACHE_SIZE;
>  }
>
> +static struct vgic_translation_cache_entry *vgic_its_cache_victim(struct vgic_dist *dist)
> +{
> +	struct vgic_translation_cache_entry *cte, *victim = NULL;
> +	u64 min, tmp;
> +
> +	/*
> +	 * Find the least used cache entry since the last cache miss, preferring
> +	 * older entries in the case of a tie. Note that usage accounting is
> +	 * deliberately non-atomic, so this is all best-effort.
> +	 */
> +	list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
> +		if (!cte->irq)
> +			return cte;
> +
> +		tmp = atomic64_xchg_relaxed(&cte->usage_count, 0);
> +		if (!victim || tmp <= min) {

min is not initialised until after the first round. Not great. How
come the compiler doesn't spot this? See the sketch below for one way
out of it.

> +			victim = cte;
> +			min = tmp;
> +		}
> +	}

So this resets all the counters on each search for a new insertion?
Seems expensive, especially on large VMs (512 * 16 = up to 8K SWP
instructions in a tight loop, and I'm not even mentioning the fun
without LSE). I can at least think of a box that will throw its
interconnect out of the pram if tickled that way.

I'd rather the new cache entry inherit the max of the current set,
making insertion a lot cheaper. We can always detect the overflow and
do a full invalidation in that case (worst case -- better options
exist).
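To put some flesh on it, here's a completely untested sketch of the
lookup side: min gets a sane initial value, the xchg becomes a plain
read (the accounting is best-effort anyway, per your own comment), and
the walk also reports the current ceiling so that the insertion path
can reuse it. The max_usage/ceiling plumbing is entirely made up, of
course:

static struct vgic_translation_cache_entry *vgic_its_cache_victim(struct vgic_dist *dist,
								  u64 *max_usage)
{
	struct vgic_translation_cache_entry *cte, *victim = NULL;
	u64 min = U64_MAX, ceiling = 0, tmp;

	list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
		/* An invalid entry is always the preferred victim */
		if (!cte->irq) {
			victim = cte;
			break;
		}

		/* Plain read: no SWP storm, best-effort is good enough */
		tmp = atomic64_read(&cte->usage_count);
		if (tmp > ceiling)
			ceiling = tmp;

		/* '<=' keeps the entry closest to the tail on a tie */
		if (tmp <= min) {
			victim = cte;
			min = tmp;
		}
	}

	/* A partial ceiling on early exit is fine, this is all approximate */
	*max_usage = ceiling;
	return victim;
}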
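And the matching insertion side, equally untested. 'new' is the entry
being inserted, as in your patch; vgic_its_invalidate_cache() is, IIRC,
the existing full-invalidation helper (which only clears the entries in
place, so victim stays usable -- to be double-checked), and the
saturation handling is only the big hammer I mentioned:

	if (dist->lpi_cache_count >= vgic_its_max_cache_size(kvm)) {
		u64 ceiling;

		victim = vgic_its_cache_victim(dist, &ceiling);

		/*
		 * Counter saturation should take ages; if it ever
		 * happens, fall back to a full invalidation
		 * (worst case -- better options exist).
		 */
		if (unlikely(ceiling == U64_MAX)) {
			vgic_its_invalidate_cache(kvm);
			ceiling = 0;
		}

		/*
		 * The new entry inherits the current ceiling so that
		 * it doesn't immediately become the next victim.
		 */
		atomic64_set(&new->usage_count, ceiling);
	}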
> +
> +	return victim;
> +}
> +
>  static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
>  				       u32 devid, u32 eventid,
>  				       struct vgic_irq *irq)
> @@ -645,9 +664,12 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
>  		goto out;
>
>  	if (dist->lpi_cache_count >= vgic_its_max_cache_size(kvm)) {
> -		/* Always reuse the last entry (LRU policy) */
> -		victim = list_last_entry(&dist->lpi_translation_cache,
> -					 typeof(*cte), entry);
> +		victim = vgic_its_cache_victim(dist);
> +		if (WARN_ON_ONCE(!victim)) {
> +			victim = new;
> +			goto out;
> +		}

I don't understand how this could happen. It sort of explains the
oddity I was mentioning earlier, but I don't think we need this
complexity.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.