From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B9840C87FCB for ; Mon, 4 Aug 2025 13:09:03 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.1069368.1433225 (Exim 4.92) (envelope-from ) id 1uiuvu-0006JN-C8; Mon, 04 Aug 2025 13:08:54 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 1069368.1433225; Mon, 04 Aug 2025 13:08:54 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1uiuvu-0006JG-9C; Mon, 04 Aug 2025 13:08:54 +0000 Received: by outflank-mailman (input) for mailman id 1069368; Mon, 04 Aug 2025 13:08:53 +0000 Received: from se1-gles-flk1-in.inumbo.com ([94.247.172.50] helo=se1-gles-flk1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1uiuvt-0006J9-KU for xen-devel@lists.xenproject.org; Mon, 04 Aug 2025 13:08:53 +0000 Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [2a00:1450:4864:20::32b]) by se1-gles-flk1.inumbo.com (Halon) with ESMTPS id 23ecdedd-7134-11f0-b898-0df219b8e170; Mon, 04 Aug 2025 15:08:40 +0200 (CEST) Received: by mail-wm1-x32b.google.com with SMTP id 5b1f17b1804b1-4563a57f947so17252435e9.1 for ; Mon, 04 Aug 2025 06:08:40 -0700 (PDT) Received: from localhost (112.pool92-178-7.dynamic.orange.es. [92.178.7.112]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-3b79c3b95f4sm15435084f8f.23.2025.08.04.06.08.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Aug 2025 06:08:38 -0700 (PDT) X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 23ecdedd-7134-11f0-b898-0df219b8e170 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=citrix.com; s=google; t=1754312920; x=1754917720; darn=lists.xenproject.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=MSIZhFSRoFRHF6QoEa7m5+whRac4naGoL1s/P59JX60=; b=uOBSyMoy5d/6LMm+wF4D07MaHRxeA2Z1Id3ZtVua70bkNDp1yxd9jj3vczqSWZtDbB sfg/pRPC1QgmdD5IqAKPAl6mm/OeOL5uwTPUFii1+QtiIx32vd6kyrHQgo9X9i8CGVGZ y7xIl5GhBpYGPPS6ijBDQjKMvkmzxYnYzH0DM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1754312920; x=1754917720; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=MSIZhFSRoFRHF6QoEa7m5+whRac4naGoL1s/P59JX60=; b=cZR/K7JJ2tyV/0o4+qw3at/mUWhQZYvsWb5j5GRQsuV+3rXrcMRrp+bI5yyGZR7u8F 5j99cZAPcfrtyCWEpz/yRpIWYkF8ogD22rbh22uzIKXZdEzX4ZNnD0EgMgVHchBxHr1A 2BROkWG1Sp3s8fkURrRgMLj1eLo//xfdxpqvgVjTdehr0iUBk6PvZM255LSFdoiEQ+r6 5Lh/PHWM8p/y6CELoH6OAnaOgPFFI6TfTIdk1FfBVWxI8ql/UuwD/xC0aFb1zZRXWbcy RUYm/Ux5sMJaTVZEgON6BVklTWrAgwm5i6bahT1ucKF5a0PpJSIHGFjJxtDHqrZXSYQX zQsQ== X-Forwarded-Encrypted: i=1; AJvYcCVMzxNtYEyov9HFFUpx2rlHD16X42P9sDHuFxTUpmlCzdIbyrvCd9uGcvATp7NtN9A8lVmeYF42o48=@lists.xenproject.org X-Gm-Message-State: AOJu0YylXxwJlG0GyIb1cv/qNpDuRJ3ldt3mc6GYGAoQB1PkXElciGje mn0FBRMH+Aj9YnnVqyyj7pvc24i7d9fFof4hJi01nU92rw6VEtuopsB2ur90bkYaVpE= X-Gm-Gg: ASbGncuMq/cD7IlKPEjVD54TsNFAkIuSlh58NfRD2Nup50kQT4slMneZMGiRZefvK6t /whD/O3AVftAOZomU7g9lxkbGfQOYvQgHlwM3nVRj2BvoxfdzV6CP/AJ/9sN11iZq7QBwPR24Xj TwdW+tcR/7ts6vIt1XUfFM/wuTkbIg7GleKZtUFRKuP90NqWunZ9M3bqCKIaTAzN1dAUJdkxAts uCNqLqbw6lhnPzkoufB+/AZ1mHpTVFSqG59sYQb5jE8nVpwryC5llgPVE4vbgHqbDby+xtT8NaH SnChUytFtGjNI3AgvYG0w9Pw6x3BsmufI5Yg3jFgmckrcYEF0sa3IWvzg1tzb+XA1Xb7CXasGuG CofagEI0Q5kWg/aGtiTTPcKDgce5NuEeU19Xbkf6IkiFV2/y3D6LPWfu1gX8urN2R0h0HfJL7Aa RD X-Google-Smtp-Source: AGHT+IGEs83Uj2/oRUpsAQ5N8R7Z63h/wcWyNwoTmH7LL4cVINCAZioNaJR54yJf3n7th86hOSO7wg== X-Received: by 2002:a05:600c:46d3:b0:456:21f4:a0a3 with SMTP id 5b1f17b1804b1-458b59e3f41mr84967645e9.5.1754312919702; Mon, 04 Aug 2025 06:08:39 -0700 (PDT) Date: Mon, 4 Aug 2025 15:08:34 +0200 From: Roger Pau =?utf-8?B?TW9ubsOp?= To: Jan Beulich Cc: Oleksii Kurochko , Community Manager , Andrew Cooper , Anthony PERARD , Michal Orzel , Julien Grall , Stefano Stabellini , xen-devel@lists.xenproject.org Subject: Re: [PATCH v3 7/8] pdx: introduce a new compression algorithm based on region offsets Message-ID: References: <20250724110416.2081-1-roger.pau@citrix.com> <20250724110416.2081-8-roger.pau@citrix.com> <87c1a72d-62cf-475a-8320-70e1954cd0f5@suse.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <87c1a72d-62cf-475a-8320-70e1954cd0f5@suse.com> On Tue, Jul 29, 2025 at 04:16:15PM +0200, Jan Beulich wrote: > On 24.07.2025 13:04, Roger Pau Monne wrote: > > --- a/xen/common/pdx.c > > +++ b/xen/common/pdx.c > > @@ -24,6 +24,7 @@ > > #include > > #include > > #include > > +#include > > > > /** > > * Maximum (non-inclusive) usable pdx. Must be > > @@ -40,6 +41,12 @@ bool __mfn_valid(unsigned long mfn) > > > > #ifdef CONFIG_PDX_MASK_COMPRESSION > > invalid |= mfn & pfn_hole_mask; > > +#elif defined(CONFIG_PDX_OFFSET_COMPRESSION) > > +{ > > + unsigned long base = pfn_bases[PFN_TBL_IDX(mfn)]; > > + > > + invalid |= mfn < base || mfn >= base + pdx_region_size; > > Leveraging wrapping, this could be simplified to > > invalid |= mfn - base >= pdx_region_size; > > I think. Considering it's a frequently used path, doing so may be worthwhile. I don't think that would work for all cases, take the following example: PFN compression using lookup table shift 18 and region size 0x40000 range 0 [0000000280000, 00000002bffff] PFN IDX 10 : 0000000280000 If you pass mfn 0 to mfn_valid() with your proposed adjustment, the result of the subtraction would be: 0 - ~0UL == 1 Which wouldn't satisfy the >= condition, and hence pfn 0 would be reported as a valid mfn. I think we need to keep both sides of the check. > > @@ -75,6 +82,13 @@ void set_pdx_range(unsigned long smfn, unsigned long emfn) > > # error "Missing architecture maximum number of RAM ranges" > > #endif > > > > +/* Some functions should be init when not using PDX mask compression. */ > > +#ifndef CONFIG_PDX_MASK_COMPRESSION > > +# define __init_or_pdx_mask __init > > +#else > > +# define __init_or_pdx_mask > > +#endif > > Considering this is local to the CU, "pdx" in the name isn't very meaningful. > Instead it being a compression type may be, so how about __init_or_mask_compr > or some such? (If we gain further compression types, this may be getting > unwieldy anyway.) I'm fine with your proposal. The current naming was derived from the CONFIG_ define. > > +static void __init cf_check swp_node(void *a, void *b, size_t size) > > +{ > > + SWAP(a, b); > > This doesn't look right - you swap a and b, not what they point to. > > > +static bool __init pfn_offset_sanitize_ranges(void) > > +{ > > + unsigned int i = 0; > > + > > + if ( nr_ranges == 1 ) > > + { > > + ASSERT(PFN_TBL_IDX(ranges[0].base) == > > + PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1)); > > I think this points out a naming issue in patch 2: "base" and "size" look > as if these were address / byte count, when they're PFN / page count. > Which caught my eye because of the values being passed to something that > actually wants a PFN (and hence looked wrong at the first glance). The struct name is pfn_range, I could rename the fields to base_pfn and pages or similar, but my impression was that the struct name was enough of a pointer that those are PFN ranges. Do you have any alternative proposal about how to name those? > > + return true; > > + } > > + > > + while ( i + 1 < nr_ranges ) > > + { > > + /* > > + * Ensure ranges [start, end] use the same offset table index. Should > > + * be guaranteed by the logic that calculates the pfn shift. > > + */ > > + if ( PFN_TBL_IDX(ranges[i].base) != > > + PFN_TBL_IDX(ranges[i].base + ranges[i].size - 1) || > > + PFN_TBL_IDX(ranges[i + 1].base) != > > + PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) ) > > It feels a little odd to re-do the 2nd half of the checking here on the next > iteration, when the table wouldn't have changed when ... > > > + { > > + ASSERT_UNREACHABLE(); > > + return false; > > + } > > + > > + if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) ) > > + { > > + i++; > > + continue; > > ... taking this path. Could I talk you into moving the latter half ... > > > + } > > ... here? If that's needed at all, as ... > > > + /* Merge ranges with the same table index. */ > > + ranges[i].size = ranges[i + 1].base + ranges[i + 1].size - > > + ranges[i].base; > > ... the new range should also fulfill the requirement. Just that the last > such range then wouldn't be checked, unless ... > > > + memmove(&ranges[i + 1], &ranges[i + 2], > > + (nr_ranges - (i + 2)) * sizeof(ranges[0])); > > + nr_ranges--; > > + } > > ... that checking was put past the loop. Which then would allow to remove > the special casing of nr_ranges == 1 at the top of the function: You'd > uniformly check the ranges[nr_ranges - 1] here. What about doing it like: static bool __init pfn_offset_sanitize_ranges(void) { unsigned int i = 0; if ( PFN_TBL_IDX(ranges[0].base) != PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1) ) { ASSERT_UNREACHABLE(); return false; } while ( i + 1 < nr_ranges ) { /* * Ensure ranges [start, end] use the same offset table index. Should * be guaranteed by the logic that calculates the pfn shift. */ if ( PFN_TBL_IDX(ranges[i + 1].base) != PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) ) { ASSERT_UNREACHABLE(); return false; } if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) ) { i++; continue; } /* Merge ranges with the same table index. */ ranges[i].size = ranges[i + 1].base + ranges[i + 1].size - ranges[i].base; memmove(&ranges[i + 1], &ranges[i + 2], (nr_ranges - (i + 2)) * sizeof(ranges[0])); nr_ranges--; } return true; } I've pulled the index 0 check ahead of the loop, which then covers for the case where nr_ranges == 1. There's also no duplicate checking of the ranges, since the range at i + 1 will always be a non-checked one; either because the array has been shifted as a result of a range merging, or the index has been advanced. > > + ranges[i].size = start + ranges[i].size - ranges[i].base; > > + > > + /* > > + * Only merge overlapped regions now, leave adjacent regions separated. > > + * They would be merged later if both use the same index into the > > + * lookup table. > > + */ > > + if ( !i || ranges[i].base >= (ranges[i - 1].base + ranges[i - 1].size) ) > > + { > > + /* > > + * We might parse the region at position 0 more than once, as for > > + * index 0 we don't attempt to merge to keep this simple. > > + */ > > + mask |= pdx_region_mask(ranges[i].base, ranges[i].size); > > + continue; > > In how far is it relevant here that slot 0 may be looked at more than once? Not really, just that someone might wonder about the possibly inefficiency of this. I will remove the comment then. > > + } > > + > > + ranges[i - 1].size = ranges[i].base + ranges[i].size - > > + ranges[i - 1].base; > > + > > + if ( i + 1 < nr_ranges ) > > + memmove(&ranges[i], &ranges[i + 1], > > + (nr_ranges - (i + 1)) * sizeof(ranges[0])); > > + else /* last range */ > > + mask |= pdx_region_mask(ranges[i].base, ranges[i].size); > > + nr_ranges--; > > + i--; > > + } > > + > > + /* > > + * Populate a mask with the non-equal bits of the different ranges, do this > > + * to calculate the maximum PFN shift to use as the lookup table index. > > + */ > > + for ( i = 0; i < nr_ranges; i++ ) > > + for ( unsigned int j = 0; j < nr_ranges; j++ ) > > + idx_mask |= (ranges[i].base & ~mask) ^ (ranges[j].base & ~mask); > > + > > + if ( !idx_mask ) > > + /* Single region case. */ > > + pfn_index_shift = flsl(mask); > > + else if ( flsl(idx_mask) - ffsl(idx_mask) < CONFIG_PDX_OFFSET_TBL_ORDER ) > > + /* The changed mask fits in the table index width. */ > > + pfn_index_shift = ffsl(idx_mask) - 1; > > + else > > + /* Changed mask is wider than array size, use most significant bits. */ > > + pfn_index_shift = flsl(idx_mask) - CONFIG_PDX_OFFSET_TBL_ORDER; > > Perhaps emit a log message here to indicate that bigger PDX_OFFSET_TBL_ORDER > may yield better results? What about: printk(XENLOG_DEBUG "PFN compression table index truncated, requires order %u\n", flsl(idx_mask) - ffsl(idx_mask) + 1); > > + /* > > + * Ensure correctness of the calculated values, plus merge ranges if they > > + * use the same lookup table index. > > + */ > > + if ( !pfn_offset_sanitize_ranges() ) > > + { > > + printk(XENLOG_DEBUG "PFN compression is invalid, disabling\n"); > > + pfn_pdx_compression_reset(); > > + return false; > > + } > > + > > + /* > > + * Find the maximum PFN range size after having merged ranges with same > > + * index. The rounded up region size will be the base for the PDX region > > + * size and shift. > > + */ > > + for ( i = 0; i < nr_ranges; i++ ) > > + size = max(size, ranges[i].size); > > + > > + /* pdx_init_mask() already takes MAX_ORDER into account. */ > > + mask = PFN_DOWN(pdx_init_mask(size << PAGE_SHIFT)); > > We're in arch-neutral code here, so I think you need to cast size to paddr_t > before shifting. Indeed, size is unsigned long here. Thanks, Roger.