Re: [PATCH v3 7/8] pdx: introduce a new compression algorithm based on region offsets

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>,
	Community Manager <community.manager@xenproject.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Anthony PERARD <anthony.perard@vates.tech>,
	Michal Orzel <michal.orzel@amd.com>,
	Julien Grall <julien@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v3 7/8] pdx: introduce a new compression algorithm based on region offsets
Date: Mon, 4 Aug 2025 15:08:34 +0200	[thread overview]
Message-ID: <aJCw0vS2rw2CXJqq@macbook.local> (raw)
In-Reply-To: <87c1a72d-62cf-475a-8320-70e1954cd0f5@suse.com>

On Tue, Jul 29, 2025 at 04:16:15PM +0200, Jan Beulich wrote:
> On 24.07.2025 13:04, Roger Pau Monne wrote:
> > --- a/xen/common/pdx.c
> > +++ b/xen/common/pdx.c
> > @@ -24,6 +24,7 @@
> >  #include <xen/param.h>
> >  #include <xen/pfn.h>
> >  #include <xen/sections.h>
> > +#include <xen/sort.h>
> >  
> >  /**
> >   * Maximum (non-inclusive) usable pdx. Must be
> > @@ -40,6 +41,12 @@ bool __mfn_valid(unsigned long mfn)
> >  
> >  #ifdef CONFIG_PDX_MASK_COMPRESSION
> >      invalid |= mfn & pfn_hole_mask;
> > +#elif defined(CONFIG_PDX_OFFSET_COMPRESSION)
> > +{
> > +    unsigned long base = pfn_bases[PFN_TBL_IDX(mfn)];
> > +
> > +    invalid |= mfn < base || mfn >= base + pdx_region_size;
> 
> Leveraging wrapping, this could be simplified to
> 
>     invalid |= mfn - base >= pdx_region_size;
> 
> I think. Considering it's a frequently used path, doing so may be worthwhile.

I don't think that would work for all cases, take the following
example:

PFN compression using lookup table shift 18 and region size 0x40000
 range   0 [0000000280000, 00000002bffff] PFN IDX  10 : 0000000280000

If you pass mfn 0 to mfn_valid() with your proposed adjustment, the
result of the subtraction would be:

0 - ~0UL == 1

Which wouldn't satisfy the >= condition, and hence pfn 0 would be
reported as a valid mfn.  I think we need to keep both sides of the
check.

> > @@ -75,6 +82,13 @@ void set_pdx_range(unsigned long smfn, unsigned long emfn)
> >  # error "Missing architecture maximum number of RAM ranges"
> >  #endif
> >  
> > +/* Some functions should be init when not using PDX mask compression. */
> > +#ifndef CONFIG_PDX_MASK_COMPRESSION
> > +# define __init_or_pdx_mask __init
> > +#else
> > +# define __init_or_pdx_mask
> > +#endif
> 
> Considering this is local to the CU, "pdx" in the name isn't very meaningful.
> Instead it being a compression type may be, so how about __init_or_mask_compr
> or some such? (If we gain further compression types, this may be getting
> unwieldy anyway.)

I'm fine with your proposal.  The current naming was derived from the
CONFIG_ define.

> > +static void __init cf_check swp_node(void *a, void *b, size_t size)
> > +{
> > +    SWAP(a, b);
> 
> This doesn't look right - you swap a and b, not what they point to.
> 
> > +static bool __init pfn_offset_sanitize_ranges(void)
> > +{
> > +    unsigned int i = 0;
> > +
> > +    if ( nr_ranges == 1 )
> > +    {
> > +        ASSERT(PFN_TBL_IDX(ranges[0].base) ==
> > +               PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1));
> 
> I think this points out a naming issue in patch 2: "base" and "size" look
> as if these were address / byte count, when they're PFN / page count.
> Which caught my eye because of the values being passed to something that
> actually wants a PFN (and hence looked wrong at the first glance).

The struct name is pfn_range, I could rename the fields to base_pfn
and pages or similar, but my impression was that the struct name was
enough of a pointer that those are PFN ranges.  Do you have any
alternative proposal about how to name those?

> > +        return true;
> > +    }
> > +
> > +    while ( i + 1 < nr_ranges )
> > +    {
> > +        /*
> > +         * Ensure ranges [start, end] use the same offset table index.  Should
> > +         * be guaranteed by the logic that calculates the pfn shift.
> > +         */
> > +        if ( PFN_TBL_IDX(ranges[i].base) !=
> > +             PFN_TBL_IDX(ranges[i].base + ranges[i].size - 1) ||
> > +             PFN_TBL_IDX(ranges[i + 1].base) !=
> > +             PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) )
> 
> It feels a little odd to re-do the 2nd half of the checking here on the next
> iteration, when the table wouldn't have changed when ...
> 
> > +        {
> > +            ASSERT_UNREACHABLE();
> > +            return false;
> > +        }
> > +
> > +        if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) )
> > +        {
> > +            i++;
> > +            continue;
> 
> ... taking this path. Could I talk you into moving the latter half ...
> 
> > +        }
> 
> ... here? If that's needed at all, as ...
> 
> > +        /* Merge ranges with the same table index. */
> > +        ranges[i].size = ranges[i + 1].base + ranges[i + 1].size -
> > +                         ranges[i].base;
> 
> ... the new range should also fulfill the requirement. Just that the last
> such range then wouldn't be checked, unless ...
> 
> > +        memmove(&ranges[i + 1], &ranges[i + 2],
> > +                (nr_ranges - (i + 2)) * sizeof(ranges[0]));
> > +        nr_ranges--;
> > +    }
> 
> ... that checking was put past the loop. Which then would allow to remove
> the special casing of nr_ranges == 1 at the top of the function: You'd
> uniformly check the ranges[nr_ranges - 1] here.	

What about doing it like:

static bool __init pfn_offset_sanitize_ranges(void)
{
    unsigned int i = 0;

    if ( PFN_TBL_IDX(ranges[0].base) !=
         PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1) )
    {
        ASSERT_UNREACHABLE();
        return false;
    }

    while ( i + 1 < nr_ranges )
    {
        /*
         * Ensure ranges [start, end] use the same offset table index.  Should
         * be guaranteed by the logic that calculates the pfn shift.
         */
        if ( PFN_TBL_IDX(ranges[i + 1].base) !=
             PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) )
        {
            ASSERT_UNREACHABLE();
            return false;
        }

        if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) )
        {
            i++;
            continue;
        }

        /* Merge ranges with the same table index. */
        ranges[i].size = ranges[i + 1].base + ranges[i + 1].size -
                         ranges[i].base;
        memmove(&ranges[i + 1], &ranges[i + 2],
                (nr_ranges - (i + 2)) * sizeof(ranges[0]));
        nr_ranges--;
    }

    return true;
}

I've pulled the index 0 check ahead of the loop, which then covers for
the case where nr_ranges == 1.  There's also no duplicate checking of
the ranges, since the range at i + 1 will always be a non-checked one;
either because the array has been shifted as a result of a range
merging, or the index has been advanced.

> > +        ranges[i].size = start + ranges[i].size - ranges[i].base;
> > +
> > +        /*
> > +         * Only merge overlapped regions now, leave adjacent regions separated.
> > +         * They would be merged later if both use the same index into the
> > +         * lookup table.
> > +         */
> > +        if ( !i || ranges[i].base >= (ranges[i - 1].base + ranges[i - 1].size) )
> > +        {
> > +            /*
> > +             * We might parse the region at position 0 more than once, as for
> > +             * index 0 we don't attempt to merge to keep this simple.
> > +             */
> > +            mask |= pdx_region_mask(ranges[i].base, ranges[i].size);
> > +            continue;
> 
> In how far is it relevant here that slot 0 may be looked at more than once?

Not really, just that someone might wonder about the possibly
inefficiency of this.  I will remove the comment then.

> > +        }
> > +
> > +        ranges[i - 1].size = ranges[i].base + ranges[i].size -
> > +                             ranges[i - 1].base;
> > +
> > +        if ( i + 1 < nr_ranges )
> > +            memmove(&ranges[i], &ranges[i + 1],
> > +                    (nr_ranges - (i + 1)) * sizeof(ranges[0]));
> > +        else /* last range */
> > +            mask |= pdx_region_mask(ranges[i].base, ranges[i].size);
> > +        nr_ranges--;
> > +        i--;
> > +    }
> > +
> > +    /*
> > +     * Populate a mask with the non-equal bits of the different ranges, do this
> > +     * to calculate the maximum PFN shift to use as the lookup table index.
> > +     */
> > +    for ( i = 0; i < nr_ranges; i++ )
> > +        for ( unsigned int j = 0; j < nr_ranges; j++ )
> > +            idx_mask |= (ranges[i].base & ~mask) ^ (ranges[j].base & ~mask);
> > +
> > +    if ( !idx_mask )
> > +        /* Single region case. */
> > +        pfn_index_shift = flsl(mask);
> > +    else if ( flsl(idx_mask) - ffsl(idx_mask) < CONFIG_PDX_OFFSET_TBL_ORDER )
> > +        /* The changed mask fits in the table index width. */
> > +        pfn_index_shift = ffsl(idx_mask) - 1;
> > +    else
> > +        /* Changed mask is wider than array size, use most significant bits. */
> > +        pfn_index_shift = flsl(idx_mask) - CONFIG_PDX_OFFSET_TBL_ORDER;
> 
> Perhaps emit a log message here to indicate that bigger PDX_OFFSET_TBL_ORDER
> may yield better results?

What about:

printk(XENLOG_DEBUG
       "PFN compression table index truncated, requires order %u\n",
       flsl(idx_mask) - ffsl(idx_mask) + 1);

> > +    /*
> > +     * Ensure correctness of the calculated values, plus merge ranges if they
> > +     * use the same lookup table index.
> > +     */
> > +    if ( !pfn_offset_sanitize_ranges() )
> > +    {
> > +        printk(XENLOG_DEBUG "PFN compression is invalid, disabling\n");
> > +        pfn_pdx_compression_reset();
> > +        return false;
> > +    }
> > +
> > +    /*
> > +     * Find the maximum PFN range size after having merged ranges with same
> > +     * index.  The rounded up region size will be the base for the PDX region
> > +     * size and shift.
> > +     */
> > +    for ( i = 0; i < nr_ranges; i++ )
> > +        size = max(size, ranges[i].size);
> > +
> > +    /* pdx_init_mask() already takes MAX_ORDER into account. */
> > +    mask = PFN_DOWN(pdx_init_mask(size << PAGE_SHIFT));
> 
> We're in arch-neutral code here, so I think you need to cast size to paddr_t
> before shifting.

Indeed, size is unsigned long here.

Thanks, Roger.

next prev parent reply	other threads:[~2025-08-04 13:09 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-24 11:04 [PATCH v3 0/8] pdx: introduce a new compression algorithm Roger Pau Monne
2025-07-24 11:04 ` [PATCH v3 1/8] kconfig: turn PDX compression into a choice Roger Pau Monne
2025-07-24 11:04 ` [PATCH v3 2/8] pdx: provide a unified set of unit functions Roger Pau Monne
2025-07-29 13:06   ` Jan Beulich
2025-07-24 11:04 ` [PATCH v3 3/8] pdx: introduce command line compression toggle Roger Pau Monne
2025-07-24 11:04 ` [PATCH v3 4/8] pdx: allow per-arch optimization of PDX conversion helpers Roger Pau Monne
2025-07-29 13:20   ` Jan Beulich
2025-07-29 14:54   ` Andrew Cooper
2025-08-04 10:29     ` Roger Pau Monné
2025-07-24 11:04 ` [PATCH v3 5/8] test/pdx: add PDX compression unit tests Roger Pau Monne
2025-07-24 11:04 ` [PATCH v3 6/8] pdx: move some helpers in preparation for new compression Roger Pau Monne
2025-07-24 11:04 ` [PATCH v3 7/8] pdx: introduce a new compression algorithm based on region offsets Roger Pau Monne
2025-07-29 14:16   ` Jan Beulich
2025-08-04 13:08     ` Roger Pau Monné [this message]
2025-08-04 13:40       ` Jan Beulich
2025-07-24 11:04 ` [PATCH v3 8/8] x86/mm: adjust loop in arch_init_memory() to iterate over the PDX space Roger Pau Monne
2025-07-29 14:33   ` Jan Beulich
2025-07-29 16:54     ` Roger Pau Monné
2025-07-30  5:31       ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aJCw0vS2rw2CXJqq@macbook.local \
    --to=roger.pau@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=anthony.perard@vates.tech \
    --cc=community.manager@xenproject.org \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=michal.orzel@amd.com \
    --cc=oleksii.kurochko@gmail.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.