From: Naman Jain <namjain@linux.microsoft.com>
To: Michael Kelley <mhklinux@outlook.com>,
"K . Y . Srinivasan" <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
Long Li <longli@microsoft.com>
Cc: Saurabh Sengar <ssengar@linux.microsoft.com>,
"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] Drivers: hv: mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDER
Date: Mon, 6 Apr 2026 10:26:20 +0530 [thread overview]
Message-ID: <a0d271e3-ece8-45cf-9dbb-ced773d6f3f8@linux.microsoft.com> (raw)
In-Reply-To: <SN6PR02MB4157550DA8F143F4DAFBEEE4D45EA@SN6PR02MB4157.namprd02.prod.outlook.com>
On 4/4/2026 12:07 AM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Friday, April 3, 2026 12:25 AM
>>
>
> Nit: I wonder what's the best prefix to use in the patch Subject field.
> "Drivers: hv: mshv_vtl:" is rather long. There was agreement to use
> just "mshv:" for the root partition code, and I probably misused that
> in commit 754cf84504ea. How about just "mshv_vtl:" as the prefix for this
> patch and other VTL patches going forward?
>
If "mshv:" is OK for the root partition code, "mshv_vtl:" should be fine
here. I'll change the subject of this patch and use that prefix for
future VTL patches.
>> On 4/3/2026 9:05 AM, Michael Kelley wrote:
>>> From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, March 31, 2026 10:40 PM
>>>>
>>>> When registering VTL0 memory via MSHV_ADD_VTL0_MEMORY, the kernel
>>>> computes pgmap->vmemmap_shift as the number of trailing zeros in the
>>>> OR of start_pfn and last_pfn, intending to use the largest compound
>>>> page order both endpoints are aligned to.
>>>>
>>>> However, this value is not clamped to MAX_FOLIO_ORDER, so a
>>>> sufficiently aligned range (e.g. physical range 0x800000000000-
>>>> 0x800080000000, corresponding to start_pfn=0x800000000 with 35
>>>> trailing zeros) can produce a shift larger than what
>>>> memremap_pages() accepts, triggering a WARN and returning -EINVAL:
>>>>
>>>> WARNING: ... memremap_pages+0x512/0x650
>>>> requested folio size unsupported
>>>>
>>>> The MAX_FOLIO_ORDER check was added by
>>>> commit 646b67d57589 ("mm/memremap: reject unreasonable folio/compound
>>>> page sizes in memremap_pages()").
>>>> When CONFIG_HAVE_GIGANTIC_FOLIOS=y, CONFIG_SPARSEMEM_VMEMMAP=y, and
>>>> CONFIG_HUGETLB_PAGE is not set, MAX_FOLIO_ORDER resolves to
>>>> (PUD_SHIFT - PAGE_SHIFT) = 18. Any range whose PFN alignment exceeds
>>>> order 18 hits this path.
>>>
>>> I'm not clear on what point you are making with this specific
>>> configuration that results in MAX_FOLIO_ORDER being 18. Is it just
>>> an example? Is 18 the largest expected value for MAX_FOLIO_ORDER?
>>> And note that PUD_SHIFT and PAGE_SHIFT might have different values
>>> on arm64 with a page size other than 4K.
>>>
>>
>> Yes, this was just an example. It is not general enough, so I will
>> remove it. MAX_FOLIO_ORDER can exceed 18 in other configurations.
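To make the failure mode concrete, here is a minimal userspace sketch of the calculation quoted above. The value 18 for MAX_FOLIO_ORDER and the `count_trailing_zeros()` stand-in are assumptions for illustration; the real definitions live in the kernel tree:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: assume PUD_SHIFT - PAGE_SHIFT = 18 (x86-64, 4K pages).
 * The real MAX_FOLIO_ORDER is config-dependent; see include/linux/mmzone.h. */
#define MAX_FOLIO_ORDER 18

/* Stand-in for the kernel's count_trailing_zeros() helper. */
static unsigned int count_trailing_zeros(uint64_t x)
{
	return (unsigned int)__builtin_ctzll(x);
}

/* Largest folio order both endpoints are aligned to, clamped so that
 * memremap_pages() never sees an order it rejects with a WARN. */
static unsigned int compute_vmemmap_shift(uint64_t start_pfn, uint64_t last_pfn)
{
	unsigned int shift = count_trailing_zeros(start_pfn | last_pfn);

	return shift > MAX_FOLIO_ORDER ? MAX_FOLIO_ORDER : shift;
}
```

With the PFNs from the commit message (0x800000000 and 0x800080000), the unclamped shift is 19, which would trip the WARN; the clamp caps it at 18. A small range such as 0x200..0x400 yields order 9 and is unaffected.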
>>
>>>>
>>>> Fix this by clamping vmemmap_shift to MAX_FOLIO_ORDER so we always
>>>> request the largest order the kernel supports, rather than an
>>>> out-of-range value.
>>>>
>>>> Also fix the error path to propagate the actual error code from
>>>> devm_memremap_pages() instead of hard-coding -EFAULT, which was
>>>> masking the real -EINVAL return.
>>>>
>>>> Fixes: 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver")
>>>> Cc: <stable@vger.kernel.org>
>>>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>>>> ---
>>>> drivers/hv/mshv_vtl_main.c | 8 ++++++--
>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
>>>> index 5856975f32e12..255fed3a740c1 100644
>>>> --- a/drivers/hv/mshv_vtl_main.c
>>>> +++ b/drivers/hv/mshv_vtl_main.c
>>>> @@ -405,8 +405,12 @@ static int mshv_vtl_ioctl_add_vtl0_mem(struct mshv_vtl *vtl, void __user *arg)
>>>> /*
>>>> * Determine the highest page order that can be used for the given memory range.
>>>> * This works best when the range is aligned; i.e. both the start and the length.
>>>> + * Clamp to MAX_FOLIO_ORDER to avoid a WARN in memremap_pages() when the range
>>>> + * alignment exceeds the maximum supported folio order for this kernel config.
>>>> */
>>>> - pgmap->vmemmap_shift = count_trailing_zeros(vtl0_mem.start_pfn | vtl0_mem.last_pfn);
>>>> + pgmap->vmemmap_shift = min_t(unsigned long,
>>>> + count_trailing_zeros(vtl0_mem.start_pfn | vtl0_mem.last_pfn),
>>>> + MAX_FOLIO_ORDER);
>>>
>>> Is it necessary to use min_t() here, or would min() work? Neither count_trailing_zeros()
>>> nor MAX_FOLIO_ORDER is ever negative, so it seems like just min() would work with
>>> no potential for doing a bogus comparison or assignment.
>>>
>>
>> min() could work, yes. I just felt min_t() was safer for comparing
>> these two values of different types:
>> count_trailing_zeros() returns 'int',
>> MAX_FOLIO_ORDER is a macro computed with bit operations,
>> and the destination is unsigned int.
>>
>>
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER PFN_SECTION_SHIFT
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER (ilog2(SZ_16G) - PAGE_SHIFT)
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER (ilog2(SZ_1G) - PAGE_SHIFT)
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER (PUD_SHIFT - PAGE_SHIFT)
>>
>> I am fine with anything you suggest here.
>
> There's a fair number of patches on LKML that are replacing min_t() with
> min(). At some point in the not-too-distant past, the implementation of
> min() was improved to deal with different but compatible integer types.
> My sense is that min() is the better choice for general integer comparisons,
> particularly when the values are known to be non-negative.
>
>>
>>> The shift is calculated using the originally passed in start_pfn and last_pfn, while the
>>> "range" struct in pgmap has an "end" value that is one page less. So is the idea to
>>> go ahead and create the mapping with folios of a size that includes that last page,
>>> and then just waste the last page of the last folio?
>>
>> No, no page is wasted. The vmemmap_shift determines the folio order,
>> and memmap_init_zone_device() walks the range [start_pfn, last_pfn) in
>> steps of (1 << vmemmap_shift) pages, creating one folio per step. The OR
>> operation ensures both boundaries are aligned to multiples of (1 <<
>> vmemmap_shift), guaranteeing the range divides evenly into folios with
>> no partial folio at the end.
>> The intention is to find the highest folio order possible here, and to
>> clamp vmemmap_shift to MAX_FOLIO_ORDER if it would otherwise exceed it.
>
> OK, I figured out what is confusing me. I had a misunderstanding
> when I reviewed this code during its original submission, and that
> misunderstanding has influenced my (incorrect) review of this change.
>
> The struct mshv_vtl_ram_disposition that is passed from user space has
> two fields: start_pfn and last_pfn. But last_pfn is somewhat misnamed
> in my view. For example, an aligned 2 MiB of memory would consist of
> 512 PFNs. If the first PFN is 0x200, the last PFN would be 0x3FF. But in
> the semantics of the struct, the last_pfn field should be 0x400.
>
> In response to my comments in the original review, you added the comment
> about last_pfn being excluded in the pagemap range, which is true. But it's
> not because that page is somehow reserved or being wasted. It's because
> the range is being described by specifying the PFN *after* the last PFN.
>
> With the start_pfn and last_pfn fields used to determine the highest
> page order that can be used, the slightly unorthodox semantics of
> last_pfn make that calculation easy. But then you must subtract 1
> from last_pfn when setting the range start and end for
> devm_memremap_pages() to use. And the code does that, so the code
> is all correct. The comment might be improved to speak about the
> semantics of the last_pfn field, not that a page of memory is
> intentionally being excluded/wasted. And/or maybe the struct
> mshv_vtl_ram_disposition definition should get a comment to clarify
> the semantics of last_pfn.
>
> Michael
Right, last_pfn could be interpreted as the actual last PFN. I'll add a
comment clarifying its semantics to avoid the confusion.
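To pin down the exclusive semantics Michael describes, here is a hedged userspace sketch of the PFN-to-byte-range conversion. The struct and PAGE_SHIFT value are illustrative stand-ins, not the kernel definitions:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assuming 4K pages for this sketch */

struct byte_range {
	uint64_t start;
	uint64_t end;	/* inclusive, like struct range in the kernel */
};

/* last_pfn names the PFN *after* the last page (exclusive end), so the
 * byte range must end one byte before last_pfn's start address. This
 * mirrors the subtraction the driver does before calling
 * devm_memremap_pages(). */
static struct byte_range pfn_span_to_range(uint64_t start_pfn,
					   uint64_t last_pfn)
{
	struct byte_range r;

	r.start = start_pfn << PAGE_SHIFT;
	r.end = (last_pfn << PAGE_SHIFT) - 1;
	return r;
}
```

With Michael's example (start_pfn = 0x200, last_pfn = 0x400 for an aligned 2 MiB region), the resulting range covers exactly 512 pages and ends one byte short of the 0x400000 boundary.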
Regards,
Naman
Thread overview: 5+ messages
2026-04-01 5:40 [PATCH] Drivers: hv: mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDER Naman Jain
2026-04-03 3:35 ` Michael Kelley
2026-04-03 7:25 ` Naman Jain
2026-04-03 18:37 ` Michael Kelley
2026-04-06 4:56 ` Naman Jain [this message]