All of lore.kernel.org
 help / color / mirror / Atom feed
From: Naman Jain <namjain@linux.microsoft.com>
To: Michael Kelley <mhklinux@outlook.com>,
	"K . Y . Srinivasan" <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
	Long Li <longli@microsoft.com>
Cc: Saurabh Sengar <ssengar@linux.microsoft.com>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] Drivers: hv: mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDER
Date: Mon, 6 Apr 2026 10:26:20 +0530	[thread overview]
Message-ID: <a0d271e3-ece8-45cf-9dbb-ced773d6f3f8@linux.microsoft.com> (raw)
In-Reply-To: <SN6PR02MB4157550DA8F143F4DAFBEEE4D45EA@SN6PR02MB4157.namprd02.prod.outlook.com>



On 4/4/2026 12:07 AM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Friday, April 3, 2026 12:25 AM
>>
> 
> Nit: I wonder what's the best prefix to use in the patch Subject field.
> "Drivers: hv: mshv_vtl:" is rather long.  There was agreement to use
> just "mshv:" for the root partition code, and I probably misused that
> in commit 754cf84504ea. How about just "mshv_vtl:" as the prefix for this
> patch and other VTL patches going forward?
> 

If "mshv:" is OK for the root partition code, "mshv_vtl:" should be good 
for this. I'll change this subject, and for other patches in the future.

>> On 4/3/2026 9:05 AM, Michael Kelley wrote:
>>> From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, March 31, 2026 10:40 PM
>>>>
>>>> When registering VTL0 memory via MSHV_ADD_VTL0_MEMORY, the kernel
>>>> computes pgmap->vmemmap_shift as the number of trailing zeros in the
>>>> OR of start_pfn and last_pfn, intending to use the largest compound
>>>> page order both endpoints are aligned to.
>>>>
>>>> However, this value is not clamped to MAX_FOLIO_ORDER, so a
>>>> sufficiently aligned range (e.g. physical range 0x800000000000-
>>>> 0x800080000000, corresponding to start_pfn=0x800000000 with 35
>>>> trailing zeros) can produce a shift larger than what
>>>> memremap_pages() accepts, triggering a WARN and returning -EINVAL:
>>>>
>>>>     WARNING: ... memremap_pages+0x512/0x650
>>>>     requested folio size unsupported
>>>>
>>>> The MAX_FOLIO_ORDER check was added by
>>>> commit 646b67d57589 ("mm/memremap: reject unreasonable folio/compound
>>>> page sizes in memremap_pages()").
>>>> When CONFIG_HAVE_GIGANTIC_FOLIOS=y, CONFIG_SPARSEMEM_VMEMMAP=y, and
>>>> CONFIG_HUGETLB_PAGE is not set, MAX_FOLIO_ORDER resolves to
>>>> (PUD_SHIFT - PAGE_SHIFT) = 18. Any range whose PFN alignment exceeds
>>>> order 18 hits this path.
>>>
>>> I'm not clear on what point you are making with this specific
>>> configuration that results in MAX_FOLIO_ORDER being 18. Is it just
>>> an example? Is 18 the largest expected value for MAX_FOLIO_ORDER?
>>> And note that PUD_SHIFT and PAGE_SHIFT might have different values
>>> on arm64 with a page size other than 4K.
>>>
>>
>> Yes, this was just an example. It is not generalized enough, I will
>> remove it.
>> MAX_FOLIO_ORDER could go beyond 18.
>>
>>>>
>>>> Fix this by clamping vmemmap_shift to MAX_FOLIO_ORDER so we always
>>>> request the largest order the kernel supports, rather than an
>>>> out-of-range value.
>>>>
>>>> Also fix the error path to propagate the actual error code from
>>>> devm_memremap_pages() instead of hard-coding -EFAULT, which was
>>>> masking the real -EINVAL return.
>>>>
>>>> Fixes: 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver")
>>>> Cc: <stable@vger.kernel.org>
>>>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>>>> ---
>>>>    drivers/hv/mshv_vtl_main.c | 8 ++++++--
>>>>    1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
>>>> index 5856975f32e12..255fed3a740c1 100644
>>>> --- a/drivers/hv/mshv_vtl_main.c
>>>> +++ b/drivers/hv/mshv_vtl_main.c
>>>> @@ -405,8 +405,12 @@ static int mshv_vtl_ioctl_add_vtl0_mem(struct mshv_vtl *vtl, void __user *arg)
>>>>    	/*
>>>>    	 * Determine the highest page order that can be used for the given memory range.
>>>>    	 * This works best when the range is aligned; i.e. both the start and the length.
>>>> +	 * Clamp to MAX_FOLIO_ORDER to avoid a WARN in memremap_pages() when the range
>>>> +	 * alignment exceeds the maximum supported folio order for this kernel config.
>>>>    	 */
>>>> -	pgmap->vmemmap_shift = count_trailing_zeros(vtl0_mem.start_pfn | vtl0_mem.last_pfn);
>>>> +	pgmap->vmemmap_shift = min_t(unsigned long,
>>>> +				     count_trailing_zeros(vtl0_mem.start_pfn | vtl0_mem.last_pfn),
>>>> +				     MAX_FOLIO_ORDER);
>>>
>>> Is it necessary to use min_t() here, or would min() work?  Neither count_trailing_zeros()
>>> nor MAX_FOLIO_ORDER is ever negative, so it seems like just min() would work with
>>> no potential for doing a bogus comparison or assignment.
>>>
>>
>> min could work, yes. I just felt min_t is more safer for comparing these
>> two different types of values -
>> count_trailing_zeroes being 'int'
>> MAX_FOLIO_ORDER being a macro, calculated by applying bit operations.
>>
>> and destination being unsigned int.
>>
>>
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER          MAX_PAGE_ORDER
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER          PFN_SECTION_SHIFT
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER          (ilog2(SZ_16G) - PAGE_SHIFT)
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER          (ilog2(SZ_1G) - PAGE_SHIFT)
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER          (PUD_SHIFT - PAGE_SHIFT)
>>
>> I am fine with anything you suggest here.
> 
> There's a fair number of patches on LKML that are replacing min_t() with
> min().  At some point in the not-too-distant past, the implementation of
> min() was improved to deal with different but compatible integer types.
> My sense is that min() is the better choice for general integer comparisons,
> particularly when the values are known to be non-negative.
> 
>>
>>> The shift is calculated using the originally passed in start_pfn and last_pfn, while the
>>> "range" struct in pgmap has an "end" value that is one page less. So is the idea to
>>> go ahead and create the mapping with folios of a size that includes that last page,
>>> and then just waste the last page of the last folio?
>>
>> No, waste does not occur. The vmemmap_shift determines the folio order,
>> and memmap_init_zone_device() walks the range [start_pfn, last_pfn) in
>> steps of (1 << vmemmap_shift) pages, creating one folio per step. The OR
>> operation ensures both boundaries are aligned to multiples of (1 <<
>> vmemmap_shift), guaranteeing the range divides evenly into folios with
>> no partial folio at the end.
>> The intention is to find the highest folio order possible here, and if
>> it reaches the MAX_FOLIO_ORDER, restrict vmemmap_shift to it.
> 
> OK, I figured out what is confusing me. I had a misunderstanding
> when I reviewed this code during its original submission, and that
> misunderstanding has influenced my (incorrect) review of this change.
> 
> The struct mshv_vtl_ram_disposition that is passed from user space has
> two fields: start_pfn and last_pfn. But last_pfn is somewhat misnamed
> in my view. For example, an aligned 2 MiB of memory would consist of
> 512 PFNs. If the first PFN is 0x200, the last PFN would be 0x3FF.  But in
> the semantics of the struct, the last_pfn field should be 0x400.
> 
> In response to my comments in the original review, you added the comment
> about last_pfn being excluded in the pagemap range, which is true. But it's
> not because that page is somehow reserved or being wasted. It's because
> the range is being described by specifying the PFN *after* the last PFN.
> 
> With the start_pfn and last_pfn fields used to determine the highest
> page order that can be used, the slightly unorthodox semantics of
> last_pfn make that calculation easy. But then you must subtract 1
> from last_pfn when setting the range start and end for
> devm_memremap_pages() to use. And the code does that, so the code
> is all correct. The comment might be improved to speak about the
> semantics of the last_pfn field, not that a page of memory is
> intentionally being excluded/wasted.  And/or maybe the struct
> mshv_vtl_ram_disposition definition should get a comment to clarify
> the semantics of last_pfn.
> 
> Michael

Right, last_pfn could be interpreted as actual last pfn. I'll add the 
comment to avoid the confusion.

Regards,
Naman


      reply	other threads:[~2026-04-06  4:56 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-01  5:40 [PATCH] Drivers: hv: mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDER Naman Jain
2026-04-03  3:35 ` Michael Kelley
2026-04-03  7:25   ` Naman Jain
2026-04-03 18:37     ` Michael Kelley
2026-04-06  4:56       ` Naman Jain [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a0d271e3-ece8-45cf-9dbb-ced773d6f3f8@linux.microsoft.com \
    --to=namjain@linux.microsoft.com \
    --cc=decui@microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=mhklinux@outlook.com \
    --cc=ssengar@linux.microsoft.com \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.