From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 6 Apr 2026 10:26:20 +0530
From: Naman Jain
Subject: Re: [PATCH] Drivers: hv: mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDER
To: Michael Kelley, "K . Y . Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li
Cc: Saurabh Sengar, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260401054005.1532381-1-namjain@linux.microsoft.com>
 <9f0e27ac-099b-4b7d-b082-f43245331bbc@linux.microsoft.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed

On 4/4/2026 12:07 AM, Michael Kelley wrote:
> From: Naman Jain Sent: Friday, April 3, 2026 12:25 AM
>>
>
> Nit: I wonder what's the best prefix to use in the patch Subject field.
> "Drivers: hv: mshv_vtl:" is rather long. There was agreement to use
> just "mshv:" for the root partition code, and I probably misused that
> in commit 754cf84504ea. How about just "mshv_vtl:" as the prefix for this
> patch and other VTL patches going forward?
>

If "mshv:" is OK for the root partition code, "mshv_vtl:" should be good
for this. I'll change this subject, and for other patches in the future.

>> On 4/3/2026 9:05 AM, Michael Kelley wrote:
>>> From: Naman Jain Sent: Tuesday, March 31, 2026 10:40 PM
>>>>
>>>> When registering VTL0 memory via MSHV_ADD_VTL0_MEMORY, the kernel
>>>> computes pgmap->vmemmap_shift as the number of trailing zeros in the
>>>> OR of start_pfn and last_pfn, intending to use the largest compound
>>>> page order both endpoints are aligned to.
>>>>
>>>> However, this value is not clamped to MAX_FOLIO_ORDER, so a
>>>> sufficiently aligned range (e.g. physical range 0x800000000000-
>>>> 0x800080000000, corresponding to start_pfn=0x800000000 with 35
>>>> trailing zeros) can produce a shift larger than what
>>>> memremap_pages() accepts, triggering a WARN and returning -EINVAL:
>>>>
>>>>   WARNING: ... memremap_pages+0x512/0x650
>>>>   requested folio size unsupported
>>>>
>>>> The MAX_FOLIO_ORDER check was added by commit 646b67d57589
>>>> ("mm/memremap: reject unreasonable folio/compound page sizes in
>>>> memremap_pages()"). When CONFIG_HAVE_GIGANTIC_FOLIOS=y,
>>>> CONFIG_SPARSEMEM_VMEMMAP=y, and CONFIG_HUGETLB_PAGE is not set,
>>>> MAX_FOLIO_ORDER resolves to (PUD_SHIFT - PAGE_SHIFT) = 18. Any
>>>> range whose PFN alignment exceeds order 18 hits this path.
>>>
>>> I'm not clear on what point you are making with this specific
>>> configuration that results in MAX_FOLIO_ORDER being 18. Is it just
>>> an example? Is 18 the largest expected value for MAX_FOLIO_ORDER?
>>> And note that PUD_SHIFT and PAGE_SHIFT might have different values
>>> on arm64 with a page size other than 4K.
>>>

>> Yes, this was just an example. It is not generalized enough; I will
>> remove it. MAX_FOLIO_ORDER could go beyond 18.

>>>>
>>>> Fix this by clamping vmemmap_shift to MAX_FOLIO_ORDER so we always
>>>> request the largest order the kernel supports, rather than an
>>>> out-of-range value.
>>>>
>>>> Also fix the error path to propagate the actual error code from
>>>> devm_memremap_pages() instead of hard-coding -EFAULT, which was
>>>> masking the real -EINVAL return.
>>>>
>>>> Fixes: 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver")
>>>> Cc:
>>>> Signed-off-by: Naman Jain
>>>> ---
>>>>  drivers/hv/mshv_vtl_main.c | 8 ++++++--
>>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
>>>> index 5856975f32e12..255fed3a740c1 100644
>>>> --- a/drivers/hv/mshv_vtl_main.c
>>>> +++ b/drivers/hv/mshv_vtl_main.c
>>>> @@ -405,8 +405,12 @@ static int mshv_vtl_ioctl_add_vtl0_mem(struct mshv_vtl *vtl, void __user *arg)
>>>>  	/*
>>>>  	 * Determine the highest page order that can be used for the given memory range.
>>>>  	 * This works best when the range is aligned; i.e. both the start and the length.
>>>> +	 * Clamp to MAX_FOLIO_ORDER to avoid a WARN in memremap_pages() when the range
>>>> +	 * alignment exceeds the maximum supported folio order for this kernel config.
>>>>  	 */
>>>> -	pgmap->vmemmap_shift = count_trailing_zeros(vtl0_mem.start_pfn | vtl0_mem.last_pfn);
>>>> +	pgmap->vmemmap_shift = min_t(unsigned long,
>>>> +				     count_trailing_zeros(vtl0_mem.start_pfn | vtl0_mem.last_pfn),
>>>> +				     MAX_FOLIO_ORDER);
>>>
>>> Is it necessary to use min_t() here, or would min() work? Neither
>>> count_trailing_zeros() nor MAX_FOLIO_ORDER is ever negative, so it
>>> seems like just min() would work with no potential for doing a bogus
>>> comparison or assignment.
>>>

>> min() could work, yes. I just felt min_t() is safer for comparing
>> these two values of different types:
>> count_trailing_zeros() being 'int',
>> MAX_FOLIO_ORDER being a macro, calculated by applying bit operations,
>> and the destination being unsigned int.
>>
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER PFN_SECTION_SHIFT
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER (ilog2(SZ_16G) - PAGE_SHIFT)
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER (ilog2(SZ_1G) - PAGE_SHIFT)
>> include/linux/mmzone.h:#define MAX_FOLIO_ORDER (PUD_SHIFT - PAGE_SHIFT)
>>
>> I am fine with anything you suggest here.
>
> There's a fair number of patches on LKML that are replacing min_t()
> with min(). At some point in the not-too-distant past, the
> implementation of min() was improved to deal with different but
> compatible integer types. My sense is that min() is the better choice
> for general integer comparisons, particularly when the values are
> known to be non-negative.
>
>>
>>> The shift is calculated using the originally passed in start_pfn and
>>> last_pfn, while the "range" struct in pgmap has an "end" value that
>>> is one page less. So is the idea to go ahead and create the mapping
>>> with folios of a size that includes that last page, and then just
>>> waste the last page of the last folio?
>>
>> No, waste does not occur. The vmemmap_shift determines the folio
>> order, and memmap_init_zone_device() walks the range
>> [start_pfn, last_pfn) in steps of (1 << vmemmap_shift) pages,
>> creating one folio per step. The OR operation ensures both boundaries
>> are aligned to multiples of (1 << vmemmap_shift), guaranteeing the
>> range divides evenly into folios with no partial folio at the end.
>> The intention is to find the highest folio order possible here, and
>> if it reaches MAX_FOLIO_ORDER, restrict vmemmap_shift to it.
>
> OK, I figured out what is confusing me. I had a misunderstanding when
> I reviewed this code during its original submission, and that
> misunderstanding has influenced my (incorrect) review of this change.
>
> The struct mshv_vtl_ram_disposition that is passed from user space has
> two fields: start_pfn and last_pfn.
> But last_pfn is somewhat misnamed in my view. For example, an aligned
> 2 MiB of memory would consist of 512 PFNs. If the first PFN is 0x200,
> the last PFN would be 0x3FF. But in the semantics of the struct, the
> last_pfn field should be 0x400.
>
> In response to my comments in the original review, you added the
> comment about last_pfn being excluded in the pagemap range, which is
> true. But it's not because that page is somehow reserved or being
> wasted. It's because the range is being described by specifying the
> PFN *after* the last PFN.
>
> With the start_pfn and last_pfn fields used to determine the highest
> page order that can be used, the slightly unorthodox semantics of
> last_pfn make that calculation easy. But then you must subtract 1 from
> last_pfn when setting the range start and end for
> devm_memremap_pages() to use. And the code does that, so the code is
> all correct. The comment might be improved to speak about the
> semantics of the last_pfn field, not that a page of memory is
> intentionally being excluded/wasted. And/or maybe the struct
> mshv_vtl_ram_disposition definition should get a comment to clarify
> the semantics of last_pfn.
>
> Michael

Right, last_pfn could be interpreted as the actual last PFN. I'll add
the comment to avoid the confusion.

Regards,
Naman