Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: Naman Jain <namjain@linux.microsoft.com>
To: "Michael Kelley" <mhklinux@outlook.com>,
	"Thomas Weißschuh" <linux@weissschuh.net>
Cc: "stable@vger.kernel.org" <stable@vger.kernel.org>,
	Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
	"K . Y . Srinivasan" <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Long Li <longli@microsoft.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 6.12] Drivers: hv: Make the sysfs node size for the ring buffer dynamic
Date: Thu, 31 Jul 2025 21:13:27 +0530	[thread overview]
Message-ID: <4abf15ac-de18-48d4-9420-19d40f26fdd2@linux.microsoft.com> (raw)
In-Reply-To: <SN6PR02MB415792B00B021D4DB76A6014D425A@SN6PR02MB4157.namprd02.prod.outlook.com>



On 7/30/2025 1:15 AM, Michael Kelley wrote:
> From: Thomas Weißschuh <linux@weissschuh.net> Sent: Tuesday, July 29, 2025 11:47 AM
>>
>> On 2025-07-29 18:39:45+0000, Michael Kelley wrote:
>>> From: Naman Jain <namjain@linux.microsoft.com> Sent: Wednesday, July 23, 2025 12:02 AM
>>>>
>>>> The ring buffer size varies across VMBus channels. The size of sysfs
>>>> node for the ring buffer is currently hardcoded to 4 MB. Userspace
>>>> clients either use fstat() or hardcode this size for doing mmap().
>>>> To address this, make the sysfs node size dynamic to reflect the
>>>> actual ring buffer size for each channel. This will ensure that
>>>> fstat() on ring sysfs node always returns the correct size of
>>>> ring buffer.
>>>>
>>>> This is a backport of the upstream commit
>>>> 65995e97a1ca ("Drivers: hv: Make the sysfs node size for the ring buffer dynamic")
>>>> with modifications, as the original patch has missing dependencies on
>>>> kernel v6.12.x. The structure "struct attribute_group" does not have
>>>> bin_size field in v6.12.x kernel so the logic of configuring size of
>>>> sysfs node for ring buffer has been moved to
>>>> vmbus_chan_bin_attr_is_visible().
>>>>
>>>> Original change was not a fix, but it needs to be backported to fix size
>>>> related discrepancy caused by the commit mentioned in Fixes tag.
>>>>
>>>> Fixes: bf1299797c3c ("uio_hv_generic: Align ring size to system page")
>>>> Cc: <stable@vger.kernel.org> # 6.12.x
>>>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>>>> ---
>>>>
>>>> This change won't apply on older kernels currently due to missing
>>>> dependencies. I will take care of them after this goes in.
>>>>
>>>> I did not retain any Reviewed-by or Tested-by tags, since the code has
>>>> changed completely, while the functionality remains same.
>>>> Requesting Michael, Dexuan, Wei to please review again.
>>>>
>>>> ---
>>>>   drivers/hv/vmbus_drv.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>>>> index 1f519e925f06..616e63fb2f15 100644
>>>> --- a/drivers/hv/vmbus_drv.c
>>>> +++ b/drivers/hv/vmbus_drv.c
>>>> @@ -1810,7 +1810,6 @@ static struct bin_attribute chan_attr_ring_buffer = {
>>>>   		.name = "ring",
>>>>   		.mode = 0600,
>>>>   	},
>>>> -	.size = 2 * SZ_2M,
>>>>   	.mmap = hv_mmap_ring_buffer_wrapper,
>>>>   };
>>>>   static struct attribute *vmbus_chan_attrs[] = {
>>>> @@ -1866,6 +1865,7 @@ static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj,
>>>>   	/* Hide ring attribute if channel's ring_sysfs_visible is set to false */
>>>>   	if (attr ==  &chan_attr_ring_buffer && !channel->ring_sysfs_visible)
>>>>   		return 0;
>>>> +	attr->size = channel->ringbuffer_pagecount << PAGE_SHIFT;
>>>
>>> Suppose a VM has two devices using UIO, such as DPDK network device with
>>> a 2MiB ring buffer, and an fcopy device with a 16KiB ring buffer. Both devices
>>> will be referencing the same static instance of chan_attr_ring_buffer, and the
>>> .size field it contains. The above statement will change that .size field
>>> between 2MiB and 16KiB as the /sys entries are initially populated, and as
>>> the visibility is changed if the devices are removed and re-instantiated (which
>>> is much more likely for fcopy than for netvsc). That changing of the .size
>>> value will probably work most of the time, but it's racy if two devices with
>>> different ring buffer sizes get instantiated or re-instantiated at the same time.
>>
>> IIRC it works out in practice. While the global attribute instance is indeed
>> modified back-and-forth the size from it will be *copied* into kernfs
>> after each recalculation. So each attribute should get its own correct size.
> 
> The race I see is in fs/sysfs/group.c in the create_files() function. It calls the
> is_bin_visible() function, which this patch uses to set the .size field of the static
> attribute. Then creates_files() calls sysfs_add_bin_file_mode_ns(), which reads
> the .size field and uses it to create the sysfs entry. But if create_files() is called
> in parallel on two different kobjs of the same type, but with different values
> for the .size field, the second create_files() could overwrite the static .size
> field after the first create_files() has set it, but before it has used it. I don't
> see any global lock that would prevent such, though maybe I'm missing
> something.
> 
>>
>>> Unfortunately, I don't see a fix, short of backporting support for the
>>> .bin_size function, as this is exactly the problem that function solves.
>>
>> It should work out in practice. (I introduced the .bin_size function)
> 
> The race I describe is unlikely, particularly if attribute groups are created
> once and then not disturbed. But note that the Hyper-V fcopy group can
> get updated in a running VM via update_sysfs_group(), which also calls
> create_files(). Such an update might marginally increase the potential for
> the race and for getting the wrong size. Still, I agree it should work out
> in practice.
> 
> Michael
> 
>>
>> Thomas


hi Thomas,
Would it be possible to port your changes on 6.12 kernel, to avoid such
race conditions? Or if it has a lot of dependencies, or if you have a
follow-up advice, please let us us know.

Thanks,
Naman


  reply	other threads:[~2025-07-31 15:43 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-23  7:02 [PATCH 6.12] Drivers: hv: Make the sysfs node size for the ring buffer dynamic Naman Jain
2025-07-29 18:39 ` Michael Kelley
2025-07-29 18:46   ` Thomas Weißschuh
2025-07-29 19:45     ` Michael Kelley
2025-07-31 15:43       ` Naman Jain [this message]
2025-08-01 10:52         ` Thomas Weißschuh
2025-08-04  5:49           ` Naman Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4abf15ac-de18-48d4-9420-19d40f26fdd2@linux.microsoft.com \
    --to=namjain@linux.microsoft.com \
    --cc=decui@microsoft.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=haiyangz@microsoft.com \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@weissschuh.net \
    --cc=longli@microsoft.com \
    --cc=mhklinux@outlook.com \
    --cc=stable@vger.kernel.org \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox