* [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
@ 2025-02-25 5:20 Naman Jain
2025-02-25 6:12 ` Greg Kroah-Hartman
0 siblings, 1 reply; 12+ messages in thread
From: Naman Jain @ 2025-02-25 5:20 UTC (permalink / raw)
To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Greg Kroah-Hartman, Stephen Hemminger
Cc: linux-hyperv, linux-kernel, stable, Saurabh Sengar,
Michael Kelley, Naman Jain
On regular bootup, devices get registered to vmbus first, so when
uio_hv_generic driver for a particular device type is probed,
the device is already initialized and added, so sysfs creation in
uio_hv_generic probe works fine. However, when device is removed
and brought back, the channel rescinds and device again gets
registered to vmbus. However this time, the uio_hv_generic driver is
already registered to probe for that device and in this case sysfs
creation is tried before the device gets initialized completely.
Fix this by moving the core logic of sysfs creation for ring buffer,
from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs
attributes for the channels are defined. While doing that, make use
of attribute groups and macros, instead of creating sysfs directly,
to ensure better error handling and code flow.
Problem path:
vmbus_device_register
device_register
uio_hv_generic probe
sysfs_create_bin_file (fails here)
kset_create_and_add (dependency)
vmbus_add_channel_kobj (dependency)
Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel")
Cc: stable@kernel.org
Suggested-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Suggested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
---
Hi,
This is the first patch after initial RFC was posted.
https://lore.kernel.org/all/20250214064351.8994-1-namjain@linux.microsoft.com/
Changes since RFC patch:
* Different approach to solve the problem is proposed (credits to
Michael Kelley).
* Core logic for sysfs creation moved out of uio_hv_generic, to VMBus
drivers where rest of the sysfs attributes for a VMBus channel
are defined. (addressed Greg's comments)
* Used attribute groups instead of sysfs_create* functions, and bundled
ring attribute with other attributes for the channel sysfs.
Error logs:
[ 35.574120] ------------[ cut here ]------------
[ 35.574122] WARNING: CPU: 0 PID: 10 at fs/sysfs/file.c:591 sysfs_create_bin_file+0x81/0x90
[ 35.574168] Workqueue: hv_pri_chan vmbus_add_channel_work
[ 35.574172] RIP: 0010:sysfs_create_bin_file+0x81/0x90
[ 35.574197] Call Trace:
[ 35.574199] <TASK>
[ 35.574200] ? show_regs+0x69/0x80
[ 35.574217] ? __warn+0x8d/0x130
[ 35.574220] ? sysfs_create_bin_file+0x81/0x90
[ 35.574222] ? report_bug+0x182/0x190
[ 35.574225] ? handle_bug+0x5b/0x90
[ 35.574244] ? exc_invalid_op+0x19/0x70
[ 35.574247] ? asm_exc_invalid_op+0x1b/0x20
[ 35.574252] ? sysfs_create_bin_file+0x81/0x90
[ 35.574255] hv_uio_probe+0x1e7/0x410 [uio_hv_generic]
[ 35.574271] vmbus_probe+0x3b/0x90
[ 35.574275] really_probe+0xf4/0x3b0
[ 35.574279] __driver_probe_device+0x8a/0x170
[ 35.574282] driver_probe_device+0x23/0xc0
[ 35.574285] __device_attach_driver+0xb5/0x140
[ 35.574288] ? __pfx___device_attach_driver+0x10/0x10
[ 35.574291] bus_for_each_drv+0x86/0xe0
[ 35.574294] __device_attach+0xc1/0x200
[ 35.574297] device_initial_probe+0x13/0x20
[ 35.574315] bus_probe_device+0x99/0xa0
[ 35.574318] device_add+0x647/0x870
[ 35.574320] ? hrtimer_init+0x28/0x70
[ 35.574323] device_register+0x1b/0x30
[ 35.574326] vmbus_device_register+0x83/0x130
[ 35.574328] vmbus_add_channel_work+0x135/0x1a0
[ 35.574331] process_one_work+0x177/0x340
[ 35.574348] worker_thread+0x2b2/0x3c0
[ 35.574350] kthread+0xe3/0x1f0
[ 35.574353] ? __pfx_worker_thread+0x10/0x10
[ 35.574356] ? __pfx_kthread+0x10/0x10
---
drivers/hv/hyperv_vmbus.h | 4 +++
drivers/hv/vmbus_drv.c | 62 ++++++++++++++++++++++++++++++++++++
drivers/uio/uio_hv_generic.c | 34 ++------------------
include/linux/hyperv.h | 3 ++
4 files changed, 72 insertions(+), 31 deletions(-)
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 29780f3a7478..e0c7b75e6c7a 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -477,4 +477,8 @@ static inline int hv_debug_add_dev_dir(struct hv_device *dev)
#endif /* CONFIG_HYPERV_TESTING */
+/* Create and remove sysfs entry for memory mapped ring buffers for a channel */
+int hv_create_ring_sysfs(struct vmbus_channel *channel);
+int hv_remove_ring_sysfs(struct vmbus_channel *channel);
+
#endif /* _HYPERV_VMBUS_H */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 22afebfc28ff..0110643bad3f 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1802,6 +1802,39 @@ static ssize_t subchannel_id_show(struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR_RO(subchannel_id);
+/* Functions to create sysfs interface to allow mmap of the ring buffers.
+ * The ring buffer is allocated as contiguous memory by vmbus_open
+ */
+static int hv_mmap_ring_buffer(struct vmbus_channel *channel, struct vm_area_struct *vma)
+{
+ void *ring_buffer = page_address(channel->ringbuffer_page);
+
+ if (channel->state != CHANNEL_OPENED_STATE)
+ return -ENODEV;
+
+ return vm_iomap_memory(vma, virt_to_phys(ring_buffer),
+ channel->ringbuffer_pagecount << PAGE_SHIFT);
+}
+
+static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
+ const struct bin_attribute *attr,
+ struct vm_area_struct *vma)
+{
+ struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
+
+ if (!channel->mmap_ring_buffer)
+ return -ENODEV;
+ return channel->mmap_ring_buffer(channel, vma);
+}
+
+static struct bin_attribute chan_attr_ring_buffer = {
+ .attr = {
+ .name = "ring",
+ .mode = 0600,
+ },
+ .size = 2 * SZ_2M,
+ .mmap = hv_mmap_ring_buffer_wrapper,
+};
static struct attribute *vmbus_chan_attrs[] = {
&chan_attr_out_mask.attr,
&chan_attr_in_mask.attr,
@@ -1818,6 +1851,7 @@ static struct attribute *vmbus_chan_attrs[] = {
&chan_attr_out_full_total.attr,
&chan_attr_monitor_id.attr,
&chan_attr_subchannel_id.attr,
+ &chan_attr_ring_buffer.attr,
NULL
};
@@ -1838,6 +1872,10 @@ static umode_t vmbus_chan_attr_is_visible(struct kobject *kobj,
attr == &chan_attr_monitor_id.attr))
return 0;
+ /* Hide ring attribute if channel's mmap_ring_buffer function is not yet initialised */
+ if (attr == &chan_attr_ring_buffer.attr && !channel->mmap_ring_buffer)
+ return 0;
+
return attr->mode;
}
@@ -1851,6 +1889,30 @@ static const struct kobj_type vmbus_chan_ktype = {
.release = vmbus_chan_release,
};
+/*
+ * hv_create_ring_sysfs - create ring sysfs entry corresponding to ring buffers for a channel
+ */
+int hv_create_ring_sysfs(struct vmbus_channel *channel)
+{
+ struct kobject *kobj = &channel->kobj;
+
+ channel->mmap_ring_buffer = hv_mmap_ring_buffer;
+ return sysfs_update_group(kobj, &vmbus_chan_group);
+}
+EXPORT_SYMBOL_GPL(hv_create_ring_sysfs);
+
+/*
+ * hv_remove_ring_sysfs - remove ring sysfs entry corresponding to ring buffers for a channel
+ */
+int hv_remove_ring_sysfs(struct vmbus_channel *channel)
+{
+ struct kobject *kobj = &channel->kobj;
+
+ channel->mmap_ring_buffer = NULL;
+ return sysfs_update_group(kobj, &vmbus_chan_group);
+}
+EXPORT_SYMBOL_GPL(hv_remove_ring_sysfs);
+
/*
* vmbus_add_channel_kobj - setup a sub-directory under device/channels
*/
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 1b19b5647495..c120259ee1b1 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -131,33 +131,6 @@ static void hv_uio_rescind(struct vmbus_channel *channel)
vmbus_device_unregister(channel->device_obj);
}
-/* Sysfs API to allow mmap of the ring buffers
- * The ring buffer is allocated as contiguous memory by vmbus_open
- */
-static int hv_uio_ring_mmap(struct file *filp, struct kobject *kobj,
- const struct bin_attribute *attr,
- struct vm_area_struct *vma)
-{
- struct vmbus_channel *channel
- = container_of(kobj, struct vmbus_channel, kobj);
- void *ring_buffer = page_address(channel->ringbuffer_page);
-
- if (channel->state != CHANNEL_OPENED_STATE)
- return -ENODEV;
-
- return vm_iomap_memory(vma, virt_to_phys(ring_buffer),
- channel->ringbuffer_pagecount << PAGE_SHIFT);
-}
-
-static const struct bin_attribute ring_buffer_bin_attr = {
- .attr = {
- .name = "ring",
- .mode = 0600,
- },
- .size = 2 * SZ_2M,
- .mmap = hv_uio_ring_mmap,
-};
-
/* Callback from VMBUS subsystem when new channel created. */
static void
hv_uio_new_channel(struct vmbus_channel *new_sc)
@@ -178,8 +151,7 @@ hv_uio_new_channel(struct vmbus_channel *new_sc)
/* Disable interrupts on sub channel */
new_sc->inbound.ring_buffer->interrupt_mask = 1;
set_channel_read_mode(new_sc, HV_CALL_ISR);
-
- ret = sysfs_create_bin_file(&new_sc->kobj, &ring_buffer_bin_attr);
+ ret = hv_create_ring_sysfs(new_sc);
if (ret) {
dev_err(device, "sysfs create ring bin file failed; %d\n", ret);
vmbus_close(new_sc);
@@ -350,7 +322,7 @@ hv_uio_probe(struct hv_device *dev,
goto fail_close;
}
- ret = sysfs_create_bin_file(&channel->kobj, &ring_buffer_bin_attr);
+ ret = hv_create_ring_sysfs(channel);
if (ret)
dev_notice(&dev->device,
"sysfs create ring bin file failed; %d\n", ret);
@@ -375,7 +347,7 @@ hv_uio_remove(struct hv_device *dev)
if (!pdata)
return;
- sysfs_remove_bin_file(&dev->channel->kobj, &ring_buffer_bin_attr);
+ hv_remove_ring_sysfs(dev->channel);
uio_unregister_device(&pdata->info);
hv_uio_cleanup(dev, pdata);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 7f4f8d8bdf43..26b7e7c38864 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1058,6 +1058,9 @@ struct vmbus_channel {
/* The max size of a packet on this channel */
u32 max_pkt_size;
+
+ /* function to mmap ring buffer memory to the channel's sysfs ring attribute */
+ int (*mmap_ring_buffer)(struct vmbus_channel *channel, struct vm_area_struct *vma);
};
#define lock_requestor(channel, flags) \
base-commit: 4ef7aa6a88280d015c6533a210e46efd55bdb57f
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
2025-02-25 5:20 [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer Naman Jain
@ 2025-02-25 6:12 ` Greg Kroah-Hartman
2025-02-25 8:34 ` Naman Jain
0 siblings, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2025-02-25 6:12 UTC (permalink / raw)
To: Naman Jain
Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Stephen Hemminger, linux-hyperv, linux-kernel, stable,
Saurabh Sengar, Michael Kelley
On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
> On regular bootup, devices get registered to vmbus first, so when
> uio_hv_generic driver for a particular device type is probed,
> the device is already initialized and added, so sysfs creation in
> uio_hv_generic probe works fine. However, when device is removed
> and brought back, the channel rescinds and device again gets
> registered to vmbus. However this time, the uio_hv_generic driver is
> already registered to probe for that device and in this case sysfs
> creation is tried before the device gets initialized completely.
>
> Fix this by moving the core logic of sysfs creation for ring buffer,
> from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs
> attributes for the channels are defined. While doing that, make use
> of attribute groups and macros, instead of creating sysfs directly,
> to ensure better error handling and code flow.
>
> Problem path:
> vmbus_device_register
> device_register
> uio_hv_generic probe
> sysfs_create_bin_file (fails here)
> kset_create_and_add (dependency)
> vmbus_add_channel_kobj (dependency)
>
> Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel")
> Cc: stable@kernel.org
> Suggested-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> Suggested-by: Michael Kelley <mhklinux@outlook.com>
> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> ---
> Hi,
> This is the first patch after initial RFC was posted.
> https://lore.kernel.org/all/20250214064351.8994-1-namjain@linux.microsoft.com/
>
> Changes since RFC patch:
> * Different approach to solve the problem is proposed (credits to
> Michael Kelley).
> * Core logic for sysfs creation moved out of uio_hv_generic, to VMBus
> drivers where rest of the sysfs attributes for a VMBus channel
> are defined. (addressed Greg's comments)
> * Used attribute groups instead of sysfs_create* functions, and bundled
> ring attribute with other attributes for the channel sysfs.
>
> Error logs:
>
> [ 35.574120] ------------[ cut here ]------------
> [ 35.574122] WARNING: CPU: 0 PID: 10 at fs/sysfs/file.c:591 sysfs_create_bin_file+0x81/0x90
> [ 35.574168] Workqueue: hv_pri_chan vmbus_add_channel_work
> [ 35.574172] RIP: 0010:sysfs_create_bin_file+0x81/0x90
> [ 35.574197] Call Trace:
> [ 35.574199] <TASK>
> [ 35.574200] ? show_regs+0x69/0x80
> [ 35.574217] ? __warn+0x8d/0x130
> [ 35.574220] ? sysfs_create_bin_file+0x81/0x90
> [ 35.574222] ? report_bug+0x182/0x190
> [ 35.574225] ? handle_bug+0x5b/0x90
> [ 35.574244] ? exc_invalid_op+0x19/0x70
> [ 35.574247] ? asm_exc_invalid_op+0x1b/0x20
> [ 35.574252] ? sysfs_create_bin_file+0x81/0x90
> [ 35.574255] hv_uio_probe+0x1e7/0x410 [uio_hv_generic]
> [ 35.574271] vmbus_probe+0x3b/0x90
> [ 35.574275] really_probe+0xf4/0x3b0
> [ 35.574279] __driver_probe_device+0x8a/0x170
> [ 35.574282] driver_probe_device+0x23/0xc0
> [ 35.574285] __device_attach_driver+0xb5/0x140
> [ 35.574288] ? __pfx___device_attach_driver+0x10/0x10
> [ 35.574291] bus_for_each_drv+0x86/0xe0
> [ 35.574294] __device_attach+0xc1/0x200
> [ 35.574297] device_initial_probe+0x13/0x20
> [ 35.574315] bus_probe_device+0x99/0xa0
> [ 35.574318] device_add+0x647/0x870
> [ 35.574320] ? hrtimer_init+0x28/0x70
> [ 35.574323] device_register+0x1b/0x30
> [ 35.574326] vmbus_device_register+0x83/0x130
> [ 35.574328] vmbus_add_channel_work+0x135/0x1a0
> [ 35.574331] process_one_work+0x177/0x340
> [ 35.574348] worker_thread+0x2b2/0x3c0
> [ 35.574350] kthread+0xe3/0x1f0
> [ 35.574353] ? __pfx_worker_thread+0x10/0x10
> [ 35.574356] ? __pfx_kthread+0x10/0x10
>
> ---
> drivers/hv/hyperv_vmbus.h | 4 +++
> drivers/hv/vmbus_drv.c | 62 ++++++++++++++++++++++++++++++++++++
> drivers/uio/uio_hv_generic.c | 34 ++------------------
> include/linux/hyperv.h | 3 ++
> 4 files changed, 72 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 29780f3a7478..e0c7b75e6c7a 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -477,4 +477,8 @@ static inline int hv_debug_add_dev_dir(struct hv_device *dev)
>
> #endif /* CONFIG_HYPERV_TESTING */
>
> +/* Create and remove sysfs entry for memory mapped ring buffers for a channel */
> +int hv_create_ring_sysfs(struct vmbus_channel *channel);
> +int hv_remove_ring_sysfs(struct vmbus_channel *channel);
> +
> #endif /* _HYPERV_VMBUS_H */
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 22afebfc28ff..0110643bad3f 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -1802,6 +1802,39 @@ static ssize_t subchannel_id_show(struct vmbus_channel *channel,
> }
> static VMBUS_CHAN_ATTR_RO(subchannel_id);
>
> +/* Functions to create sysfs interface to allow mmap of the ring buffers.
> + * The ring buffer is allocated as contiguous memory by vmbus_open
> + */
> +static int hv_mmap_ring_buffer(struct vmbus_channel *channel, struct vm_area_struct *vma)
> +{
> + void *ring_buffer = page_address(channel->ringbuffer_page);
> +
> + if (channel->state != CHANNEL_OPENED_STATE)
> + return -ENODEV;
> +
> + return vm_iomap_memory(vma, virt_to_phys(ring_buffer),
> + channel->ringbuffer_pagecount << PAGE_SHIFT);
> +}
> +
> +static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
> + const struct bin_attribute *attr,
> + struct vm_area_struct *vma)
> +{
> + struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
> +
> + if (!channel->mmap_ring_buffer)
> + return -ENODEV;
> + return channel->mmap_ring_buffer(channel, vma);
What is preventing mmap_ring_buffer from being set to NULL right after
checking it and then calling it here? I see no locks here or where you
are assigning this variable at all, so what is preventing these types of
races?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
2025-02-25 6:12 ` Greg Kroah-Hartman
@ 2025-02-25 8:34 ` Naman Jain
2025-02-25 8:39 ` Greg Kroah-Hartman
0 siblings, 1 reply; 12+ messages in thread
From: Naman Jain @ 2025-02-25 8:34 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Stephen Hemminger, linux-hyperv, linux-kernel, stable,
Saurabh Sengar, Michael Kelley, Long Li
On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
> On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
>> On regular bootup, devices get registered to vmbus first, so when
>> uio_hv_generic driver for a particular device type is probed,
>> the device is already initialized and added, so sysfs creation in
>> uio_hv_generic probe works fine. However, when device is removed
>> and brought back, the channel rescinds and device again gets
>> registered to vmbus. However this time, the uio_hv_generic driver is
>> already registered to probe for that device and in this case sysfs
>> creation is tried before the device gets initialized completely.
>>
>> Fix this by moving the core logic of sysfs creation for ring buffer,
>> from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs
>> attributes for the channels are defined. While doing that, make use
>> of attribute groups and macros, instead of creating sysfs directly,
>> to ensure better error handling and code flow.
>>
>> Problem path:
>> vmbus_device_register
>> device_register
>> uio_hv_generic probe
>> sysfs_create_bin_file (fails here)
>> kset_create_and_add (dependency)
>> vmbus_add_channel_kobj (dependency)
>>
>> Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel")
>> Cc: stable@kernel.org
>> Suggested-by: Saurabh Sengar <ssengar@linux.microsoft.com>
>> Suggested-by: Michael Kelley <mhklinux@outlook.com>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> ---
>> Hi,
>> This is the first patch after initial RFC was posted.
>> https://lore.kernel.org/all/20250214064351.8994-1-namjain@linux.microsoft.com/
>>
>> Changes since RFC patch:
>> * Different approach to solve the problem is proposed (credits to
>> Michael Kelley).
>> * Core logic for sysfs creation moved out of uio_hv_generic, to VMBus
>> drivers where rest of the sysfs attributes for a VMBus channel
>> are defined. (addressed Greg's comments)
>> * Used attribute groups instead of sysfs_create* functions, and bundled
>> ring attribute with other attributes for the channel sysfs.
>>
>> Error logs:
>>
>> [ 35.574120] ------------[ cut here ]------------
>> [ 35.574122] WARNING: CPU: 0 PID: 10 at fs/sysfs/file.c:591 sysfs_create_bin_file+0x81/0x90
>> [ 35.574168] Workqueue: hv_pri_chan vmbus_add_channel_work
>> [ 35.574172] RIP: 0010:sysfs_create_bin_file+0x81/0x90
>> [ 35.574197] Call Trace:
>> [ 35.574199] <TASK>
>> [ 35.574200] ? show_regs+0x69/0x80
>> [ 35.574217] ? __warn+0x8d/0x130
>> [ 35.574220] ? sysfs_create_bin_file+0x81/0x90
>> [ 35.574222] ? report_bug+0x182/0x190
>> [ 35.574225] ? handle_bug+0x5b/0x90
>> [ 35.574244] ? exc_invalid_op+0x19/0x70
>> [ 35.574247] ? asm_exc_invalid_op+0x1b/0x20
>> [ 35.574252] ? sysfs_create_bin_file+0x81/0x90
>> [ 35.574255] hv_uio_probe+0x1e7/0x410 [uio_hv_generic]
>> [ 35.574271] vmbus_probe+0x3b/0x90
>> [ 35.574275] really_probe+0xf4/0x3b0
>> [ 35.574279] __driver_probe_device+0x8a/0x170
>> [ 35.574282] driver_probe_device+0x23/0xc0
>> [ 35.574285] __device_attach_driver+0xb5/0x140
>> [ 35.574288] ? __pfx___device_attach_driver+0x10/0x10
>> [ 35.574291] bus_for_each_drv+0x86/0xe0
>> [ 35.574294] __device_attach+0xc1/0x200
>> [ 35.574297] device_initial_probe+0x13/0x20
>> [ 35.574315] bus_probe_device+0x99/0xa0
>> [ 35.574318] device_add+0x647/0x870
>> [ 35.574320] ? hrtimer_init+0x28/0x70
>> [ 35.574323] device_register+0x1b/0x30
>> [ 35.574326] vmbus_device_register+0x83/0x130
>> [ 35.574328] vmbus_add_channel_work+0x135/0x1a0
>> [ 35.574331] process_one_work+0x177/0x340
>> [ 35.574348] worker_thread+0x2b2/0x3c0
>> [ 35.574350] kthread+0xe3/0x1f0
>> [ 35.574353] ? __pfx_worker_thread+0x10/0x10
>> [ 35.574356] ? __pfx_kthread+0x10/0x10
>>
>> ---
>> drivers/hv/hyperv_vmbus.h | 4 +++
>> drivers/hv/vmbus_drv.c | 62 ++++++++++++++++++++++++++++++++++++
>> drivers/uio/uio_hv_generic.c | 34 ++------------------
>> include/linux/hyperv.h | 3 ++
>> 4 files changed, 72 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
>> index 29780f3a7478..e0c7b75e6c7a 100644
>> --- a/drivers/hv/hyperv_vmbus.h
>> +++ b/drivers/hv/hyperv_vmbus.h
>> @@ -477,4 +477,8 @@ static inline int hv_debug_add_dev_dir(struct hv_device *dev)
>>
>> #endif /* CONFIG_HYPERV_TESTING */
>>
>> +/* Create and remove sysfs entry for memory mapped ring buffers for a channel */
>> +int hv_create_ring_sysfs(struct vmbus_channel *channel);
>> +int hv_remove_ring_sysfs(struct vmbus_channel *channel);
>> +
>> #endif /* _HYPERV_VMBUS_H */
>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>> index 22afebfc28ff..0110643bad3f 100644
>> --- a/drivers/hv/vmbus_drv.c
>> +++ b/drivers/hv/vmbus_drv.c
>> @@ -1802,6 +1802,39 @@ static ssize_t subchannel_id_show(struct vmbus_channel *channel,
>> }
>> static VMBUS_CHAN_ATTR_RO(subchannel_id);
>>
>> +/* Functions to create sysfs interface to allow mmap of the ring buffers.
>> + * The ring buffer is allocated as contiguous memory by vmbus_open
>> + */
>> +static int hv_mmap_ring_buffer(struct vmbus_channel *channel, struct vm_area_struct *vma)
>> +{
>> + void *ring_buffer = page_address(channel->ringbuffer_page);
>> +
>> + if (channel->state != CHANNEL_OPENED_STATE)
>> + return -ENODEV;
>> +
>> + return vm_iomap_memory(vma, virt_to_phys(ring_buffer),
>> + channel->ringbuffer_pagecount << PAGE_SHIFT);
>> +}
>> +
>> +static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
>> + const struct bin_attribute *attr,
>> + struct vm_area_struct *vma)
>> +{
>> + struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
>> +
>> + if (!channel->mmap_ring_buffer)
>> + return -ENODEV;
>> + return channel->mmap_ring_buffer(channel, vma);
>
> What is preventing mmap_ring_buffer from being set to NULL right after
> checking it and then calling it here? I see no locks here or where you
> are assigning this variable at all, so what is preventing these types of
> races?
>
> thanks,
>
> greg k-h
Thank you so much for reviewing.
I spent some time to understand if this race condition can happen and it
seems execution flow is pretty sequential, for a particular channel of a
device.
Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL) can be
called in parallel to hv_uio_probe (which had set
channel->mmap_ring_buffer to non NULL), I doubt race can happen here.
Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
vmbus_device_register
device_register
hv_uio_probe
hv_create_ring_sysfs (W to non NULL)
sysfs_update_group
vmbus_chan_attr_is_visible (R)
vmbus_add_channel_kobj
sysfs_create_group
vmbus_chan_attr_is_visible (R)
hv_mmap_ring_buffer_wrapper (critical section)
hv_uio_remove
hv_remove_ring_sysfs (W to NULL)
sysfs_update_group
vmbus_chan_attr_is_visible (R)
First probe:
hv_uio_probe
hv_create_ring_sysfs (W to non NULL)
sysfs_update_group
vmbus_chan_attr_is_visible (R)
hv_mmap_ring_buffer_wrapper (critical section)
Regards,
Naman
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
2025-02-25 8:34 ` Naman Jain
@ 2025-02-25 8:39 ` Greg Kroah-Hartman
2025-02-26 5:13 ` Naman Jain
0 siblings, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2025-02-25 8:39 UTC (permalink / raw)
To: Naman Jain
Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Stephen Hemminger, linux-hyperv, linux-kernel, stable,
Saurabh Sengar, Michael Kelley, Long Li
On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
>
>
> On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
> > On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
> > > On regular bootup, devices get registered to vmbus first, so when
> > > uio_hv_generic driver for a particular device type is probed,
> > > the device is already initialized and added, so sysfs creation in
> > > uio_hv_generic probe works fine. However, when device is removed
> > > and brought back, the channel rescinds and device again gets
> > > registered to vmbus. However this time, the uio_hv_generic driver is
> > > already registered to probe for that device and in this case sysfs
> > > creation is tried before the device gets initialized completely.
> > >
> > > Fix this by moving the core logic of sysfs creation for ring buffer,
> > > from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs
> > > attributes for the channels are defined. While doing that, make use
> > > of attribute groups and macros, instead of creating sysfs directly,
> > > to ensure better error handling and code flow.
> > >
> > > Problem path:
> > > vmbus_device_register
> > > device_register
> > > uio_hv_generic probe
> > > sysfs_create_bin_file (fails here)
> > > kset_create_and_add (dependency)
> > > vmbus_add_channel_kobj (dependency)
> > >
> > > Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel")
> > > Cc: stable@kernel.org
> > > Suggested-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> > > Suggested-by: Michael Kelley <mhklinux@outlook.com>
> > > Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> > > ---
> > > Hi,
> > > This is the first patch after initial RFC was posted.
> > > https://lore.kernel.org/all/20250214064351.8994-1-namjain@linux.microsoft.com/
> > >
> > > Changes since RFC patch:
> > > * Different approach to solve the problem is proposed (credits to
> > > Michael Kelley).
> > > * Core logic for sysfs creation moved out of uio_hv_generic, to VMBus
> > > drivers where rest of the sysfs attributes for a VMBus channel
> > > are defined. (addressed Greg's comments)
> > > * Used attribute groups instead of sysfs_create* functions, and bundled
> > > ring attribute with other attributes for the channel sysfs.
> > >
> > > Error logs:
> > >
> > > [ 35.574120] ------------[ cut here ]------------
> > > [ 35.574122] WARNING: CPU: 0 PID: 10 at fs/sysfs/file.c:591 sysfs_create_bin_file+0x81/0x90
> > > [ 35.574168] Workqueue: hv_pri_chan vmbus_add_channel_work
> > > [ 35.574172] RIP: 0010:sysfs_create_bin_file+0x81/0x90
> > > [ 35.574197] Call Trace:
> > > [ 35.574199] <TASK>
> > > [ 35.574200] ? show_regs+0x69/0x80
> > > [ 35.574217] ? __warn+0x8d/0x130
> > > [ 35.574220] ? sysfs_create_bin_file+0x81/0x90
> > > [ 35.574222] ? report_bug+0x182/0x190
> > > [ 35.574225] ? handle_bug+0x5b/0x90
> > > [ 35.574244] ? exc_invalid_op+0x19/0x70
> > > [ 35.574247] ? asm_exc_invalid_op+0x1b/0x20
> > > [ 35.574252] ? sysfs_create_bin_file+0x81/0x90
> > > [ 35.574255] hv_uio_probe+0x1e7/0x410 [uio_hv_generic]
> > > [ 35.574271] vmbus_probe+0x3b/0x90
> > > [ 35.574275] really_probe+0xf4/0x3b0
> > > [ 35.574279] __driver_probe_device+0x8a/0x170
> > > [ 35.574282] driver_probe_device+0x23/0xc0
> > > [ 35.574285] __device_attach_driver+0xb5/0x140
> > > [ 35.574288] ? __pfx___device_attach_driver+0x10/0x10
> > > [ 35.574291] bus_for_each_drv+0x86/0xe0
> > > [ 35.574294] __device_attach+0xc1/0x200
> > > [ 35.574297] device_initial_probe+0x13/0x20
> > > [ 35.574315] bus_probe_device+0x99/0xa0
> > > [ 35.574318] device_add+0x647/0x870
> > > [ 35.574320] ? hrtimer_init+0x28/0x70
> > > [ 35.574323] device_register+0x1b/0x30
> > > [ 35.574326] vmbus_device_register+0x83/0x130
> > > [ 35.574328] vmbus_add_channel_work+0x135/0x1a0
> > > [ 35.574331] process_one_work+0x177/0x340
> > > [ 35.574348] worker_thread+0x2b2/0x3c0
> > > [ 35.574350] kthread+0xe3/0x1f0
> > > [ 35.574353] ? __pfx_worker_thread+0x10/0x10
> > > [ 35.574356] ? __pfx_kthread+0x10/0x10
> > >
> > > ---
> > > drivers/hv/hyperv_vmbus.h | 4 +++
> > > drivers/hv/vmbus_drv.c | 62 ++++++++++++++++++++++++++++++++++++
> > > drivers/uio/uio_hv_generic.c | 34 ++------------------
> > > include/linux/hyperv.h | 3 ++
> > > 4 files changed, 72 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> > > index 29780f3a7478..e0c7b75e6c7a 100644
> > > --- a/drivers/hv/hyperv_vmbus.h
> > > +++ b/drivers/hv/hyperv_vmbus.h
> > > @@ -477,4 +477,8 @@ static inline int hv_debug_add_dev_dir(struct hv_device *dev)
> > > #endif /* CONFIG_HYPERV_TESTING */
> > > +/* Create and remove sysfs entry for memory mapped ring buffers for a channel */
> > > +int hv_create_ring_sysfs(struct vmbus_channel *channel);
> > > +int hv_remove_ring_sysfs(struct vmbus_channel *channel);
> > > +
> > > #endif /* _HYPERV_VMBUS_H */
> > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> > > index 22afebfc28ff..0110643bad3f 100644
> > > --- a/drivers/hv/vmbus_drv.c
> > > +++ b/drivers/hv/vmbus_drv.c
> > > @@ -1802,6 +1802,39 @@ static ssize_t subchannel_id_show(struct vmbus_channel *channel,
> > > }
> > > static VMBUS_CHAN_ATTR_RO(subchannel_id);
> > > +/* Functions to create sysfs interface to allow mmap of the ring buffers.
> > > + * The ring buffer is allocated as contiguous memory by vmbus_open
> > > + */
> > > +static int hv_mmap_ring_buffer(struct vmbus_channel *channel, struct vm_area_struct *vma)
> > > +{
> > > + void *ring_buffer = page_address(channel->ringbuffer_page);
> > > +
> > > + if (channel->state != CHANNEL_OPENED_STATE)
> > > + return -ENODEV;
> > > +
> > > + return vm_iomap_memory(vma, virt_to_phys(ring_buffer),
> > > + channel->ringbuffer_pagecount << PAGE_SHIFT);
> > > +}
> > > +
> > > +static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
> > > + const struct bin_attribute *attr,
> > > + struct vm_area_struct *vma)
> > > +{
> > > + struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
> > > +
> > > + if (!channel->mmap_ring_buffer)
> > > + return -ENODEV;
> > > + return channel->mmap_ring_buffer(channel, vma);
> >
> > What is preventing mmap_ring_buffer from being set to NULL right after
> > checking it and then calling it here? I see no locks here or where you
> > are assigning this variable at all, so what is preventing these types of
> > races?
> >
> > thanks,
> >
> > greg k-h
>
> Thank you so much for reviewing.
> I spent some time to understand if this race condition can happen and it
> seems execution flow is pretty sequential, for a particular channel of a
> device.
>
> Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL) can be
> called in parallel to hv_uio_probe (which had set
> channel->mmap_ring_buffer to non NULL), I doubt race can happen here.
>
> Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
>
> vmbus_device_register
> device_register
> hv_uio_probe
> hv_create_ring_sysfs (W to non NULL)
> sysfs_update_group
> vmbus_chan_attr_is_visible (R)
> vmbus_add_channel_kobj
> sysfs_create_group
> vmbus_chan_attr_is_visible (R)
> hv_mmap_ring_buffer_wrapper (critical section)
>
> hv_uio_remove
> hv_remove_ring_sysfs (W to NULL)
Yes, and right in here someone mmaps the file.
I think you can race here, no locks at all feels wrong.
Messing with sysfs groups and files like this is rough, and almost never
a good idea, why can't you just do this all at once with the default
groups, why is this being added/removed out-of-band?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
2025-02-25 8:39 ` Greg Kroah-Hartman
@ 2025-02-26 5:13 ` Naman Jain
2025-02-26 10:03 ` Greg Kroah-Hartman
0 siblings, 1 reply; 12+ messages in thread
From: Naman Jain @ 2025-02-26 5:13 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Stephen Hemminger, linux-hyperv, linux-kernel, stable,
Saurabh Sengar, Michael Kelley, Long Li
On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
> On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
>>
>>
>> On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
>>> On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
>>>> On regular bootup, devices get registered to vmbus first, so when
>>>> uio_hv_generic driver for a particular device type is probed,
>>>> the device is already initialized and added, so sysfs creation in
>>>> uio_hv_generic probe works fine. However, when device is removed
>>>> and brought back, the channel rescinds and device again gets
>>>> registered to vmbus. However this time, the uio_hv_generic driver is
>>>> already registered to probe for that device and in this case sysfs
>>>> creation is tried before the device gets initialized completely.
>>>>
>>>> Fix this by moving the core logic of sysfs creation for ring buffer,
>>>> from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs
>>>> attributes for the channels are defined. While doing that, make use
>>>> of attribute groups and macros, instead of creating sysfs directly,
>>>> to ensure better error handling and code flow.
>>>>
>>>> Problem path:
>>>> vmbus_device_register
>>>> device_register
>>>> uio_hv_generic probe
>>>> sysfs_create_bin_file (fails here)
>>>> kset_create_and_add (dependency)
>>>> vmbus_add_channel_kobj (dependency)
>>>>
>>>> Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel")
>>>> Cc: stable@kernel.org
>>>> Suggested-by: Saurabh Sengar <ssengar@linux.microsoft.com>
>>>> Suggested-by: Michael Kelley <mhklinux@outlook.com>
>>>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>>>> ---
>>>> Hi,
>>>> This is the first patch after initial RFC was posted.
>>>> https://lore.kernel.org/all/20250214064351.8994-1-namjain@linux.microsoft.com/
>>>>
>>>> Changes since RFC patch:
>>>> * Different approach to solve the problem is proposed (credits to
>>>> Michael Kelley).
>>>> * Core logic for sysfs creation moved out of uio_hv_generic, to VMBus
>>>> drivers where rest of the sysfs attributes for a VMBus channel
>>>> are defined. (addressed Greg's comments)
>>>> * Used attribute groups instead of sysfs_create* functions, and bundled
>>>> ring attribute with other attributes for the channel sysfs.
>>>>
>>>> Error logs:
>>>>
>>>> [ 35.574120] ------------[ cut here ]------------
>>>> [ 35.574122] WARNING: CPU: 0 PID: 10 at fs/sysfs/file.c:591 sysfs_create_bin_file+0x81/0x90
>>>> [ 35.574168] Workqueue: hv_pri_chan vmbus_add_channel_work
>>>> [ 35.574172] RIP: 0010:sysfs_create_bin_file+0x81/0x90
>>>> [ 35.574197] Call Trace:
>>>> [ 35.574199] <TASK>
>>>> [ 35.574200] ? show_regs+0x69/0x80
>>>> [ 35.574217] ? __warn+0x8d/0x130
>>>> [ 35.574220] ? sysfs_create_bin_file+0x81/0x90
>>>> [ 35.574222] ? report_bug+0x182/0x190
>>>> [ 35.574225] ? handle_bug+0x5b/0x90
>>>> [ 35.574244] ? exc_invalid_op+0x19/0x70
>>>> [ 35.574247] ? asm_exc_invalid_op+0x1b/0x20
>>>> [ 35.574252] ? sysfs_create_bin_file+0x81/0x90
>>>> [ 35.574255] hv_uio_probe+0x1e7/0x410 [uio_hv_generic]
>>>> [ 35.574271] vmbus_probe+0x3b/0x90
>>>> [ 35.574275] really_probe+0xf4/0x3b0
>>>> [ 35.574279] __driver_probe_device+0x8a/0x170
>>>> [ 35.574282] driver_probe_device+0x23/0xc0
>>>> [ 35.574285] __device_attach_driver+0xb5/0x140
>>>> [ 35.574288] ? __pfx___device_attach_driver+0x10/0x10
>>>> [ 35.574291] bus_for_each_drv+0x86/0xe0
>>>> [ 35.574294] __device_attach+0xc1/0x200
>>>> [ 35.574297] device_initial_probe+0x13/0x20
>>>> [ 35.574315] bus_probe_device+0x99/0xa0
>>>> [ 35.574318] device_add+0x647/0x870
>>>> [ 35.574320] ? hrtimer_init+0x28/0x70
>>>> [ 35.574323] device_register+0x1b/0x30
>>>> [ 35.574326] vmbus_device_register+0x83/0x130
>>>> [ 35.574328] vmbus_add_channel_work+0x135/0x1a0
>>>> [ 35.574331] process_one_work+0x177/0x340
>>>> [ 35.574348] worker_thread+0x2b2/0x3c0
>>>> [ 35.574350] kthread+0xe3/0x1f0
>>>> [ 35.574353] ? __pfx_worker_thread+0x10/0x10
>>>> [ 35.574356] ? __pfx_kthread+0x10/0x10
>>>>
>>>> ---
>>>> drivers/hv/hyperv_vmbus.h | 4 +++
>>>> drivers/hv/vmbus_drv.c | 62 ++++++++++++++++++++++++++++++++++++
>>>> drivers/uio/uio_hv_generic.c | 34 ++------------------
>>>> include/linux/hyperv.h | 3 ++
>>>> 4 files changed, 72 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
>>>> index 29780f3a7478..e0c7b75e6c7a 100644
>>>> --- a/drivers/hv/hyperv_vmbus.h
>>>> +++ b/drivers/hv/hyperv_vmbus.h
>>>> @@ -477,4 +477,8 @@ static inline int hv_debug_add_dev_dir(struct hv_device *dev)
>>>> #endif /* CONFIG_HYPERV_TESTING */
>>>> +/* Create and remove sysfs entry for memory mapped ring buffers for a channel */
>>>> +int hv_create_ring_sysfs(struct vmbus_channel *channel);
>>>> +int hv_remove_ring_sysfs(struct vmbus_channel *channel);
>>>> +
>>>> #endif /* _HYPERV_VMBUS_H */
>>>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>>>> index 22afebfc28ff..0110643bad3f 100644
>>>> --- a/drivers/hv/vmbus_drv.c
>>>> +++ b/drivers/hv/vmbus_drv.c
>>>> @@ -1802,6 +1802,39 @@ static ssize_t subchannel_id_show(struct vmbus_channel *channel,
>>>> }
>>>> static VMBUS_CHAN_ATTR_RO(subchannel_id);
>>>> +/* Functions to create sysfs interface to allow mmap of the ring buffers.
>>>> + * The ring buffer is allocated as contiguous memory by vmbus_open
>>>> + */
>>>> +static int hv_mmap_ring_buffer(struct vmbus_channel *channel, struct vm_area_struct *vma)
>>>> +{
>>>> + void *ring_buffer = page_address(channel->ringbuffer_page);
>>>> +
>>>> + if (channel->state != CHANNEL_OPENED_STATE)
>>>> + return -ENODEV;
>>>> +
>>>> + return vm_iomap_memory(vma, virt_to_phys(ring_buffer),
>>>> + channel->ringbuffer_pagecount << PAGE_SHIFT);
>>>> +}
>>>> +
>>>> +static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
>>>> + const struct bin_attribute *attr,
>>>> + struct vm_area_struct *vma)
>>>> +{
>>>> + struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
>>>> +
>>>> + if (!channel->mmap_ring_buffer)
>>>> + return -ENODEV;
>>>> + return channel->mmap_ring_buffer(channel, vma);
>>>
>>> What is preventing mmap_ring_buffer from being set to NULL right after
>>> checking it and then calling it here? I see no locks here or where you
>>> are assigning this variable at all, so what is preventing these types of
>>> races?
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> Thank you so much for reviewing.
>> I spent some time to understand if this race condition can happen and it
>> seems execution flow is pretty sequential, for a particular channel of a
>> device.
>>
>> Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL) can be
>> called in parallel to hv_uio_probe (which had set
>> channel->mmap_ring_buffer to non NULL), I doubt race can happen here.
>>
>> Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
>>
>> vmbus_device_register
>> device_register
>> hv_uio_probe
>> hv_create_ring_sysfs (W to non NULL)
>> sysfs_update_group
>> vmbus_chan_attr_is_visible (R)
>> vmbus_add_channel_kobj
>> sysfs_create_group
>> vmbus_chan_attr_is_visible (R)
>> hv_mmap_ring_buffer_wrapper (critical section)
>>
>> hv_uio_remove
>> hv_remove_ring_sysfs (W to NULL)
>
> Yes, and right in here someone mmaps the file.
>
> I think you can race here, no locks at all feels wrong.
>
> Messing with sysfs groups and files like this is rough, and almost never
> a good idea, why can't you just do this all at once with the default
> groups, why is this being added/removed out-of-band?
>
> thanks,
>
> greg k-h
The decision to avoid creating a "ring" sysfs attribute by default
likely stems from a specific use case where it wasn't needed for every
device. By creating it automatically, it keeps the uio_hv_generic
driver simpler and helps prevent potential race conditions. However, it
has an added cost of having ring buffer for all the channels, where it
is not required. I am trying to find if there are any more implications
of it.
Thanks,
Naman
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
2025-02-26 5:13 ` Naman Jain
@ 2025-02-26 10:03 ` Greg Kroah-Hartman
2025-02-26 12:21 ` Naman Jain
0 siblings, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2025-02-26 10:03 UTC (permalink / raw)
To: Naman Jain
Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Stephen Hemminger, linux-hyperv, linux-kernel, stable,
Saurabh Sengar, Michael Kelley, Long Li
On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
>
>
> On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
> > On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
> > >
> > >
> > > On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
> > > > On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
> > > > > On regular bootup, devices get registered to vmbus first, so when
> > > > > uio_hv_generic driver for a particular device type is probed,
> > > > > the device is already initialized and added, so sysfs creation in
> > > > > uio_hv_generic probe works fine. However, when device is removed
> > > > > and brought back, the channel rescinds and device again gets
> > > > > registered to vmbus. However this time, the uio_hv_generic driver is
> > > > > already registered to probe for that device and in this case sysfs
> > > > > creation is tried before the device gets initialized completely.
> > > > >
> > > > > Fix this by moving the core logic of sysfs creation for ring buffer,
> > > > > from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs
> > > > > attributes for the channels are defined. While doing that, make use
> > > > > of attribute groups and macros, instead of creating sysfs directly,
> > > > > to ensure better error handling and code flow.
> > > > >
> > > > > Problem path:
> > > > > vmbus_device_register
> > > > > device_register
> > > > > uio_hv_generic probe
> > > > > sysfs_create_bin_file (fails here)
> > > > > kset_create_and_add (dependency)
> > > > > vmbus_add_channel_kobj (dependency)
> > > > >
> > > > > Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel")
> > > > > Cc: stable@kernel.org
> > > > > Suggested-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> > > > > Suggested-by: Michael Kelley <mhklinux@outlook.com>
> > > > > Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> > > > > ---
> > > > > Hi,
> > > > > This is the first patch after initial RFC was posted.
> > > > > https://lore.kernel.org/all/20250214064351.8994-1-namjain@linux.microsoft.com/
> > > > >
> > > > > Changes since RFC patch:
> > > > > * Different approach to solve the problem is proposed (credits to
> > > > > Michael Kelley).
> > > > > * Core logic for sysfs creation moved out of uio_hv_generic, to VMBus
> > > > > drivers where rest of the sysfs attributes for a VMBus channel
> > > > > are defined. (addressed Greg's comments)
> > > > > * Used attribute groups instead of sysfs_create* functions, and bundled
> > > > > ring attribute with other attributes for the channel sysfs.
> > > > >
> > > > > Error logs:
> > > > >
> > > > > [ 35.574120] ------------[ cut here ]------------
> > > > > [ 35.574122] WARNING: CPU: 0 PID: 10 at fs/sysfs/file.c:591 sysfs_create_bin_file+0x81/0x90
> > > > > [ 35.574168] Workqueue: hv_pri_chan vmbus_add_channel_work
> > > > > [ 35.574172] RIP: 0010:sysfs_create_bin_file+0x81/0x90
> > > > > [ 35.574197] Call Trace:
> > > > > [ 35.574199] <TASK>
> > > > > [ 35.574200] ? show_regs+0x69/0x80
> > > > > [ 35.574217] ? __warn+0x8d/0x130
> > > > > [ 35.574220] ? sysfs_create_bin_file+0x81/0x90
> > > > > [ 35.574222] ? report_bug+0x182/0x190
> > > > > [ 35.574225] ? handle_bug+0x5b/0x90
> > > > > [ 35.574244] ? exc_invalid_op+0x19/0x70
> > > > > [ 35.574247] ? asm_exc_invalid_op+0x1b/0x20
> > > > > [ 35.574252] ? sysfs_create_bin_file+0x81/0x90
> > > > > [ 35.574255] hv_uio_probe+0x1e7/0x410 [uio_hv_generic]
> > > > > [ 35.574271] vmbus_probe+0x3b/0x90
> > > > > [ 35.574275] really_probe+0xf4/0x3b0
> > > > > [ 35.574279] __driver_probe_device+0x8a/0x170
> > > > > [ 35.574282] driver_probe_device+0x23/0xc0
> > > > > [ 35.574285] __device_attach_driver+0xb5/0x140
> > > > > [ 35.574288] ? __pfx___device_attach_driver+0x10/0x10
> > > > > [ 35.574291] bus_for_each_drv+0x86/0xe0
> > > > > [ 35.574294] __device_attach+0xc1/0x200
> > > > > [ 35.574297] device_initial_probe+0x13/0x20
> > > > > [ 35.574315] bus_probe_device+0x99/0xa0
> > > > > [ 35.574318] device_add+0x647/0x870
> > > > > [ 35.574320] ? hrtimer_init+0x28/0x70
> > > > > [ 35.574323] device_register+0x1b/0x30
> > > > > [ 35.574326] vmbus_device_register+0x83/0x130
> > > > > [ 35.574328] vmbus_add_channel_work+0x135/0x1a0
> > > > > [ 35.574331] process_one_work+0x177/0x340
> > > > > [ 35.574348] worker_thread+0x2b2/0x3c0
> > > > > [ 35.574350] kthread+0xe3/0x1f0
> > > > > [ 35.574353] ? __pfx_worker_thread+0x10/0x10
> > > > > [ 35.574356] ? __pfx_kthread+0x10/0x10
> > > > >
> > > > > ---
> > > > > drivers/hv/hyperv_vmbus.h | 4 +++
> > > > > drivers/hv/vmbus_drv.c | 62 ++++++++++++++++++++++++++++++++++++
> > > > > drivers/uio/uio_hv_generic.c | 34 ++------------------
> > > > > include/linux/hyperv.h | 3 ++
> > > > > 4 files changed, 72 insertions(+), 31 deletions(-)
> > > > >
> > > > > diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> > > > > index 29780f3a7478..e0c7b75e6c7a 100644
> > > > > --- a/drivers/hv/hyperv_vmbus.h
> > > > > +++ b/drivers/hv/hyperv_vmbus.h
> > > > > @@ -477,4 +477,8 @@ static inline int hv_debug_add_dev_dir(struct hv_device *dev)
> > > > > #endif /* CONFIG_HYPERV_TESTING */
> > > > > +/* Create and remove sysfs entry for memory mapped ring buffers for a channel */
> > > > > +int hv_create_ring_sysfs(struct vmbus_channel *channel);
> > > > > +int hv_remove_ring_sysfs(struct vmbus_channel *channel);
> > > > > +
> > > > > #endif /* _HYPERV_VMBUS_H */
> > > > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> > > > > index 22afebfc28ff..0110643bad3f 100644
> > > > > --- a/drivers/hv/vmbus_drv.c
> > > > > +++ b/drivers/hv/vmbus_drv.c
> > > > > @@ -1802,6 +1802,39 @@ static ssize_t subchannel_id_show(struct vmbus_channel *channel,
> > > > > }
> > > > > static VMBUS_CHAN_ATTR_RO(subchannel_id);
> > > > > +/* Functions to create sysfs interface to allow mmap of the ring buffers.
> > > > > + * The ring buffer is allocated as contiguous memory by vmbus_open
> > > > > + */
> > > > > +static int hv_mmap_ring_buffer(struct vmbus_channel *channel, struct vm_area_struct *vma)
> > > > > +{
> > > > > + void *ring_buffer = page_address(channel->ringbuffer_page);
> > > > > +
> > > > > + if (channel->state != CHANNEL_OPENED_STATE)
> > > > > + return -ENODEV;
> > > > > +
> > > > > + return vm_iomap_memory(vma, virt_to_phys(ring_buffer),
> > > > > + channel->ringbuffer_pagecount << PAGE_SHIFT);
> > > > > +}
> > > > > +
> > > > > +static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
> > > > > + const struct bin_attribute *attr,
> > > > > + struct vm_area_struct *vma)
> > > > > +{
> > > > > + struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
> > > > > +
> > > > > + if (!channel->mmap_ring_buffer)
> > > > > + return -ENODEV;
> > > > > + return channel->mmap_ring_buffer(channel, vma);
> > > >
> > > > What is preventing mmap_ring_buffer from being set to NULL right after
> > > > checking it and then calling it here? I see no locks here or where you
> > > > are assigning this variable at all, so what is preventing these types of
> > > > races?
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > >
> > > Thank you so much for reviewing.
> > > I spent some time to understand if this race condition can happen and it
> > > seems execution flow is pretty sequential, for a particular channel of a
> > > device.
> > >
> > > Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL) can be
> > > called in parallel to hv_uio_probe (which had set
> > > channel->mmap_ring_buffer to non NULL), I doubt race can happen here.
> > >
> > > Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
> > >
> > > vmbus_device_register
> > > device_register
> > > hv_uio_probe
> > > hv_create_ring_sysfs (W to non NULL)
> > > sysfs_update_group
> > > vmbus_chan_attr_is_visible (R)
> > > vmbus_add_channel_kobj
> > > sysfs_create_group
> > > vmbus_chan_attr_is_visible (R)
> > > hv_mmap_ring_buffer_wrapper (critical section)
> > >
> > > hv_uio_remove
> > > hv_remove_ring_sysfs (W to NULL)
> >
> > Yes, and right in here someone mmaps the file.
> >
> > I think you can race here, no locks at all feels wrong.
> >
> > Messing with sysfs groups and files like this is rough, and almost never
> > a good idea, why can't you just do this all at once with the default
> > groups, why is this being added/removed out-of-band?
> >
> > thanks,
> >
> > greg k-h
>
> The decision to avoid creating a "ring" sysfs attribute by default
> likely stems from a specific use case where it wasn't needed for every
> device. By creating it automatically, it keeps the uio_hv_generic
> driver simpler and helps prevent potential race conditions. However, it
> has an added cost of having ring buffer for all the channels, where it
> is not required. I am trying to find if there are any more implications
> of it.
You do know about the "is_visable" attribute callback, right? Why not
just use that instead of manually mucking around with the
adding/removing of sysfs attributes at all? That is what it was
designed for.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
2025-02-26 10:03 ` Greg Kroah-Hartman
@ 2025-02-26 12:21 ` Naman Jain
2025-02-26 14:33 ` Greg Kroah-Hartman
0 siblings, 1 reply; 12+ messages in thread
From: Naman Jain @ 2025-02-26 12:21 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Stephen Hemminger, linux-hyperv, linux-kernel, stable,
Saurabh Sengar, Michael Kelley, Long Li
On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote:
> On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
>>
>>
>> On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
>>> On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
>>>>
>>>>
>>>> On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
>>>>> On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
>>>>>> On regular bootup, devices get registered to vmbus first, so when
>>>>>> uio_hv_generic driver for a particular device type is probed,
>>>>>> the device is already initialized and added, so sysfs creation in
>>>>>> uio_hv_generic probe works fine. However, when device is removed
>>>>>> and brought back, the channel rescinds and device again gets
>>>>>> registered to vmbus. However this time, the uio_hv_generic driver is
>>>>>> already registered to probe for that device and in this case sysfs
>>>>>> creation is tried before the device gets initialized completely.
>>>>>>
>>>>>> Fix this by moving the core logic of sysfs creation for ring buffer,
>>>>>> from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs
>>>>>> attributes for the channels are defined. While doing that, make use
>>>>>> of attribute groups and macros, instead of creating sysfs directly,
>>>>>> to ensure better error handling and code flow.
<snip>
>>>>>> +static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
>>>>>> + const struct bin_attribute *attr,
>>>>>> + struct vm_area_struct *vma)
>>>>>> +{
>>>>>> + struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
>>>>>> +
>>>>>> + if (!channel->mmap_ring_buffer)
>>>>>> + return -ENODEV;
>>>>>> + return channel->mmap_ring_buffer(channel, vma);
>>>>>
>>>>> What is preventing mmap_ring_buffer from being set to NULL right after
>>>>> checking it and then calling it here? I see no locks here or where you
>>>>> are assigning this variable at all, so what is preventing these types of
>>>>> races?
>>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>>
>>>> Thank you so much for reviewing.
>>>> I spent some time to understand if this race condition can happen and it
>>>> seems execution flow is pretty sequential, for a particular channel of a
>>>> device.
>>>>
>>>> Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL) can be
>>>> called in parallel to hv_uio_probe (which had set
>>>> channel->mmap_ring_buffer to non NULL), I doubt race can happen here.
>>>>
>>>> Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
>>>>
>>>> vmbus_device_register
>>>> device_register
>>>> hv_uio_probe
>>>> hv_create_ring_sysfs (W to non NULL)
>>>> sysfs_update_group
>>>> vmbus_chan_attr_is_visible (R)
>>>> vmbus_add_channel_kobj
>>>> sysfs_create_group
>>>> vmbus_chan_attr_is_visible (R)
>>>> hv_mmap_ring_buffer_wrapper (critical section)
>>>>
>>>> hv_uio_remove
>>>> hv_remove_ring_sysfs (W to NULL)
>>>
>>> Yes, and right in here someone mmaps the file.
>>>
>>> I think you can race here, no locks at all feels wrong.
>>>
>>> Messing with sysfs groups and files like this is rough, and almost never
>>> a good idea, why can't you just do this all at once with the default
>>> groups, why is this being added/removed out-of-band?
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> The decision to avoid creating a "ring" sysfs attribute by default
>> likely stems from a specific use case where it wasn't needed for every
>> device. By creating it automatically, it keeps the uio_hv_generic
>> driver simpler and helps prevent potential race conditions. However, it
>> has an added cost of having ring buffer for all the channels, where it
>> is not required. I am trying to find if there are any more implications
>> of it.
>
> You do know about the "is_visable" attribute callback, right? Why not
> just use that instead of manually mucking around with the
> adding/removing of sysfs attributes at all? That is what it was
> designed for.
>
> thanks,
>
> greg k-h
Hi Greg,
Yes, I am utilizing that in my patch. For differentiating channels of a
uio_hv_generic device, and for *selectively* creating sysfs, we
introduced this field in channel struct "channel->mmap_ring_buffer",
which we were setting only in uio_hv_generic. But, by the time we set
this in uio_hv_generic driver, the sysfs creation has already gone
through and sysfs doesn't get updated dynamically. That's where there
was a need to call sysfs_update_group. I thought the better place to
keep sysfs_update_group would be in vmbus driver, where we are creating
the original sysfs entries, hence I had to add the wrapper functions.
This led us to the race condition we are trying to address now.
@@ -1838,6 +1872,10 @@ static umode_t vmbus_chan_attr_is_visible(struct
kobject *kobj,
attr == &chan_attr_monitor_id.attr))
return 0;
+ /* Hide ring attribute if channel's mmap_ring_buffer function is not
yet initialised */
+ if (attr == &chan_attr_ring_buffer.attr && !channel->mmap_ring_buffer)
+ return 0;
+
return attr->mode;
}
Thanks,
Naman
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
2025-02-26 12:21 ` Naman Jain
@ 2025-02-26 14:33 ` Greg Kroah-Hartman
2025-02-27 6:24 ` Naman Jain
0 siblings, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2025-02-26 14:33 UTC (permalink / raw)
To: Naman Jain
Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Stephen Hemminger, linux-hyperv, linux-kernel, stable,
Saurabh Sengar, Michael Kelley, Long Li
On Wed, Feb 26, 2025 at 05:51:46PM +0530, Naman Jain wrote:
>
>
> On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote:
> > On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
> > >
> > >
> > > On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
> > > > On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
> > > > >
> > > > >
> > > > > On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
> > > > > > On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
> > > > > > > On regular bootup, devices get registered to vmbus first, so when
> > > > > > > uio_hv_generic driver for a particular device type is probed,
> > > > > > > the device is already initialized and added, so sysfs creation in
> > > > > > > uio_hv_generic probe works fine. However, when device is removed
> > > > > > > and brought back, the channel rescinds and device again gets
> > > > > > > registered to vmbus. However this time, the uio_hv_generic driver is
> > > > > > > already registered to probe for that device and in this case sysfs
> > > > > > > creation is tried before the device gets initialized completely.
> > > > > > >
> > > > > > > Fix this by moving the core logic of sysfs creation for ring buffer,
> > > > > > > from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs
> > > > > > > attributes for the channels are defined. While doing that, make use
> > > > > > > of attribute groups and macros, instead of creating sysfs directly,
> > > > > > > to ensure better error handling and code flow.
>
> <snip>
>
> > > > > > > +static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
> > > > > > > + const struct bin_attribute *attr,
> > > > > > > + struct vm_area_struct *vma)
> > > > > > > +{
> > > > > > > + struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
> > > > > > > +
> > > > > > > + if (!channel->mmap_ring_buffer)
> > > > > > > + return -ENODEV;
> > > > > > > + return channel->mmap_ring_buffer(channel, vma);
> > > > > >
> > > > > > What is preventing mmap_ring_buffer from being set to NULL right after
> > > > > > checking it and then calling it here? I see no locks here or where you
> > > > > > are assigning this variable at all, so what is preventing these types of
> > > > > > races?
> > > > > >
> > > > > > thanks,
> > > > > >
> > > > > > greg k-h
> > > > >
> > > > > Thank you so much for reviewing.
> > > > > I spent some time to understand if this race condition can happen and it
> > > > > seems execution flow is pretty sequential, for a particular channel of a
> > > > > device.
> > > > >
> > > > > Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL) can be
> > > > > called in parallel to hv_uio_probe (which had set
> > > > > channel->mmap_ring_buffer to non NULL), I doubt race can happen here.
> > > > >
> > > > > Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
> > > > >
> > > > > vmbus_device_register
> > > > > device_register
> > > > > hv_uio_probe
> > > > > hv_create_ring_sysfs (W to non NULL)
> > > > > sysfs_update_group
> > > > > vmbus_chan_attr_is_visible (R)
> > > > > vmbus_add_channel_kobj
> > > > > sysfs_create_group
> > > > > vmbus_chan_attr_is_visible (R)
> > > > > hv_mmap_ring_buffer_wrapper (critical section)
> > > > >
> > > > > hv_uio_remove
> > > > > hv_remove_ring_sysfs (W to NULL)
> > > >
> > > > Yes, and right in here someone mmaps the file.
> > > >
> > > > I think you can race here, no locks at all feels wrong.
> > > >
> > > > Messing with sysfs groups and files like this is rough, and almost never
> > > > a good idea, why can't you just do this all at once with the default
> > > > groups, why is this being added/removed out-of-band?
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > >
> > > The decision to avoid creating a "ring" sysfs attribute by default
> > > likely stems from a specific use case where it wasn't needed for every
> > > device. By creating it automatically, it keeps the uio_hv_generic
> > > driver simpler and helps prevent potential race conditions. However, it
> > > has an added cost of having ring buffer for all the channels, where it
> > > is not required. I am trying to find if there are any more implications
> > > of it.
> >
> > You do know about the "is_visable" attribute callback, right? Why not
> > just use that instead of manually mucking around with the
> > adding/removing of sysfs attributes at all? That is what it was
> > designed for.
> >
> > thanks,
> >
> > greg k-h
>
> Hi Greg,
> Yes, I am utilizing that in my patch. For differentiating channels of a
> uio_hv_generic device, and for *selectively* creating sysfs, we
> introduced this field in channel struct "channel->mmap_ring_buffer",
> which we were setting only in uio_hv_generic. But, by the time we set
> this in uio_hv_generic driver, the sysfs creation has already gone
> through and sysfs doesn't get updated dynamically. That's where there
> was a need to call sysfs_update_group. I thought the better place to
> keep sysfs_update_group would be in vmbus driver, where we are creating
> the original sysfs entries, hence I had to add the wrapper functions.
> This led us to the race condition we are trying to address now.
>
>
> @@ -1838,6 +1872,10 @@ static umode_t vmbus_chan_attr_is_visible(struct
> kobject *kobj,
> attr == &chan_attr_monitor_id.attr))
> return 0;
>
> + /* Hide ring attribute if channel's mmap_ring_buffer function is not yet
> initialised */
> + if (attr == &chan_attr_ring_buffer.attr && !channel->mmap_ring_buffer)
> + return 0;
> +
> return attr->mode;
Ok, that's good. BUT you need to change the detection of this to be
before the device is set up in the driver. Why can't you do it in the
device creation logic itself instead of after-the-fact when you will
ALWAYS race with userspace?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
2025-02-26 14:33 ` Greg Kroah-Hartman
@ 2025-02-27 6:24 ` Naman Jain
2025-03-05 7:06 ` Naman Jain
0 siblings, 1 reply; 12+ messages in thread
From: Naman Jain @ 2025-02-27 6:24 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Stephen Hemminger, linux-hyperv, linux-kernel, stable,
Saurabh Sengar, Michael Kelley, Long Li
On 2/26/2025 8:03 PM, Greg Kroah-Hartman wrote:
> On Wed, Feb 26, 2025 at 05:51:46PM +0530, Naman Jain wrote:
>>
>>
>> On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote:
>>> On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
>>>>
>>>>
>>>> On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
>>>>> On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
>>>>>>
>>>>>>
>>>>>> On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
>>>>>>> On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
>>>>>>>> On regular bootup, devices get registered to vmbus first, so when
>>>>>>>> uio_hv_generic driver for a particular device type is probed,
>>>>>>>> the device is already initialized and added, so sysfs creation in
>>>>>>>> uio_hv_generic probe works fine. However, when device is removed
>>>>>>>> and brought back, the channel rescinds and device again gets
>>>>>>>> registered to vmbus. However this time, the uio_hv_generic driver is
>>>>>>>> already registered to probe for that device and in this case sysfs
>>>>>>>> creation is tried before the device gets initialized completely.
>>>>>>>>
>>>>>>>> Fix this by moving the core logic of sysfs creation for ring buffer,
>>>>>>>> from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs
>>>>>>>> attributes for the channels are defined. While doing that, make use
>>>>>>>> of attribute groups and macros, instead of creating sysfs directly,
>>>>>>>> to ensure better error handling and code flow.
>>
>> <snip>
>>
>>>>>>>> +static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
>>>>>>>> + const struct bin_attribute *attr,
>>>>>>>> + struct vm_area_struct *vma)
>>>>>>>> +{
>>>>>>>> + struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
>>>>>>>> +
>>>>>>>> + if (!channel->mmap_ring_buffer)
>>>>>>>> + return -ENODEV;
>>>>>>>> + return channel->mmap_ring_buffer(channel, vma);
>>>>>>>
>>>>>>> What is preventing mmap_ring_buffer from being set to NULL right after
>>>>>>> checking it and then calling it here? I see no locks here or where you
>>>>>>> are assigning this variable at all, so what is preventing these types of
>>>>>>> races?
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> greg k-h
>>>>>>
>>>>>> Thank you so much for reviewing.
>>>>>> I spent some time to understand if this race condition can happen and it
>>>>>> seems execution flow is pretty sequential, for a particular channel of a
>>>>>> device.
>>>>>>
>>>>>> Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL) can be
>>>>>> called in parallel to hv_uio_probe (which had set
>>>>>> channel->mmap_ring_buffer to non NULL), I doubt race can happen here.
>>>>>>
>>>>>> Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
>>>>>>
>>>>>> vmbus_device_register
>>>>>> device_register
>>>>>> hv_uio_probe
>>>>>> hv_create_ring_sysfs (W to non NULL)
>>>>>> sysfs_update_group
>>>>>> vmbus_chan_attr_is_visible (R)
>>>>>> vmbus_add_channel_kobj
>>>>>> sysfs_create_group
>>>>>> vmbus_chan_attr_is_visible (R)
>>>>>> hv_mmap_ring_buffer_wrapper (critical section)
>>>>>>
>>>>>> hv_uio_remove
>>>>>> hv_remove_ring_sysfs (W to NULL)
>>>>>
>>>>> Yes, and right in here someone mmaps the file.
>>>>>
>>>>> I think you can race here, no locks at all feels wrong.
>>>>>
>>>>> Messing with sysfs groups and files like this is rough, and almost never
>>>>> a good idea, why can't you just do this all at once with the default
>>>>> groups, why is this being added/removed out-of-band?
>>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>>
>>>> The decision to avoid creating a "ring" sysfs attribute by default
>>>> likely stems from a specific use case where it wasn't needed for every
>>>> device. By creating it automatically, it keeps the uio_hv_generic
>>>> driver simpler and helps prevent potential race conditions. However, it
>>>> has an added cost of having ring buffer for all the channels, where it
>>>> is not required. I am trying to find if there are any more implications
>>>> of it.
>>>
>>> You do know about the "is_visable" attribute callback, right? Why not
>>> just use that instead of manually mucking around with the
>>> adding/removing of sysfs attributes at all? That is what it was
>>> designed for.
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> Hi Greg,
>> Yes, I am utilizing that in my patch. For differentiating channels of a
>> uio_hv_generic device, and for *selectively* creating sysfs, we
>> introduced this field in channel struct "channel->mmap_ring_buffer",
>> which we were setting only in uio_hv_generic. But, by the time we set
>> this in uio_hv_generic driver, the sysfs creation has already gone
>> through and sysfs doesn't get updated dynamically. That's where there
>> was a need to call sysfs_update_group. I thought the better place to
>> keep sysfs_update_group would be in vmbus driver, where we are creating
>> the original sysfs entries, hence I had to add the wrapper functions.
>> This led us to the race condition we are trying to address now.
>>
>>
>> @@ -1838,6 +1872,10 @@ static umode_t vmbus_chan_attr_is_visible(struct
>> kobject *kobj,
>> attr == &chan_attr_monitor_id.attr))
>> return 0;
>>
>> + /* Hide ring attribute if channel's mmap_ring_buffer function is not yet
>> initialised */
>> + if (attr == &chan_attr_ring_buffer.attr && !channel->mmap_ring_buffer)
>> + return 0;
>> +
>> return attr->mode;
>
> Ok, that's good. BUT you need to change the detection of this to be
> before the device is set up in the driver. Why can't you do it in the
> device creation logic itself instead of after-the-fact when you will
> ALWAYS race with userspace?
>
> thanks,
>
> greg k-h
Sure, will check more on this. Thanks.
Regards,
Naman
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
2025-02-27 6:24 ` Naman Jain
@ 2025-03-05 7:06 ` Naman Jain
2025-03-11 4:45 ` Naman Jain
0 siblings, 1 reply; 12+ messages in thread
From: Naman Jain @ 2025-03-05 7:06 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Stephen Hemminger, linux-hyperv, linux-kernel, stable,
Saurabh Sengar, Michael Kelley, Long Li
On 2/27/2025 11:54 AM, Naman Jain wrote:
>
>
> On 2/26/2025 8:03 PM, Greg Kroah-Hartman wrote:
>> On Wed, Feb 26, 2025 at 05:51:46PM +0530, Naman Jain wrote:
>>>
>>>
>>> On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote:
>>>> On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
>>>>>
>>>>>
>>>>> On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
>>>>>> On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
>>>>>>>> On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
>>>>>>>>> On regular bootup, devices get registered to vmbus first, so when
>>>>>>>>> uio_hv_generic driver for a particular device type is probed,
>>>>>>>>> the device is already initialized and added, so sysfs creation in
>>>>>>>>> uio_hv_generic probe works fine. However, when device is removed
>>>>>>>>> and brought back, the channel rescinds and device again gets
>>>>>>>>> registered to vmbus. However this time, the uio_hv_generic
>>>>>>>>> driver is
>>>>>>>>> already registered to probe for that device and in this case sysfs
>>>>>>>>> creation is tried before the device gets initialized completely.
>>>>>>>>>
>>>>>>>>> Fix this by moving the core logic of sysfs creation for ring
>>>>>>>>> buffer,
>>>>>>>>> from uio_hv_generic to HyperV's vmbus driver, where rest of the
>>>>>>>>> sysfs
>>>>>>>>> attributes for the channels are defined. While doing that, make
>>>>>>>>> use
>>>>>>>>> of attribute groups and macros, instead of creating sysfs
>>>>>>>>> directly,
>>>>>>>>> to ensure better error handling and code flow.
>>>
>>> <snip>
>>>
>>>>>>>>> +static int hv_mmap_ring_buffer_wrapper(struct file *filp,
>>>>>>>>> struct kobject *kobj,
>>>>>>>>> + const struct bin_attribute *attr,
>>>>>>>>> + struct vm_area_struct *vma)
>>>>>>>>> +{
>>>>>>>>> + struct vmbus_channel *channel = container_of(kobj, struct
>>>>>>>>> vmbus_channel, kobj);
>>>>>>>>> +
>>>>>>>>> + if (!channel->mmap_ring_buffer)
>>>>>>>>> + return -ENODEV;
>>>>>>>>> + return channel->mmap_ring_buffer(channel, vma);
>>>>>>>>
>>>>>>>> What is preventing mmap_ring_buffer from being set to NULL right
>>>>>>>> after
>>>>>>>> checking it and then calling it here? I see no locks here or
>>>>>>>> where you
>>>>>>>> are assigning this variable at all, so what is preventing these
>>>>>>>> types of
>>>>>>>> races?
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> greg k-h
>>>>>>>
>>>>>>> Thank you so much for reviewing.
>>>>>>> I spent some time to understand if this race condition can happen
>>>>>>> and it
>>>>>>> seems execution flow is pretty sequential, for a particular
>>>>>>> channel of a
>>>>>>> device.
>>>>>>>
>>>>>>> Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL)
>>>>>>> can be
>>>>>>> called in parallel to hv_uio_probe (which had set
>>>>>>> channel->mmap_ring_buffer to non NULL), I doubt race can happen
>>>>>>> here.
>>>>>>>
>>>>>>> Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
>>>>>>>
>>>>>>> vmbus_device_register
>>>>>>> device_register
>>>>>>> hv_uio_probe
>>>>>>> hv_create_ring_sysfs (W to non NULL)
>>>>>>> sysfs_update_group
>>>>>>> vmbus_chan_attr_is_visible (R)
>>>>>>> vmbus_add_channel_kobj
>>>>>>> sysfs_create_group
>>>>>>> vmbus_chan_attr_is_visible (R)
>>>>>>> hv_mmap_ring_buffer_wrapper (critical section)
>>>>>>>
>>>>>>> hv_uio_remove
>>>>>>> hv_remove_ring_sysfs (W to NULL)
>>>>>>
>>>>>> Yes, and right in here someone mmaps the file.
>>>>>>
>>>>>> I think you can race here, no locks at all feels wrong.
>>>>>>
>>>>>> Messing with sysfs groups and files like this is rough, and almost
>>>>>> never
>>>>>> a good idea, why can't you just do this all at once with the default
>>>>>> groups, why is this being added/removed out-of-band?
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> greg k-h
>>>>>
>>>>> The decision to avoid creating a "ring" sysfs attribute by default
>>>>> likely stems from a specific use case where it wasn't needed for every
>>>>> device. By creating it automatically, it keeps the uio_hv_generic
>>>>> driver simpler and helps prevent potential race conditions.
>>>>> However, it
>>>>> has an added cost of having ring buffer for all the channels, where it
>>>>> is not required. I am trying to find if there are any more
>>>>> implications
>>>>> of it.
>>>>
>>>> You do know about the "is_visable" attribute callback, right? Why not
>>>> just use that instead of manually mucking around with the
>>>> adding/removing of sysfs attributes at all? That is what it was
>>>> designed for.
>>>>
>>>> thanks,
>>>>
>>>> greg k-h
>>>
>>> Hi Greg,
>>> Yes, I am utilizing that in my patch. For differentiating channels of a
>>> uio_hv_generic device, and for *selectively* creating sysfs, we
>>> introduced this field in channel struct "channel->mmap_ring_buffer",
>>> which we were setting only in uio_hv_generic. But, by the time we set
>>> this in uio_hv_generic driver, the sysfs creation has already gone
>>> through and sysfs doesn't get updated dynamically. That's where there
>>> was a need to call sysfs_update_group. I thought the better place to
>>> keep sysfs_update_group would be in vmbus driver, where we are creating
>>> the original sysfs entries, hence I had to add the wrapper functions.
>>> This led us to the race condition we are trying to address now.
>>>
>>>
>>> @@ -1838,6 +1872,10 @@ static umode_t vmbus_chan_attr_is_visible(struct
>>> kobject *kobj,
>>> attr == &chan_attr_monitor_id.attr))
>>> return 0;
>>>
>>> + /* Hide ring attribute if channel's mmap_ring_buffer function is
>>> not yet
>>> initialised */
>>> + if (attr == &chan_attr_ring_buffer.attr && !channel-
>>> >mmap_ring_buffer)
>>> + return 0;
>>> +
>>> return attr->mode;
>>
>> Ok, that's good. BUT you need to change the detection of this to be
>> before the device is set up in the driver. Why can't you do it in the
>> device creation logic itself instead of after-the-fact when you will
>> ALWAYS race with userspace?
>>
>> thanks,
>>
>> greg k-h
>
> Sure, will check more on this. Thanks.
>
> Regards,
> Naman
>
Hi Greg,
I understand this is deviating from the discussions that we have had
till now, but I wanted to kindly request your opinion on the following
approach, which came up in one of our internal discussions.
By moving the sysfs creation logic from hv_uio_probe to hv_uio_open, I
believe we can address this problem. Here are some of the benefits of
this approach:
* This approach involves minimal changes and provides a localized
solution.
* Since the use-case of ring sysfs is specific to uio_hv_generic and
DPDK, this will give us the flexibility to modify it based on
requirements. For example, ring_buffer_bin_attr.size should depend on
the ring buffer's allocated size, which is easier to manage if the
current code resides in uio_hv_generic.
* The use-case of DPDK is such that at any given time, either the
hv_netvsc kernel driver or the userspace (DPDK) will be controlling this
HV_NIC device. We do not want to expose the ring buffer to userspace
when hv_netvsc is using the device. This is where the "awareness" of the
current user comes into play, and we need a way to control the
visibility of the "ring" sysfs from uio_hv_generic.
Alternatively, I could focus on resolving the race condition you
mentioned and proceed with refining the patch. This approach addresses
most of the general practice concerns you highlighted.
Regards,
Naman
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
2025-03-05 7:06 ` Naman Jain
@ 2025-03-11 4:45 ` Naman Jain
2025-03-11 17:48 ` Dexuan Cui
0 siblings, 1 reply; 12+ messages in thread
From: Naman Jain @ 2025-03-11 4:45 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Stephen Hemminger, linux-hyperv, linux-kernel, stable,
Saurabh Sengar, Michael Kelley, Long Li
On 3/5/2025 12:36 PM, Naman Jain wrote:
>
>
> On 2/27/2025 11:54 AM, Naman Jain wrote:
>>
>>
>> On 2/26/2025 8:03 PM, Greg Kroah-Hartman wrote:
>>> On Wed, Feb 26, 2025 at 05:51:46PM +0530, Naman Jain wrote:
>>>>
>>>>
>>>> On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote:
>>>>> On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
>>>>>>
>>>>>>
>>>>>> On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
>>>>>>> On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
>>>>>>>>> On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
>>>>>>>>>> On regular bootup, devices get registered to vmbus first, so when
>>>>>>>>>> uio_hv_generic driver for a particular device type is probed,
>>>>>>>>>> the device is already initialized and added, so sysfs creation in
>>>>>>>>>> uio_hv_generic probe works fine. However, when device is removed
>>>>>>>>>> and brought back, the channel rescinds and device again gets
>>>>>>>>>> registered to vmbus. However this time, the uio_hv_generic
>>>>>>>>>> driver is
>>>>>>>>>> already registered to probe for that device and in this case
>>>>>>>>>> sysfs
>>>>>>>>>> creation is tried before the device gets initialized completely.
>>>>>>>>>>
>>>>>>>>>> Fix this by moving the core logic of sysfs creation for ring
>>>>>>>>>> buffer,
>>>>>>>>>> from uio_hv_generic to HyperV's vmbus driver, where rest of
>>>>>>>>>> the sysfs
>>>>>>>>>> attributes for the channels are defined. While doing that,
>>>>>>>>>> make use
>>>>>>>>>> of attribute groups and macros, instead of creating sysfs
>>>>>>>>>> directly,
>>>>>>>>>> to ensure better error handling and code flow.
>>>>
>>>> <snip>
>>>>
>>>>>>>>>> +static int hv_mmap_ring_buffer_wrapper(struct file *filp,
>>>>>>>>>> struct kobject *kobj,
>>>>>>>>>> + const struct bin_attribute *attr,
>>>>>>>>>> + struct vm_area_struct *vma)
>>>>>>>>>> +{
>>>>>>>>>> + struct vmbus_channel *channel = container_of(kobj, struct
>>>>>>>>>> vmbus_channel, kobj);
>>>>>>>>>> +
>>>>>>>>>> + if (!channel->mmap_ring_buffer)
>>>>>>>>>> + return -ENODEV;
>>>>>>>>>> + return channel->mmap_ring_buffer(channel, vma);
>>>>>>>>>
>>>>>>>>> What is preventing mmap_ring_buffer from being set to NULL
>>>>>>>>> right after
>>>>>>>>> checking it and then calling it here? I see no locks here or
>>>>>>>>> where you
>>>>>>>>> are assigning this variable at all, so what is preventing these
>>>>>>>>> types of
>>>>>>>>> races?
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> greg k-h
>>>>>>>>
>>>>>>>> Thank you so much for reviewing.
>>>>>>>> I spent some time to understand if this race condition can
>>>>>>>> happen and it
>>>>>>>> seems execution flow is pretty sequential, for a particular
>>>>>>>> channel of a
>>>>>>>> device.
>>>>>>>>
>>>>>>>> Unless hv_uio_remove (which makes channel->mmap_ring_buffer
>>>>>>>> NULL) can be
>>>>>>>> called in parallel to hv_uio_probe (which had set
>>>>>>>> channel->mmap_ring_buffer to non NULL), I doubt race can happen
>>>>>>>> here.
>>>>>>>>
>>>>>>>> Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
>>>>>>>>
>>>>>>>> vmbus_device_register
>>>>>>>> device_register
>>>>>>>> hv_uio_probe
>>>>>>>> hv_create_ring_sysfs (W to non NULL)
>>>>>>>> sysfs_update_group
>>>>>>>> vmbus_chan_attr_is_visible (R)
>>>>>>>> vmbus_add_channel_kobj
>>>>>>>> sysfs_create_group
>>>>>>>> vmbus_chan_attr_is_visible (R)
>>>>>>>> hv_mmap_ring_buffer_wrapper (critical section)
>>>>>>>>
>>>>>>>> hv_uio_remove
>>>>>>>> hv_remove_ring_sysfs (W to NULL)
>>>>>>>
>>>>>>> Yes, and right in here someone mmaps the file.
>>>>>>>
>>>>>>> I think you can race here, no locks at all feels wrong.
>>>>>>>
>>>>>>> Messing with sysfs groups and files like this is rough, and
>>>>>>> almost never
>>>>>>> a good idea, why can't you just do this all at once with the default
>>>>>>> groups, why is this being added/removed out-of-band?
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> greg k-h
>>>>>>
>>>>>> The decision to avoid creating a "ring" sysfs attribute by default
>>>>>> likely stems from a specific use case where it wasn't needed for
>>>>>> every
>>>>>> device. By creating it automatically, it keeps the uio_hv_generic
>>>>>> driver simpler and helps prevent potential race conditions.
>>>>>> However, it
>>>>>> has an added cost of having ring buffer for all the channels,
>>>>>> where it
>>>>>> is not required. I am trying to find if there are any more
>>>>>> implications
>>>>>> of it.
>>>>>
>>>>> You do know about the "is_visable" attribute callback, right? Why not
>>>>> just use that instead of manually mucking around with the
>>>>> adding/removing of sysfs attributes at all? That is what it was
>>>>> designed for.
>>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>>
>>>> Hi Greg,
>>>> Yes, I am utilizing that in my patch. For differentiating channels of a
>>>> uio_hv_generic device, and for *selectively* creating sysfs, we
>>>> introduced this field in channel struct "channel->mmap_ring_buffer",
>>>> which we were setting only in uio_hv_generic. But, by the time we set
>>>> this in uio_hv_generic driver, the sysfs creation has already gone
>>>> through and sysfs doesn't get updated dynamically. That's where there
>>>> was a need to call sysfs_update_group. I thought the better place to
>>>> keep sysfs_update_group would be in vmbus driver, where we are creating
>>>> the original sysfs entries, hence I had to add the wrapper functions.
>>>> This led us to the race condition we are trying to address now.
>>>>
>>>>
>>>> @@ -1838,6 +1872,10 @@ static umode_t vmbus_chan_attr_is_visible(struct
>>>> kobject *kobj,
>>>> attr == &chan_attr_monitor_id.attr))
>>>> return 0;
>>>>
>>>> + /* Hide ring attribute if channel's mmap_ring_buffer function
>>>> is not yet
>>>> initialised */
>>>> + if (attr == &chan_attr_ring_buffer.attr && !channel-
>>>> >mmap_ring_buffer)
>>>> + return 0;
>>>> +
>>>> return attr->mode;
>>>
>>> Ok, that's good. BUT you need to change the detection of this to be
>>> before the device is set up in the driver. Why can't you do it in the
>>> device creation logic itself instead of after-the-fact when you will
>>> ALWAYS race with userspace?
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> Sure, will check more on this. Thanks.
>>
>> Regards,
>> Naman
>>
>
> Hi Greg,
> I understand this is deviating from the discussions that we have had
> till now, but I wanted to kindly request your opinion on the following
> approach, which came up in one of our internal discussions.
>
> By moving the sysfs creation logic from hv_uio_probe to hv_uio_open, I
> believe we can address this problem. Here are some of the benefits of
> this approach:
>
> * This approach involves minimal changes and provides a localized
> solution.
>
> * Since the use-case of ring sysfs is specific to uio_hv_generic and
> DPDK, this will give us the flexibility to modify it based on
> requirements. For example, ring_buffer_bin_attr.size should depend on
> the ring buffer's allocated size, which is easier to manage if the
> current code resides in uio_hv_generic.
>
> * The use-case of DPDK is such that at any given time, either the
> hv_netvsc kernel driver or the userspace (DPDK) will be controlling this
> HV_NIC device. We do not want to expose the ring buffer to userspace
> when hv_netvsc is using the device. This is where the "awareness" of the
> current user comes into play, and we need a way to control the
> visibility of the "ring" sysfs from uio_hv_generic.
>
>
> Alternatively, I could focus on resolving the race condition you
> mentioned and proceed with refining the patch. This approach addresses
> most of the general practice concerns you highlighted.
>
> Regards,
> Naman
Hello Greg,
Here are the two approaches that we discussed.
Can you please suggest the approach which looks better to you
for next version.
**Approach 1: move sysfs creation to hv_uio_open**
---
drivers/uio/uio_hv_generic.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 1b19b5647495..fec78c90279e 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -218,6 +218,11 @@ hv_uio_open(struct uio_info *info, struct inode *inode)
vmbus_set_chn_rescind_callback(dev->channel, hv_uio_rescind);
vmbus_set_sc_create_callback(dev->channel, hv_uio_new_channel);
+ ret = sysfs_create_bin_file(&dev->channel->kobj, &ring_buffer_bin_attr);
+ if (ret)
+ dev_notice(&dev->device,
+ "sysfs create ring bin file failed; %d\n", ret);
+
ret = vmbus_connect_ring(dev->channel,
hv_uio_channel_cb, dev->channel);
if (ret == 0)
@@ -350,11 +355,6 @@ hv_uio_probe(struct hv_device *dev,
goto fail_close;
}
- ret = sysfs_create_bin_file(&channel->kobj, &ring_buffer_bin_attr);
- if (ret)
- dev_notice(&dev->device,
- "sysfs create ring bin file failed; %d\n", ret);
-
hv_set_drvdata(dev, pdata);
return 0;
--
=====================================================================
**Approach 2: Move sysfs creation to hyperv drivers**
---
drivers/hv/hyperv_vmbus.h | 6 +++
drivers/hv/vmbus_drv.c | 73 +++++++++++++++++++++++++++++++++++-
drivers/uio/uio_hv_generic.c | 33 ++++++----------
include/linux/hyperv.h | 6 +++
4 files changed, 95 insertions(+), 23 deletions(-)
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 29780f3a7478..0b450e53161e 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -477,4 +477,10 @@ static inline int hv_debug_add_dev_dir(struct
hv_device *dev)
#endif /* CONFIG_HYPERV_TESTING */
+/* Create and remove sysfs entry for memory mapped ring buffers for a
channel */
+int hv_create_ring_sysfs(struct vmbus_channel *channel,
+ int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
+ struct vm_area_struct *vma));
+int hv_remove_ring_sysfs(struct vmbus_channel *channel);
+
#endif /* _HYPERV_VMBUS_H */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 22afebfc28ff..44ecbca83c9e 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1802,6 +1802,27 @@ static ssize_t subchannel_id_show(struct
vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR_RO(subchannel_id);
+static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct
kobject *kobj,
+ const struct bin_attribute *attr,
+ struct vm_area_struct *vma)
+{
+ struct vmbus_channel *channel = container_of(kobj, struct
vmbus_channel, kobj);
+
+ /*
+ * hv_(create|remove)_ring_sysfs implementation ensures that
mmap_ring_buffer
+ * is not NULL.
+ */
+ return channel->mmap_ring_buffer(channel, vma);
+}
+
+static struct bin_attribute chan_attr_ring_buffer = {
+ .attr = {
+ .name = "ring",
+ .mode = 0600,
+ },
+ .size = 2 * SZ_2M,
+ .mmap = hv_mmap_ring_buffer_wrapper,
+};
static struct attribute *vmbus_chan_attrs[] = {
&chan_attr_out_mask.attr,
&chan_attr_in_mask.attr,
@@ -1821,6 +1842,11 @@ static struct attribute *vmbus_chan_attrs[] = {
NULL
};
+static struct bin_attribute *vmbus_chan_bin_attrs[] = {
+ &chan_attr_ring_buffer,
+ NULL
+};
+
/*
* Channel-level attribute_group callback function. Returns the
permission for
* each attribute, and returns 0 if an attribute is not visible.
@@ -1841,9 +1867,24 @@ static umode_t vmbus_chan_attr_is_visible(struct
kobject *kobj,
return attr->mode;
}
+static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj,
+ const struct bin_attribute *attr, int idx)
+{
+ const struct vmbus_channel *channel =
+ container_of(kobj, struct vmbus_channel, kobj);
+
+ /* Hide ring attribute if channel's ring_sysfs_visible is set to false */
+ if (attr == &chan_attr_ring_buffer && !channel->ring_sysfs_visible)
+ return 0;
+
+ return attr->attr.mode;
+}
+
static const struct attribute_group vmbus_chan_group = {
.attrs = vmbus_chan_attrs,
- .is_visible = vmbus_chan_attr_is_visible
+ .bin_attrs = vmbus_chan_bin_attrs,
+ .is_visible = vmbus_chan_attr_is_visible,
+ .is_bin_visible = vmbus_chan_bin_attr_is_visible,
};
static const struct kobj_type vmbus_chan_ktype = {
@@ -1851,6 +1892,36 @@ static const struct kobj_type vmbus_chan_ktype = {
.release = vmbus_chan_release,
};
+/*
+ * hv_create_ring_sysfs - create ring sysfs entry corresponding to ring
buffers for a channel
+ */
+int hv_create_ring_sysfs(struct vmbus_channel *channel,
+ int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
+ struct vm_area_struct *vma))
+{
+ struct kobject *kobj = &channel->kobj;
+
+ channel->mmap_ring_buffer = hv_mmap_ring_buffer;
+ channel->ring_sysfs_visible = true;
+ return sysfs_update_group(kobj, &vmbus_chan_group);
+}
+EXPORT_SYMBOL_GPL(hv_create_ring_sysfs);
+
+/*
+ * hv_remove_ring_sysfs - remove ring sysfs entry corresponding to ring
buffers for a channel
+ */
+int hv_remove_ring_sysfs(struct vmbus_channel *channel)
+{
+ struct kobject *kobj = &channel->kobj;
+ int ret;
+
+ channel->ring_sysfs_visible = false;
+ ret = sysfs_update_group(kobj, &vmbus_chan_group);
+ channel->mmap_ring_buffer = NULL;
+ return ret;
+}
+EXPORT_SYMBOL_GPL(hv_remove_ring_sysfs);
+
/*
* vmbus_add_channel_kobj - setup a sub-directory under device/channels
*/
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 1b19b5647495..a398e93ed382 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -131,15 +131,12 @@ static void hv_uio_rescind(struct vmbus_channel
*channel)
vmbus_device_unregister(channel->device_obj);
}
-/* Sysfs API to allow mmap of the ring buffers
+/* Function used for mmap of ring buffer sysfs interface.
* The ring buffer is allocated as contiguous memory by vmbus_open
*/
-static int hv_uio_ring_mmap(struct file *filp, struct kobject *kobj,
- const struct bin_attribute *attr,
- struct vm_area_struct *vma)
+static int
+hv_uio_ring_mmap(struct vmbus_channel *channel, struct vm_area_struct *vma)
{
- struct vmbus_channel *channel
- = container_of(kobj, struct vmbus_channel, kobj);
void *ring_buffer = page_address(channel->ringbuffer_page);
if (channel->state != CHANNEL_OPENED_STATE)
@@ -149,15 +146,6 @@ static int hv_uio_ring_mmap(struct file *filp,
struct kobject *kobj,
channel->ringbuffer_pagecount << PAGE_SHIFT);
}
-static const struct bin_attribute ring_buffer_bin_attr = {
- .attr = {
- .name = "ring",
- .mode = 0600,
- },
- .size = 2 * SZ_2M,
- .mmap = hv_uio_ring_mmap,
-};
-
/* Callback from VMBUS subsystem when new channel created. */
static void
hv_uio_new_channel(struct vmbus_channel *new_sc)
@@ -178,8 +166,7 @@ hv_uio_new_channel(struct vmbus_channel *new_sc)
/* Disable interrupts on sub channel */
new_sc->inbound.ring_buffer->interrupt_mask = 1;
set_channel_read_mode(new_sc, HV_CALL_ISR);
-
- ret = sysfs_create_bin_file(&new_sc->kobj, &ring_buffer_bin_attr);
+ ret = hv_create_ring_sysfs(new_sc, hv_uio_ring_mmap);
if (ret) {
dev_err(device, "sysfs create ring bin file failed; %d\n", ret);
vmbus_close(new_sc);
@@ -350,10 +337,12 @@ hv_uio_probe(struct hv_device *dev,
goto fail_close;
}
- ret = sysfs_create_bin_file(&channel->kobj, &ring_buffer_bin_attr);
- if (ret)
- dev_notice(&dev->device,
- "sysfs create ring bin file failed; %d\n", ret);
+ /*
+ * This internally calls sysfs_update_group, which returns a non-zero
value if it executes
+ * before sysfs_create_group. This is a false alarm from this use case
point of view and
+ * thus, no need to check the return value and print warning.
+ */
+ hv_create_ring_sysfs(channel, hv_uio_ring_mmap);
hv_set_drvdata(dev, pdata);
@@ -375,7 +364,7 @@ hv_uio_remove(struct hv_device *dev)
if (!pdata)
return;
- sysfs_remove_bin_file(&dev->channel->kobj, &ring_buffer_bin_attr);
+ hv_remove_ring_sysfs(dev->channel);
uio_unregister_device(&pdata->info);
hv_uio_cleanup(dev, pdata);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 7f4f8d8bdf43..f93e489eca62 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1058,6 +1058,12 @@ struct vmbus_channel {
/* The max size of a packet on this channel */
u32 max_pkt_size;
+
+ /* function to mmap ring buffer memory to the channel's sysfs ring
attribute */
+ int (*mmap_ring_buffer)(struct vmbus_channel *channel, struct
vm_area_struct *vma);
+
+ /* boolean to control visibility of sysfs for ring buffer */
+ bool ring_sysfs_visible;
};
#define lock_requestor(channel, flags) \
--
Regards,
Naman
^ permalink raw reply related [flat|nested] 12+ messages in thread
* RE: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
2025-03-11 4:45 ` Naman Jain
@ 2025-03-11 17:48 ` Dexuan Cui
0 siblings, 0 replies; 12+ messages in thread
From: Dexuan Cui @ 2025-03-11 17:48 UTC (permalink / raw)
To: Naman Jain, Greg Kroah-Hartman
Cc: KY Srinivasan, Haiyang Zhang, Wei Liu, Stephen Hemminger,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
stable@kernel.org, Saurabh Sengar, Michael Kelley, Long Li
> From: Naman Jain <namjain@linux.microsoft.com>
> Sent: Monday, March 10, 2025 9:45 PM
> > [...]
> > Hi Greg,
> > I understand this is deviating from the discussions that we have had
> > till now, but I wanted to kindly request your opinion on the following
> > approach, which came up in one of our internal discussions.
> >
> > By moving the sysfs creation logic from hv_uio_probe to hv_uio_open, I
> > believe we can address this problem. Here are some of the benefits of
> > this approach:
> >
> > * This approach involves minimal changes and provides a localized
> > solution.
I prefer the "Approach 1" below, which requires only 1/10 of the changes
of "Approach 2".
> > * Since the use-case of ring sysfs is specific to uio_hv_generic and
> > DPDK, this will give us the flexibility to modify it based on
> > requirements. For example, ring_buffer_bin_attr.size should depend on
> > the ring buffer's allocated size, which is easier to manage if the
> > current code resides in uio_hv_generic.
The 'ring' sysfs file is used by DPDK, the user-mode driver for fcopy
(tools/hv/hv_fcopy_uio_daemon.c) and other out-of-tree user-mode drivers.
Before the hv_uio_generic driver and the user-mode driver load, the
hv_vmbus driver doesn't know the correct size of the 'ring'; currently the
patch of "Approach 2" hardcodes ring_buffer_bin_attr.size to 4MB, which
is incorrect for the Fcopy VMBus device and other VMBus devices.
"Approach 1" also uses a hardcoded 4MB, but it can't be easily fixed by
adding one line in hv_uio_probe(). With "Approach 2", the fix would be
difficult as we would need to pass one more param 'size' from the driver
hv_uio_generic to hv_vmbus.
The patch of "Approach 2" already passes two params from hv_uio_generic
to hv_vmbus: 'ring_sysfs_visible' and 'mmap_ring_buffer', and adds and
exports 2 APIs to hv_uio_generic.
With Approach 1, we can avoid all the trouble.
> > * The use-case of DPDK is such that at any given time, either the
> > hv_netvsc kernel driver or the userspace (DPDK) will be controlling this
> > HV_NIC device. We do not want to expose the ring buffer to userspace
> > when hv_netvsc is using the device. This is where the "awareness" of the
> > current user comes into play, and we need a way to control the
> > visibility of the "ring" sysfs from uio_hv_generic.
> >
> >
> > Alternatively, I could focus on resolving the race condition you
> > mentioned and proceed with refining the patch. This approach addresses
> > most of the general practice concerns you highlighted.
> >
> > Regards,
> > Naman
>
> Hello Greg,
> Here are the two approaches that we discussed.
> Can you please suggest the approach which looks better to you
> for next version.
>
> **Approach 1: move sysfs creation to hv_uio_open**
> [...]
> **Approach 2: Move sysfs creation to hyperv drivers**
In general, indeed we would want to avoid manipulating sysfs and kobj directly,
but here IMO calling sysfs_create_bin_file() in hv_uio_generic is reasonable as
hv_uio_generic is the only user of the 'ring' sysfs file, and it has more info about
the 'ring' (i.e. the correct size; when the 'ring' is used); I prefer "Approach 1"
since the patch is much smaller and cleaner.
Greg, can we have your opinions?
Thanks,
Dexuan
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-03-11 17:48 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-25 5:20 [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer Naman Jain
2025-02-25 6:12 ` Greg Kroah-Hartman
2025-02-25 8:34 ` Naman Jain
2025-02-25 8:39 ` Greg Kroah-Hartman
2025-02-26 5:13 ` Naman Jain
2025-02-26 10:03 ` Greg Kroah-Hartman
2025-02-26 12:21 ` Naman Jain
2025-02-26 14:33 ` Greg Kroah-Hartman
2025-02-27 6:24 ` Naman Jain
2025-03-05 7:06 ` Naman Jain
2025-03-11 4:45 ` Naman Jain
2025-03-11 17:48 ` Dexuan Cui
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).