From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 4C5D815C0; Wed, 26 Feb 2025 12:21:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740572512; cv=none; b=aAh59YFPrjr4OJlWZMQl9E46TVZMH8s0nogCztlTL4WrI/2RMgEY876I3QW3Ci5/Jd7zHJ7mGyX7PrUR+o9SDWX7zCTCn0A4dExC7+8wcru7EOhnLMyZ5avEYSIs7IXzoVTRJyfaDQTJTkAmBhVQUJkZlmGtBwYHUoAV1Y2k6c4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740572512; c=relaxed/simple; bh=XFlpru4HT2cDn6bnGT2IlGUhhMyYfvHdQwQ8YIK93nk=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=c/ypageHUeuLeZxnd8Kie7HX+BK+WhWCqw+V2fNWEe3TX+AVVlX4dkE78Icxxem7PoKiMFL6z94AXYZI1dL8V3MWwgP49xo+OrcNUaVqnMX4dHAQ4n5EPA9cM0FXMqD0/JIMgrg/tTbZGG5e17V6nPIvmhhbTGfHRxficTwAEpU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=YBzXRM7w; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="YBzXRM7w" Received: from [100.79.160.79] (unknown [4.194.122.144]) by linux.microsoft.com (Postfix) with ESMTPSA id CF81B206ADE3; Wed, 26 Feb 2025 04:21:47 -0800 (PST) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com CF81B206ADE3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1740572510; bh=0qk0lG6fEudYnLnAjNAk9vu7Jq0ZrZ0mFb8apziXmoc=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=YBzXRM7wseDdFCtzdrp9U4jUxqxjtigZIQlX4nWqMU7JX8EBqOl5UPlekHEpV88sf CZNiGVSupwBWUMiH9XtbKGuRdI+N2brkESsyrh6eBG7hI3RFqML9QvFGkkb5cvrJT1 edX0Aof5ijszSvf6WlEFgVEtasosU8D60QXVVoek= Message-ID: <0a694947-809d-48b2-9138-d3f6175fe09d@linux.microsoft.com> Date: Wed, 26 Feb 2025 17:51:46 +0530 Precedence: bulk X-Mailing-List: linux-hyperv@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer To: Greg Kroah-Hartman Cc: "K . Y . Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Stephen Hemminger , linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, stable@kernel.org, Saurabh Sengar , Michael Kelley , Long Li References: <20250225052001.2225-1-namjain@linux.microsoft.com> <2025022504-diagnosis-outsell-684c@gregkh> <9ee65987-4353-42c6-b517-d6f52428f718@linux.microsoft.com> <2025022515-lasso-carrot-4e1d@gregkh> <541c63d6-8ae6-4a32-8a02-d86eea64827e@linux.microsoft.com> <2025022627-deflate-pliable-6da0@gregkh> Content-Language: en-US From: Naman Jain In-Reply-To: <2025022627-deflate-pliable-6da0@gregkh> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote: > On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote: >> >> >> On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote: >>> On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote: >>>> >>>> >>>> On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote: >>>>> On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote: >>>>>> On regular bootup, devices get registered to vmbus first, so when >>>>>> uio_hv_generic driver for a particular device type is probed, >>>>>> the device is already initialized and added, so sysfs creation in >>>>>> uio_hv_generic probe works fine. However, when device is removed >>>>>> and brought back, the channel rescinds and device again gets >>>>>> registered to vmbus. However this time, the uio_hv_generic driver is >>>>>> already registered to probe for that device and in this case sysfs >>>>>> creation is tried before the device gets initialized completely. >>>>>> >>>>>> Fix this by moving the core logic of sysfs creation for ring buffer, >>>>>> from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs >>>>>> attributes for the channels are defined. While doing that, make use >>>>>> of attribute groups and macros, instead of creating sysfs directly, >>>>>> to ensure better error handling and code flow. >>>>>> +static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj, >>>>>> + const struct bin_attribute *attr, >>>>>> + struct vm_area_struct *vma) >>>>>> +{ >>>>>> + struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj); >>>>>> + >>>>>> + if (!channel->mmap_ring_buffer) >>>>>> + return -ENODEV; >>>>>> + return channel->mmap_ring_buffer(channel, vma); >>>>> >>>>> What is preventing mmap_ring_buffer from being set to NULL right after >>>>> checking it and then calling it here? I see no locks here or where you >>>>> are assigning this variable at all, so what is preventing these types of >>>>> races? >>>>> >>>>> thanks, >>>>> >>>>> greg k-h >>>> >>>> Thank you so much for reviewing. >>>> I spent some time to understand if this race condition can happen and it >>>> seems execution flow is pretty sequential, for a particular channel of a >>>> device. >>>> >>>> Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL) can be >>>> called in parallel to hv_uio_probe (which had set >>>> channel->mmap_ring_buffer to non NULL), I doubt race can happen here. >>>> >>>> Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer) >>>> >>>> vmbus_device_register >>>> device_register >>>> hv_uio_probe >>>> hv_create_ring_sysfs (W to non NULL) >>>> sysfs_update_group >>>> vmbus_chan_attr_is_visible (R) >>>> vmbus_add_channel_kobj >>>> sysfs_create_group >>>> vmbus_chan_attr_is_visible (R) >>>> hv_mmap_ring_buffer_wrapper (critical section) >>>> >>>> hv_uio_remove >>>> hv_remove_ring_sysfs (W to NULL) >>> >>> Yes, and right in here someone mmaps the file. >>> >>> I think you can race here, no locks at all feels wrong. >>> >>> Messing with sysfs groups and files like this is rough, and almost never >>> a good idea, why can't you just do this all at once with the default >>> groups, why is this being added/removed out-of-band? >>> >>> thanks, >>> >>> greg k-h >> >> The decision to avoid creating a "ring" sysfs attribute by default >> likely stems from a specific use case where it wasn't needed for every >> device. By creating it automatically, it keeps the uio_hv_generic >> driver simpler and helps prevent potential race conditions. However, it >> has an added cost of having ring buffer for all the channels, where it >> is not required. I am trying to find if there are any more implications >> of it. > > You do know about the "is_visable" attribute callback, right? Why not > just use that instead of manually mucking around with the > adding/removing of sysfs attributes at all? That is what it was > designed for. > > thanks, > > greg k-h Hi Greg, Yes, I am utilizing that in my patch. For differentiating channels of a uio_hv_generic device, and for *selectively* creating sysfs, we introduced this field in channel struct "channel->mmap_ring_buffer", which we were setting only in uio_hv_generic. But, by the time we set this in uio_hv_generic driver, the sysfs creation has already gone through and sysfs doesn't get updated dynamically. That's where there was a need to call sysfs_update_group. I thought the better place to keep sysfs_update_group would be in vmbus driver, where we are creating the original sysfs entries, hence I had to add the wrapper functions. This led us to the race condition we are trying to address now. @@ -1838,6 +1872,10 @@ static umode_t vmbus_chan_attr_is_visible(struct kobject *kobj, attr == &chan_attr_monitor_id.attr)) return 0; + /* Hide ring attribute if channel's mmap_ring_buffer function is not yet initialised */ + if (attr == &chan_attr_ring_buffer.attr && !channel->mmap_ring_buffer) + return 0; + return attr->mode; } Thanks, Naman