Linux cryptographic layer development
 help / color / mirror / Atom feed
* [PATCH v2 0/4] uacce: driver fixes for memory leaks and state management
@ 2025-09-16 14:48 Chenghai Huang
  2025-09-16 14:48 ` [PATCH v2 1/4] uacce: fix for cdev memory leak Chenghai Huang
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Chenghai Huang @ 2025-09-16 14:48 UTC (permalink / raw)
  To: gregkh, zhangfei.gao, wangzhou1
  Cc: linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39,
	liulongfang, qianweili

This patch series addresses several issues in the uacce:
1.Memory leak fix when device registration fails.
2.Fix sysfs file creation conditions.
3.Add error reporting for unsupported mremap operations.
4.Ensuring safe queue release with proper state management.

---
Changes in v2:
- Use cdev_init to allocate cdev memory to ensure that memory leaks
  are avoided.
- Supplement the reason for intercepting the remapping operation.
- Add "cc: stable@vger.kernel.org" to paths with fixed.
- Link to v2: https://lore.kernel.org/all/20250822103904.3776304-1-huangchenghai2@huawei.com/

Chenghai Huang (2):
  uacce: fix isolate sysfs check condition
  uacce: ensure safe queue release with state management

Wenkai Lin (1):
  uacce: fix for cdev memory leak

Yang Shen (1):
  uacce: implement mremap in uacce_vm_ops to return -EPERM

 drivers/misc/uacce/uacce.c | 49 +++++++++++++++++++++++++-------------
 include/linux/uacce.h      |  2 +-
 2 files changed, 33 insertions(+), 18 deletions(-)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/4] uacce: fix for cdev memory leak
  2025-09-16 14:48 [PATCH v2 0/4] uacce: driver fixes for memory leaks and state management Chenghai Huang
@ 2025-09-16 14:48 ` Chenghai Huang
  2025-09-16 15:14   ` Greg KH
  2025-09-16 14:48 ` [PATCH v2 2/4] uacce: fix isolate sysfs check condition Chenghai Huang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Chenghai Huang @ 2025-09-16 14:48 UTC (permalink / raw)
  To: gregkh, zhangfei.gao, wangzhou1
  Cc: linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39,
	liulongfang, qianweili

From: Wenkai Lin <linwenkai6@hisilicon.com>

If cdev_device_add failed, it is hard to determine
whether cdev_del has been executed, which lead to a
memory leak issue, so we use cdev_init to avoid it.

Fixes: 015d239ac014 ("uacce: add uacce driver")
Cc: stable@vger.kernel.org
Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
---
 drivers/misc/uacce/uacce.c | 13 ++++---------
 include/linux/uacce.h      |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 42e7d2a2a90c..12370469f646 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -522,14 +522,10 @@ int uacce_register(struct uacce_device *uacce)
 	if (!uacce)
 		return -ENODEV;
 
-	uacce->cdev = cdev_alloc();
-	if (!uacce->cdev)
-		return -ENOMEM;
-
-	uacce->cdev->ops = &uacce_fops;
-	uacce->cdev->owner = THIS_MODULE;
+	cdev_init(&uacce->cdev, &uacce_fops);
+	uacce->cdev.owner = THIS_MODULE;
 
-	return cdev_device_add(uacce->cdev, &uacce->dev);
+	return cdev_device_add(&uacce->cdev, &uacce->dev);
 }
 EXPORT_SYMBOL_GPL(uacce_register);
 
@@ -568,8 +564,7 @@ void uacce_remove(struct uacce_device *uacce)
 		unmap_mapping_range(q->mapping, 0, 0, 1);
 	}
 
-	if (uacce->cdev)
-		cdev_device_del(uacce->cdev, &uacce->dev);
+	cdev_device_del(&uacce->cdev, &uacce->dev);
 	xa_erase(&uacce_xa, uacce->dev_id);
 	/*
 	 * uacce exists as long as there are open fds, but ops will be freed
diff --git a/include/linux/uacce.h b/include/linux/uacce.h
index e290c0269944..98b896192a44 100644
--- a/include/linux/uacce.h
+++ b/include/linux/uacce.h
@@ -126,7 +126,7 @@ struct uacce_device {
 	bool is_vf;
 	u32 flags;
 	u32 dev_id;
-	struct cdev *cdev;
+	struct cdev cdev;
 	struct device dev;
 	struct mutex mutex;
 	void *priv;
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/4] uacce: fix isolate sysfs check condition
  2025-09-16 14:48 [PATCH v2 0/4] uacce: driver fixes for memory leaks and state management Chenghai Huang
  2025-09-16 14:48 ` [PATCH v2 1/4] uacce: fix for cdev memory leak Chenghai Huang
@ 2025-09-16 14:48 ` Chenghai Huang
  2025-09-16 15:15   ` Greg KH
  2025-09-16 14:48 ` [PATCH v2 3/4] uacce: implement mremap in uacce_vm_ops to return -EPERM Chenghai Huang
  2025-09-16 14:48 ` [PATCH v2 4/4] uacce: ensure safe queue release with state management Chenghai Huang
  3 siblings, 1 reply; 12+ messages in thread
From: Chenghai Huang @ 2025-09-16 14:48 UTC (permalink / raw)
  To: gregkh, zhangfei.gao, wangzhou1
  Cc: linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39,
	liulongfang, qianweili

The uacce supports device isolation feature. If the driver
implements the isolate_err_threshold_read and
isolate_err_threshold_write callbacks, the uacce will create sysfs
files. Users can read and configure isolation policies through
sysfs. Currently, if either isolate_err_threshold_read or
isolate_err_threshold_write callback exists, sysfs files are
created.

However, accessing a non-existent callback may cause a system panic.
Therefore, sysfs files are only created when both
isolate_err_threshold_read and isolate_err_threshold_write are
present.

Fixes: e3e289fbc0b5 ("uacce: supports device isolation feature")
Cc: stable@vger.kernel.org
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
---
 drivers/misc/uacce/uacce.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 12370469f646..770a931ef68d 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -441,7 +441,7 @@ static umode_t uacce_dev_is_visible(struct kobject *kobj,
 		return 0;
 
 	if (attr == &dev_attr_isolate_strategy.attr &&
-	    (!uacce->ops->isolate_err_threshold_read &&
+	    (!uacce->ops->isolate_err_threshold_read ||
 	     !uacce->ops->isolate_err_threshold_write))
 		return 0;
 
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/4] uacce: implement mremap in uacce_vm_ops to return -EPERM
  2025-09-16 14:48 [PATCH v2 0/4] uacce: driver fixes for memory leaks and state management Chenghai Huang
  2025-09-16 14:48 ` [PATCH v2 1/4] uacce: fix for cdev memory leak Chenghai Huang
  2025-09-16 14:48 ` [PATCH v2 2/4] uacce: fix isolate sysfs check condition Chenghai Huang
@ 2025-09-16 14:48 ` Chenghai Huang
  2025-09-16 14:48 ` [PATCH v2 4/4] uacce: ensure safe queue release with state management Chenghai Huang
  3 siblings, 0 replies; 12+ messages in thread
From: Chenghai Huang @ 2025-09-16 14:48 UTC (permalink / raw)
  To: gregkh, zhangfei.gao, wangzhou1
  Cc: linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39,
	liulongfang, qianweili

From: Yang Shen <shenyang39@huawei.com>

The current uacce_vm_ops does not support the mremap operation of
vm_operations_struct. Implement .mremap to return -EPERM to remind
users.

The reason we need to explicitly disable mremap is that when the
driver does not implement .mremap, it uses the default mremap
method. This could lead to a risk scenario:

An application might first mmap address p1, then mremap to p2,
followed by munmap(p1), and finally munmap(p2). Since the default
mremap copies the original vma's vm_private_data (i.e., q) to the
new vma, both munmap operations would trigger vma_close, causing
q->qfr to be freed twice(qfr will be set to null here, so repeated
release is ok).

Fixes: 015d239ac014 ("uacce: add uacce driver")
Cc: stable@vger.kernel.org
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
---
 drivers/misc/uacce/uacce.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 770a931ef68d..dda71492874d 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -214,8 +214,14 @@ static void uacce_vma_close(struct vm_area_struct *vma)
 	}
 }
 
+static int uacce_vma_mremap(struct vm_area_struct *area)
+{
+	return -EPERM;
+}
+
 static const struct vm_operations_struct uacce_vm_ops = {
 	.close = uacce_vma_close,
+	.mremap = uacce_vma_mremap,
 };
 
 static int uacce_fops_mmap(struct file *filep, struct vm_area_struct *vma)
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 4/4] uacce: ensure safe queue release with state management
  2025-09-16 14:48 [PATCH v2 0/4] uacce: driver fixes for memory leaks and state management Chenghai Huang
                   ` (2 preceding siblings ...)
  2025-09-16 14:48 ` [PATCH v2 3/4] uacce: implement mremap in uacce_vm_ops to return -EPERM Chenghai Huang
@ 2025-09-16 14:48 ` Chenghai Huang
  3 siblings, 0 replies; 12+ messages in thread
From: Chenghai Huang @ 2025-09-16 14:48 UTC (permalink / raw)
  To: gregkh, zhangfei.gao, wangzhou1
  Cc: linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39,
	liulongfang, qianweili

Directly calling `put_queue` carries risks since it cannot
guarantee that resources of `uacce_queue` have been fully released
beforehand. So adding a `stop_queue` operation for the
UACCE_CMD_PUT_Q command and leaving the `put_queue` operation to
the final resource release ensures safety.

Queue states are defined as follows:
- UACCE_Q_ZOMBIE: Initial state
- UACCE_Q_INIT: After opening `uacce`
- UACCE_Q_STARTED: After `start` is issued via `ioctl`

When executing `poweroff -f` in virt while accelerator are still
working, `uacce_fops_release` and `uacce_remove` may execute
concurrently. This can cause `uacce_put_queue` within
`uacce_fops_release` to access a NULL `ops` pointer. Therefore, add
state checks to prevent accessing freed pointers.

Fixes: 015d239ac014 ("uacce: add uacce driver")
Cc: stable@vger.kernel.org
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
---
 drivers/misc/uacce/uacce.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index dda71492874d..ff9c515ac289 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -40,20 +40,34 @@ static int uacce_start_queue(struct uacce_queue *q)
 	return 0;
 }
 
-static int uacce_put_queue(struct uacce_queue *q)
+static int uacce_stop_queue(struct uacce_queue *q)
 {
 	struct uacce_device *uacce = q->uacce;
 
-	if ((q->state == UACCE_Q_STARTED) && uacce->ops->stop_queue)
+	if (q->state != UACCE_Q_STARTED)
+		return 0;
+
+	if (uacce->ops->stop_queue)
 		uacce->ops->stop_queue(q);
 
-	if ((q->state == UACCE_Q_INIT || q->state == UACCE_Q_STARTED) &&
-	     uacce->ops->put_queue)
+	q->state = UACCE_Q_INIT;
+
+	return 0;
+}
+
+static void uacce_put_queue(struct uacce_queue *q)
+{
+	struct uacce_device *uacce = q->uacce;
+
+	uacce_stop_queue(q);
+
+	if (q->state != UACCE_Q_INIT)
+		return;
+
+	if (uacce->ops->put_queue)
 		uacce->ops->put_queue(q);
 
 	q->state = UACCE_Q_ZOMBIE;
-
-	return 0;
 }
 
 static long uacce_fops_unl_ioctl(struct file *filep,
@@ -80,7 +94,7 @@ static long uacce_fops_unl_ioctl(struct file *filep,
 		ret = uacce_start_queue(q);
 		break;
 	case UACCE_CMD_PUT_Q:
-		ret = uacce_put_queue(q);
+		ret = uacce_stop_queue(q);
 		break;
 	default:
 		if (uacce->ops->ioctl)
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/4] uacce: fix for cdev memory leak
  2025-09-16 14:48 ` [PATCH v2 1/4] uacce: fix for cdev memory leak Chenghai Huang
@ 2025-09-16 15:14   ` Greg KH
  2025-09-17  9:56     ` huangchenghai
  0 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2025-09-16 15:14 UTC (permalink / raw)
  To: Chenghai Huang
  Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm,
	fanghao11, shenyang39, liulongfang, qianweili

On Tue, Sep 16, 2025 at 10:48:08PM +0800, Chenghai Huang wrote:
> From: Wenkai Lin <linwenkai6@hisilicon.com>
> 
> If cdev_device_add failed, it is hard to determine
> whether cdev_del has been executed, which lead to a
> memory leak issue, so we use cdev_init to avoid it.

I do not understand, what is wrong with the current code?  It checks if
add fails:

> 
> Fixes: 015d239ac014 ("uacce: add uacce driver")
> Cc: stable@vger.kernel.org
> Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
> ---
>  drivers/misc/uacce/uacce.c | 13 ++++---------
>  include/linux/uacce.h      |  2 +-
>  2 files changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
> index 42e7d2a2a90c..12370469f646 100644
> --- a/drivers/misc/uacce/uacce.c
> +++ b/drivers/misc/uacce/uacce.c
> @@ -522,14 +522,10 @@ int uacce_register(struct uacce_device *uacce)
>  	if (!uacce)
>  		return -ENODEV;
>  
> -	uacce->cdev = cdev_alloc();
> -	if (!uacce->cdev)
> -		return -ENOMEM;

This is the check.


> -
> -	uacce->cdev->ops = &uacce_fops;
> -	uacce->cdev->owner = THIS_MODULE;
> +	cdev_init(&uacce->cdev, &uacce_fops);
> +	uacce->cdev.owner = THIS_MODULE;
>  
> -	return cdev_device_add(uacce->cdev, &uacce->dev);
> +	return cdev_device_add(&uacce->cdev, &uacce->dev);

And so is this.  So what is wrong here?


>  }
>  EXPORT_SYMBOL_GPL(uacce_register);
>  
> @@ -568,8 +564,7 @@ void uacce_remove(struct uacce_device *uacce)
>  		unmap_mapping_range(q->mapping, 0, 0, 1);
>  	}
>  
> -	if (uacce->cdev)
> -		cdev_device_del(uacce->cdev, &uacce->dev);
> +	cdev_device_del(&uacce->cdev, &uacce->dev);
>  	xa_erase(&uacce_xa, uacce->dev_id);
>  	/*
>  	 * uacce exists as long as there are open fds, but ops will be freed
> diff --git a/include/linux/uacce.h b/include/linux/uacce.h
> index e290c0269944..98b896192a44 100644
> --- a/include/linux/uacce.h
> +++ b/include/linux/uacce.h
> @@ -126,7 +126,7 @@ struct uacce_device {
>  	bool is_vf;
>  	u32 flags;
>  	u32 dev_id;
> -	struct cdev *cdev;
> +	struct cdev cdev;
>  	struct device dev;

You can not do this, you now have 2 different reference counts
controlling the lifespan of this one structure.  That is just going to
cause so many more bugs...

How was this tested?  What is currently failing that requires this
change?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/4] uacce: fix isolate sysfs check condition
  2025-09-16 14:48 ` [PATCH v2 2/4] uacce: fix isolate sysfs check condition Chenghai Huang
@ 2025-09-16 15:15   ` Greg KH
  2025-09-17  9:54     ` huangchenghai
  0 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2025-09-16 15:15 UTC (permalink / raw)
  To: Chenghai Huang
  Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm,
	fanghao11, shenyang39, liulongfang, qianweili

On Tue, Sep 16, 2025 at 10:48:09PM +0800, Chenghai Huang wrote:
> The uacce supports device isolation feature. If the driver
> implements the isolate_err_threshold_read and
> isolate_err_threshold_write callbacks, the uacce will create sysfs
> files. Users can read and configure isolation policies through
> sysfs. Currently, if either isolate_err_threshold_read or
> isolate_err_threshold_write callback exists, sysfs files are
> created.
> 
> However, accessing a non-existent callback may cause a system panic.

Where is the callback happening that fails?  Shouldn't that be checked
instead of doing this change?

> Therefore, sysfs files are only created when both
> isolate_err_threshold_read and isolate_err_threshold_write are
> present.

What if a device only has 1?  That should still work properly?

And why not just create the file if it is going to be used, that is the
real solution here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/4] uacce: fix isolate sysfs check condition
  2025-09-16 15:15   ` Greg KH
@ 2025-09-17  9:54     ` huangchenghai
  0 siblings, 0 replies; 12+ messages in thread
From: huangchenghai @ 2025-09-17  9:54 UTC (permalink / raw)
  To: Greg KH
  Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm,
	fanghao11, shenyang39, liulongfang, qianweili


On Mon, Sep 16, 2025 at 11:15 PM +0800, Greg KH wrote:
> On Tue, Sep 16, 2025 at 10:48:09PM +0800, Chenghai Huang wrote:
>> The uacce supports device isolation feature. If the driver
>> implements the isolate_err_threshold_read and
>> isolate_err_threshold_write callbacks, the uacce will create sysfs
>> files. Users can read and configure isolation policies through
>> sysfs. Currently, if either isolate_err_threshold_read or
>> isolate_err_threshold_write callback exists, sysfs files are
>> created.
>>
>> However, accessing a non-existent callback may cause a system panic.
> Where is the callback happening that fails?  Shouldn't that be checked
> instead of doing this change?
>
>> Therefore, sysfs files are only created when both
>> isolate_err_threshold_read and isolate_err_threshold_write are
>> present.
> What if a device only has 1?  That should still work properly?
>
> And why not just create the file if it is going to be used, that is the
> real solution here.
>
> thanks,
>
> greg k-h
Thank you for your feedback.I agree that the check should be done in the 
corresponding `isolate_strategy_show()` and `isolate_strategy_store()` 
functions.

How about the updated:

@@ -402,6 +402,9 @@ static ssize_t isolate_strategy_show(struct device 
*dev, struct device_attribute
         struct uacce_device *uacce = to_uacce_device(dev);
         u32 val;

+       if (!uacce->ops->isolate_err_threshold_read)
+               return -ENOENT;
+
         val = uacce->ops->isolate_err_threshold_read(uacce);

         return sysfs_emit(buf, "%u\n", val);
@@ -414,6 +417,9 @@ static ssize_t isolate_strategy_store(struct device 
*dev, struct device_attribut
         unsigned long val;
         int ret;

+       if (!uacce->ops->isolate_err_threshold_write)
+               return -ENOENT;
+
         if (kstrtoul(buf, 0, &val) < 0)
                 return -EINVAL;

@@ -460,9 +466,7 @@ static umode_t uacce_dev_is_visible(struct kobject 
*kobj,
             (!uacce->qf_pg_num[UACCE_QFRT_DUS])))
                 return 0;

-       if (attr == &dev_attr_isolate_strategy.attr &&
-           (!uacce->ops->isolate_err_threshold_read ||
-            !uacce->ops->isolate_err_threshold_write))
+       if (attr == &dev_attr_isolate_strategy.attr)
                 return 0;

This way, the sysfs files will only be created if they are going to be 
used, and the checks are done at the appropriate places.

Thanks,
Chenghai

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/4] uacce: fix for cdev memory leak
  2025-09-16 15:14   ` Greg KH
@ 2025-09-17  9:56     ` huangchenghai
  2025-09-17 10:18       ` Greg KH
  0 siblings, 1 reply; 12+ messages in thread
From: huangchenghai @ 2025-09-17  9:56 UTC (permalink / raw)
  To: Greg KH
  Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm,
	fanghao11, shenyang39, liulongfang, qianweili


On Mon, Sep 16, 2025 at 11:15 PM +0800, Greg KH wrote:
> On Tue, Sep 16, 2025 at 10:48:08PM +0800, Chenghai Huang wrote:
>> From: Wenkai Lin <linwenkai6@hisilicon.com>
>>
>> If cdev_device_add failed, it is hard to determine
>> whether cdev_del has been executed, which lead to a
>> memory leak issue, so we use cdev_init to avoid it.
> I do not understand, what is wrong with the current code?  It checks if
> add fails:
>
>> Fixes: 015d239ac014 ("uacce: add uacce driver")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
>> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
>> ---
>>   drivers/misc/uacce/uacce.c | 13 ++++---------
>>   include/linux/uacce.h      |  2 +-
>>   2 files changed, 5 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
>> index 42e7d2a2a90c..12370469f646 100644
>> --- a/drivers/misc/uacce/uacce.c
>> +++ b/drivers/misc/uacce/uacce.c
>> @@ -522,14 +522,10 @@ int uacce_register(struct uacce_device *uacce)
>>   	if (!uacce)
>>   		return -ENODEV;
>>   
>> -	uacce->cdev = cdev_alloc();
>> -	if (!uacce->cdev)
>> -		return -ENOMEM;
> This is the check.
>
>
>> -
>> -	uacce->cdev->ops = &uacce_fops;
>> -	uacce->cdev->owner = THIS_MODULE;
>> +	cdev_init(&uacce->cdev, &uacce_fops);
>> +	uacce->cdev.owner = THIS_MODULE;
>>   
>> -	return cdev_device_add(uacce->cdev, &uacce->dev);
>> +	return cdev_device_add(&uacce->cdev, &uacce->dev);
> And so is this.  So what is wrong here?
>
>
>>   }
>>   EXPORT_SYMBOL_GPL(uacce_register);
>>   
>> @@ -568,8 +564,7 @@ void uacce_remove(struct uacce_device *uacce)
>>   		unmap_mapping_range(q->mapping, 0, 0, 1);
>>   	}
>>   
>> -	if (uacce->cdev)
>> -		cdev_device_del(uacce->cdev, &uacce->dev);
>> +	cdev_device_del(&uacce->cdev, &uacce->dev);
>>   	xa_erase(&uacce_xa, uacce->dev_id);
>>   	/*
>>   	 * uacce exists as long as there are open fds, but ops will be freed
>> diff --git a/include/linux/uacce.h b/include/linux/uacce.h
>> index e290c0269944..98b896192a44 100644
>> --- a/include/linux/uacce.h
>> +++ b/include/linux/uacce.h
>> @@ -126,7 +126,7 @@ struct uacce_device {
>>   	bool is_vf;
>>   	u32 flags;
>>   	u32 dev_id;
>> -	struct cdev *cdev;
>> +	struct cdev cdev;
>>   	struct device dev;
> You can not do this, you now have 2 different reference counts
> controlling the lifespan of this one structure.  That is just going to
> cause so many more bugs...
>
> How was this tested?  What is currently failing that requires this
> change?
>
> thanks,
>
> greg k-h
We analyze it theoretically there may be a memory leak
issue here, if the cdev_device_add returns a failure,
the uacce_remove will not be executed, which results in the
uacce cdev memory not being released.
Therefore, we have decided to align with the design of other
drivers by making cdev a static member of uacce_device and
releasing the memory through uacce_device.

found one example in drivers/watchdog/watchdog_dev.h.
struct watchdog_core_data {
     struct device dev;
     struct cdev cdev;
     struct watchdog_device *wdd;
     struct mutex lock;
     ktime_t last_keepalive;
     ktime_t last_hw_keepalive;
     ktime_t open_deadline;
     ...
};

static int watchdog_cdev_register(struct watchdog_device *wdd)
{
     struct watchdog_core_data *wd_data;
     int err;
     ...
     cdev_init(&wd_data->cdev, &watchdog_fops);
     wd_data->cdev.owner = wdd->ops->owner;

     /* Add the device */
     err = cdev_device_add(&wd_data->cdev, &wd_data->dev);
     ...
}

static void watchdog_cdev_unregister(struct watchdog_device *wdd)
{
     struct watchdog_core_data *wd_data = wdd->wd_data;
     ...
     cdev_device_del(&wd_data->cdev, &wd_data->dev);
     if (wdd->id == 0) {
             misc_deregister(&watchdog_miscdev);
             old_wd_data = NULL;
     }
     ...
}

Thanks,

ChengHai


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/4] uacce: fix for cdev memory leak
  2025-09-17  9:56     ` huangchenghai
@ 2025-09-17 10:18       ` Greg KH
  2025-09-26  8:47         ` huangchenghai
  0 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2025-09-17 10:18 UTC (permalink / raw)
  To: huangchenghai
  Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm,
	fanghao11, shenyang39, liulongfang, qianweili

On Wed, Sep 17, 2025 at 05:56:16PM +0800, huangchenghai wrote:
> 
> On Mon, Sep 16, 2025 at 11:15 PM +0800, Greg KH wrote:
> > On Tue, Sep 16, 2025 at 10:48:08PM +0800, Chenghai Huang wrote:
> > > From: Wenkai Lin <linwenkai6@hisilicon.com>
> > > 
> > > If cdev_device_add failed, it is hard to determine
> > > whether cdev_del has been executed, which lead to a
> > > memory leak issue, so we use cdev_init to avoid it.
> > I do not understand, what is wrong with the current code?  It checks if
> > add fails:
> > 
> > > Fixes: 015d239ac014 ("uacce: add uacce driver")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
> > > Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
> > > ---
> > >   drivers/misc/uacce/uacce.c | 13 ++++---------
> > >   include/linux/uacce.h      |  2 +-
> > >   2 files changed, 5 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
> > > index 42e7d2a2a90c..12370469f646 100644
> > > --- a/drivers/misc/uacce/uacce.c
> > > +++ b/drivers/misc/uacce/uacce.c
> > > @@ -522,14 +522,10 @@ int uacce_register(struct uacce_device *uacce)
> > >   	if (!uacce)
> > >   		return -ENODEV;
> > > -	uacce->cdev = cdev_alloc();
> > > -	if (!uacce->cdev)
> > > -		return -ENOMEM;
> > This is the check.
> > 
> > 
> > > -
> > > -	uacce->cdev->ops = &uacce_fops;
> > > -	uacce->cdev->owner = THIS_MODULE;
> > > +	cdev_init(&uacce->cdev, &uacce_fops);
> > > +	uacce->cdev.owner = THIS_MODULE;
> > > -	return cdev_device_add(uacce->cdev, &uacce->dev);
> > > +	return cdev_device_add(&uacce->cdev, &uacce->dev);
> > And so is this.  So what is wrong here?
> > 
> > 
> > >   }
> > >   EXPORT_SYMBOL_GPL(uacce_register);
> > > @@ -568,8 +564,7 @@ void uacce_remove(struct uacce_device *uacce)
> > >   		unmap_mapping_range(q->mapping, 0, 0, 1);
> > >   	}
> > > -	if (uacce->cdev)
> > > -		cdev_device_del(uacce->cdev, &uacce->dev);
> > > +	cdev_device_del(&uacce->cdev, &uacce->dev);
> > >   	xa_erase(&uacce_xa, uacce->dev_id);
> > >   	/*
> > >   	 * uacce exists as long as there are open fds, but ops will be freed
> > > diff --git a/include/linux/uacce.h b/include/linux/uacce.h
> > > index e290c0269944..98b896192a44 100644
> > > --- a/include/linux/uacce.h
> > > +++ b/include/linux/uacce.h
> > > @@ -126,7 +126,7 @@ struct uacce_device {
> > >   	bool is_vf;
> > >   	u32 flags;
> > >   	u32 dev_id;
> > > -	struct cdev *cdev;
> > > +	struct cdev cdev;
> > >   	struct device dev;
> > You can not do this, you now have 2 different reference counts
> > controlling the lifespan of this one structure.  That is just going to
> > cause so many more bugs...
> > 
> > How was this tested?  What is currently failing that requires this
> > change?
> > 
> > thanks,
> > 
> > greg k-h
> We analyze it theoretically there may be a memory leak
> issue here, if the cdev_device_add returns a failure,
> the uacce_remove will not be executed, which results in the
> uacce cdev memory not being released.

Then properly clean up if that happens.

> Therefore, we have decided to align with the design of other
> drivers by making cdev a static member of uacce_device and
> releasing the memory through uacce_device.

But again, this is wrong to do.

> found one example in drivers/watchdog/watchdog_dev.h.
> struct watchdog_core_data {
>     struct device dev;
>     struct cdev cdev;

This is also wrong and needs to be fixed.  Please send a patch to
resolve it as well, as it should not be copied as a valid example.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/4] uacce: fix for cdev memory leak
  2025-09-17 10:18       ` Greg KH
@ 2025-09-26  8:47         ` huangchenghai
  2025-09-26  9:37           ` linwenkai (C)
  0 siblings, 1 reply; 12+ messages in thread
From: huangchenghai @ 2025-09-26  8:47 UTC (permalink / raw)
  To: Greg KH
  Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm,
	fanghao11, shenyang39, liulongfang, qianweili, linwenkai (C)


On Wed, Sep 17, 2025 at 06:18 PM +0800, Greg KH wrote:
> On Wed, Sep 17, 2025 at 05:56:16PM +0800, huangchenghai wrote:
>> On Mon, Sep 16, 2025 at 11:15 PM +0800, Greg KH wrote:
>>> On Tue, Sep 16, 2025 at 10:48:08PM +0800, Chenghai Huang wrote:
>>>> From: Wenkai Lin <linwenkai6@hisilicon.com>
>>>>
>>>> If cdev_device_add failed, it is hard to determine
>>>> whether cdev_del has been executed, which lead to a
>>>> memory leak issue, so we use cdev_init to avoid it.
>>> I do not understand, what is wrong with the current code?  It checks if
>>> add fails:
>>>
>>>> Fixes: 015d239ac014 ("uacce: add uacce driver")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
>>>> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
>>>> ---
>>>>    drivers/misc/uacce/uacce.c | 13 ++++---------
>>>>    include/linux/uacce.h      |  2 +-
>>>>    2 files changed, 5 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
>>>> index 42e7d2a2a90c..12370469f646 100644
>>>> --- a/drivers/misc/uacce/uacce.c
>>>> +++ b/drivers/misc/uacce/uacce.c
>>>> @@ -522,14 +522,10 @@ int uacce_register(struct uacce_device *uacce)
>>>>    	if (!uacce)
>>>>    		return -ENODEV;
>>>> -	uacce->cdev = cdev_alloc();
>>>> -	if (!uacce->cdev)
>>>> -		return -ENOMEM;
>>> This is the check.
>>>
>>>
>>>> -
>>>> -	uacce->cdev->ops = &uacce_fops;
>>>> -	uacce->cdev->owner = THIS_MODULE;
>>>> +	cdev_init(&uacce->cdev, &uacce_fops);
>>>> +	uacce->cdev.owner = THIS_MODULE;
>>>> -	return cdev_device_add(uacce->cdev, &uacce->dev);
>>>> +	return cdev_device_add(&uacce->cdev, &uacce->dev);
>>> And so is this.  So what is wrong here?
>>>
>>>
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(uacce_register);
>>>> @@ -568,8 +564,7 @@ void uacce_remove(struct uacce_device *uacce)
>>>>    		unmap_mapping_range(q->mapping, 0, 0, 1);
>>>>    	}
>>>> -	if (uacce->cdev)
>>>> -		cdev_device_del(uacce->cdev, &uacce->dev);
>>>> +	cdev_device_del(&uacce->cdev, &uacce->dev);
>>>>    	xa_erase(&uacce_xa, uacce->dev_id);
>>>>    	/*
>>>>    	 * uacce exists as long as there are open fds, but ops will be freed
>>>> diff --git a/include/linux/uacce.h b/include/linux/uacce.h
>>>> index e290c0269944..98b896192a44 100644
>>>> --- a/include/linux/uacce.h
>>>> +++ b/include/linux/uacce.h
>>>> @@ -126,7 +126,7 @@ struct uacce_device {
>>>>    	bool is_vf;
>>>>    	u32 flags;
>>>>    	u32 dev_id;
>>>> -	struct cdev *cdev;
>>>> +	struct cdev cdev;
>>>>    	struct device dev;
>>> You can not do this, you now have 2 different reference counts
>>> controlling the lifespan of this one structure.  That is just going to
>>> cause so many more bugs...
>>>
>>> How was this tested?  What is currently failing that requires this
>>> change?
>>>
>>> thanks,
>>>
>>> greg k-h
>> We analyze it theoretically there may be a memory leak
>> issue here, if the cdev_device_add returns a failure,
>> the uacce_remove will not be executed, which results in the
>> uacce cdev memory not being released.
> Then properly clean up if that happens.
>
>> Therefore, we have decided to align with the design of other
>> drivers by making cdev a static member of uacce_device and
>> releasing the memory through uacce_device.
> But again, this is wrong to do.
>
>> found one example in drivers/watchdog/watchdog_dev.h.
>> struct watchdog_core_data {
>>      struct device dev;
>>      struct cdev cdev;
> This is also wrong and needs to be fixed.  Please send a patch to
> resolve it as well, as it should not be copied as a valid example.
>
> thanks,
>
> greg k-h
Very sorry for the delayed response.

In v1, our first thought was that if cdev_device_add returns a
failure, we could release the resources allocated by cdev_alloc
using cdev_del. For this, we attempted the following modification:

@@ -519,6 +519,8 @@ EXPORT_SYMBOL_GPL(uacce_alloc);
   */
  int uacce_register(struct uacce_device *uacce)
  {
+    int ret;
+
      if (!uacce)
          return -ENODEV;

@@ -529,7 +531,14 @@ int uacce_register(struct uacce_device *uacce)
      uacce->cdev->ops = &uacce_fops;
      uacce->cdev->owner = THIS_MODULE;

-    return cdev_device_add(uacce->cdev, &uacce->dev);
+    ret = cdev_device_add(uacce->cdev, &uacce->dev);
+    if (ret) {
+        cdev_del(uacce->cdev);
+        uacce->cdev = NULL;
+        return ret;
+    }
+
+    return 0;
  }

However, after further analysis, we found that cdev_device_add does
not increment the reference count when it fails. Therefore, in this
case, cdev_del is not necessary. This means that the resources
allocated by cdev_alloc will not cause a memory leak in the failure
path.

Thus, I believe this patch modification is unnecessary. In the
upcoming v3 version, I will remove this modification.

Thank you for your patient guidance!

Best regards,
Chenghai
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/4] uacce: fix for cdev memory leak
  2025-09-26  8:47         ` huangchenghai
@ 2025-09-26  9:37           ` linwenkai (C)
  0 siblings, 0 replies; 12+ messages in thread
From: linwenkai (C) @ 2025-09-26  9:37 UTC (permalink / raw)
  To: huangchenghai, Greg KH
  Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm,
	fanghao11, shenyang39, liulongfang, qianweili


在 2025/9/26 16:47, huangchenghai 写道:
>
> On Wed, Sep 17, 2025 at 06:18 PM +0800, Greg KH wrote:
>> On Wed, Sep 17, 2025 at 05:56:16PM +0800, huangchenghai wrote:
>>> On Mon, Sep 16, 2025 at 11:15 PM +0800, Greg KH wrote:
>>>> On Tue, Sep 16, 2025 at 10:48:08PM +0800, Chenghai Huang wrote:
>>>>> From: Wenkai Lin <linwenkai6@hisilicon.com>
>>>>>
>>>>> If cdev_device_add failed, it is hard to determine
>>>>> whether cdev_del has been executed, which lead to a
>>>>> memory leak issue, so we use cdev_init to avoid it.
>>>> I do not understand, what is wrong with the current code? It checks if
>>>> add fails:
>>>>
>>>>> Fixes: 015d239ac014 ("uacce: add uacce driver")
>>>>> Cc: stable@vger.kernel.org
>>>>> Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
>>>>> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
>>>>> ---
>>>>>    drivers/misc/uacce/uacce.c | 13 ++++---------
>>>>>    include/linux/uacce.h      |  2 +-
>>>>>    2 files changed, 5 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
>>>>> index 42e7d2a2a90c..12370469f646 100644
>>>>> --- a/drivers/misc/uacce/uacce.c
>>>>> +++ b/drivers/misc/uacce/uacce.c
>>>>> @@ -522,14 +522,10 @@ int uacce_register(struct uacce_device *uacce)
>>>>>        if (!uacce)
>>>>>            return -ENODEV;
>>>>> -    uacce->cdev = cdev_alloc();
>>>>> -    if (!uacce->cdev)
>>>>> -        return -ENOMEM;
>>>> This is the check.
>>>>
>>>>
>>>>> -
>>>>> -    uacce->cdev->ops = &uacce_fops;
>>>>> -    uacce->cdev->owner = THIS_MODULE;
>>>>> +    cdev_init(&uacce->cdev, &uacce_fops);
>>>>> +    uacce->cdev.owner = THIS_MODULE;
>>>>> -    return cdev_device_add(uacce->cdev, &uacce->dev);
>>>>> +    return cdev_device_add(&uacce->cdev, &uacce->dev);
>>>> And so is this.  So what is wrong here?
>>>>
>>>>
>>>>>    }
>>>>>    EXPORT_SYMBOL_GPL(uacce_register);
>>>>> @@ -568,8 +564,7 @@ void uacce_remove(struct uacce_device *uacce)
>>>>>            unmap_mapping_range(q->mapping, 0, 0, 1);
>>>>>        }
>>>>> -    if (uacce->cdev)
>>>>> -        cdev_device_del(uacce->cdev, &uacce->dev);
>>>>> +    cdev_device_del(&uacce->cdev, &uacce->dev);
>>>>>        xa_erase(&uacce_xa, uacce->dev_id);
>>>>>        /*
>>>>>         * uacce exists as long as there are open fds, but ops will 
>>>>> be freed
>>>>> diff --git a/include/linux/uacce.h b/include/linux/uacce.h
>>>>> index e290c0269944..98b896192a44 100644
>>>>> --- a/include/linux/uacce.h
>>>>> +++ b/include/linux/uacce.h
>>>>> @@ -126,7 +126,7 @@ struct uacce_device {
>>>>>        bool is_vf;
>>>>>        u32 flags;
>>>>>        u32 dev_id;
>>>>> -    struct cdev *cdev;
>>>>> +    struct cdev cdev;
>>>>>        struct device dev;
>>>> You can not do this, you now have 2 different reference counts
>>>> controlling the lifespan of this one structure.  That is just going to
>>>> cause so many more bugs...
>>>>
>>>> How was this tested?  What is currently failing that requires this
>>>> change?
>>>>
>>>> thanks,
>>>>
>>>> greg k-h
>>> We analyze it theoretically there may be a memory leak
>>> issue here, if the cdev_device_add returns a failure,
>>> the uacce_remove will not be executed, which results in the
>>> uacce cdev memory not being released.
>> Then properly clean up if that happens.
>>
>>> Therefore, we have decided to align with the design of other
>>> drivers by making cdev a static member of uacce_device and
>>> releasing the memory through uacce_device.
>> But again, this is wrong to do.
>>
>>> found one example in drivers/watchdog/watchdog_dev.h.
>>> struct watchdog_core_data {
>>>      struct device dev;
>>>      struct cdev cdev;
>> This is also wrong and needs to be fixed.  Please send a patch to
>> resolve it as well, as it should not be copied as a valid example.
>>
>> thanks,
>>
>> greg k-h
> Very sorry for the delayed response.
>
> In v1, our first thought was that if cdev_device_add returns a
> failure, we could release the resources allocated by cdev_alloc
> using cdev_del. For this, we attempted the following modification:
>
> @@ -519,6 +519,8 @@ EXPORT_SYMBOL_GPL(uacce_alloc);
>   */
>  int uacce_register(struct uacce_device *uacce)
>  {
> +    int ret;
> +
>      if (!uacce)
>          return -ENODEV;
>
> @@ -529,7 +531,14 @@ int uacce_register(struct uacce_device *uacce)
>      uacce->cdev->ops = &uacce_fops;
>      uacce->cdev->owner = THIS_MODULE;
>
> -    return cdev_device_add(uacce->cdev, &uacce->dev);
> +    ret = cdev_device_add(uacce->cdev, &uacce->dev);
> +    if (ret) {
> +        cdev_del(uacce->cdev);
> +        uacce->cdev = NULL;
> +        return ret;
> +    }
> +
> +    return 0;
>  }
>
> However, after further analysis, we found that cdev_device_add does
> not increment the reference count when it fails. Therefore, in this
> case, cdev_del is not necessary. This means that the resources
> allocated by cdev_alloc will not cause a memory leak in the failure
> path.
>
> Thus, I believe this patch modification is unnecessary. In the
> upcoming v3 version, I will remove this modification.
>
> Thank you for your patient guidance!
>
> Best regards,
> Chenghai
>>
After further analysis, it was found that during the cdev_alloc
process, the reference count of the cdev is initialized to 1.

If kobject_put is not actively executed, the release function of
the cdev cannot be called, and the memory will never be released,
leading to a leak, so kobject_put is still necessary here?

Here is the new solution:
@@ -519,6 +519,8 @@ EXPORT_SYMBOL_GPL(uacce_alloc);
   */
  int uacce_register(struct uacce_device *uacce)
  {
+    int ret;
+
      if (!uacce)
          return -ENODEV;

@@ -529,7 +531,14 @@ int uacce_register(struct uacce_device *uacce)
      uacce->cdev->ops = &uacce_fops;
      uacce->cdev->owner = THIS_MODULE;

-    return cdev_device_add(uacce->cdev, &uacce->dev);
+    ret = cdev_device_add(uacce->cdev, &uacce->dev);
+    if (ret) {
+        kobject_put(&uacce->cdev->kobject);
+        uacce->cdev = NULL;
+        return ret;
+    }
+
+    return 0;
  }

Thanks,
WenKai

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-09-26  9:37 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-16 14:48 [PATCH v2 0/4] uacce: driver fixes for memory leaks and state management Chenghai Huang
2025-09-16 14:48 ` [PATCH v2 1/4] uacce: fix for cdev memory leak Chenghai Huang
2025-09-16 15:14   ` Greg KH
2025-09-17  9:56     ` huangchenghai
2025-09-17 10:18       ` Greg KH
2025-09-26  8:47         ` huangchenghai
2025-09-26  9:37           ` linwenkai (C)
2025-09-16 14:48 ` [PATCH v2 2/4] uacce: fix isolate sysfs check condition Chenghai Huang
2025-09-16 15:15   ` Greg KH
2025-09-17  9:54     ` huangchenghai
2025-09-16 14:48 ` [PATCH v2 3/4] uacce: implement mremap in uacce_vm_ops to return -EPERM Chenghai Huang
2025-09-16 14:48 ` [PATCH v2 4/4] uacce: ensure safe queue release with state management Chenghai Huang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox