* [PATCH v2 0/4] uacce: driver fixes for memory leaks and state management
@ 2025-09-16 14:48 Chenghai Huang
2025-09-16 14:48 ` [PATCH v2 1/4] uacce: fix for cdev memory leak Chenghai Huang
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Chenghai Huang @ 2025-09-16 14:48 UTC (permalink / raw)
To: gregkh, zhangfei.gao, wangzhou1
Cc: linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39,
liulongfang, qianweili
This patch series addresses several issues in the uacce:
1.Memory leak fix when device registration fails.
2.Fix sysfs file creation conditions.
3.Add error reporting for unsupported mremap operations.
4.Ensuring safe queue release with proper state management.
---
Changes in v2:
- Use cdev_init to allocate cdev memory to ensure that memory leaks
are avoided.
- Supplement the reason for intercepting the remapping operation.
- Add "cc: stable@vger.kernel.org" to paths with fixed.
- Link to v2: https://lore.kernel.org/all/20250822103904.3776304-1-huangchenghai2@huawei.com/
Chenghai Huang (2):
uacce: fix isolate sysfs check condition
uacce: ensure safe queue release with state management
Wenkai Lin (1):
uacce: fix for cdev memory leak
Yang Shen (1):
uacce: implement mremap in uacce_vm_ops to return -EPERM
drivers/misc/uacce/uacce.c | 49 +++++++++++++++++++++++++-------------
include/linux/uacce.h | 2 +-
2 files changed, 33 insertions(+), 18 deletions(-)
--
2.33.0
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH v2 1/4] uacce: fix for cdev memory leak 2025-09-16 14:48 [PATCH v2 0/4] uacce: driver fixes for memory leaks and state management Chenghai Huang @ 2025-09-16 14:48 ` Chenghai Huang 2025-09-16 15:14 ` Greg KH 2025-09-16 14:48 ` [PATCH v2 2/4] uacce: fix isolate sysfs check condition Chenghai Huang ` (2 subsequent siblings) 3 siblings, 1 reply; 12+ messages in thread From: Chenghai Huang @ 2025-09-16 14:48 UTC (permalink / raw) To: gregkh, zhangfei.gao, wangzhou1 Cc: linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39, liulongfang, qianweili From: Wenkai Lin <linwenkai6@hisilicon.com> If cdev_device_add failed, it is hard to determine whether cdev_del has been executed, which lead to a memory leak issue, so we use cdev_init to avoid it. Fixes: 015d239ac014 ("uacce: add uacce driver") Cc: stable@vger.kernel.org Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> --- drivers/misc/uacce/uacce.c | 13 ++++--------- include/linux/uacce.h | 2 +- 2 files changed, 5 insertions(+), 10 deletions(-) diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c index 42e7d2a2a90c..12370469f646 100644 --- a/drivers/misc/uacce/uacce.c +++ b/drivers/misc/uacce/uacce.c @@ -522,14 +522,10 @@ int uacce_register(struct uacce_device *uacce) if (!uacce) return -ENODEV; - uacce->cdev = cdev_alloc(); - if (!uacce->cdev) - return -ENOMEM; - - uacce->cdev->ops = &uacce_fops; - uacce->cdev->owner = THIS_MODULE; + cdev_init(&uacce->cdev, &uacce_fops); + uacce->cdev.owner = THIS_MODULE; - return cdev_device_add(uacce->cdev, &uacce->dev); + return cdev_device_add(&uacce->cdev, &uacce->dev); } EXPORT_SYMBOL_GPL(uacce_register); @@ -568,8 +564,7 @@ void uacce_remove(struct uacce_device *uacce) unmap_mapping_range(q->mapping, 0, 0, 1); } - if (uacce->cdev) - cdev_device_del(uacce->cdev, &uacce->dev); + cdev_device_del(&uacce->cdev, &uacce->dev); xa_erase(&uacce_xa, uacce->dev_id); /* * uacce exists as long as there are open fds, but ops will be freed diff --git a/include/linux/uacce.h b/include/linux/uacce.h index e290c0269944..98b896192a44 100644 --- a/include/linux/uacce.h +++ b/include/linux/uacce.h @@ -126,7 +126,7 @@ struct uacce_device { bool is_vf; u32 flags; u32 dev_id; - struct cdev *cdev; + struct cdev cdev; struct device dev; struct mutex mutex; void *priv; -- 2.33.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/4] uacce: fix for cdev memory leak 2025-09-16 14:48 ` [PATCH v2 1/4] uacce: fix for cdev memory leak Chenghai Huang @ 2025-09-16 15:14 ` Greg KH 2025-09-17 9:56 ` huangchenghai 0 siblings, 1 reply; 12+ messages in thread From: Greg KH @ 2025-09-16 15:14 UTC (permalink / raw) To: Chenghai Huang Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39, liulongfang, qianweili On Tue, Sep 16, 2025 at 10:48:08PM +0800, Chenghai Huang wrote: > From: Wenkai Lin <linwenkai6@hisilicon.com> > > If cdev_device_add failed, it is hard to determine > whether cdev_del has been executed, which lead to a > memory leak issue, so we use cdev_init to avoid it. I do not understand, what is wrong with the current code? It checks if add fails: > > Fixes: 015d239ac014 ("uacce: add uacce driver") > Cc: stable@vger.kernel.org > Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> > Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> > --- > drivers/misc/uacce/uacce.c | 13 ++++--------- > include/linux/uacce.h | 2 +- > 2 files changed, 5 insertions(+), 10 deletions(-) > > diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c > index 42e7d2a2a90c..12370469f646 100644 > --- a/drivers/misc/uacce/uacce.c > +++ b/drivers/misc/uacce/uacce.c > @@ -522,14 +522,10 @@ int uacce_register(struct uacce_device *uacce) > if (!uacce) > return -ENODEV; > > - uacce->cdev = cdev_alloc(); > - if (!uacce->cdev) > - return -ENOMEM; This is the check. > - > - uacce->cdev->ops = &uacce_fops; > - uacce->cdev->owner = THIS_MODULE; > + cdev_init(&uacce->cdev, &uacce_fops); > + uacce->cdev.owner = THIS_MODULE; > > - return cdev_device_add(uacce->cdev, &uacce->dev); > + return cdev_device_add(&uacce->cdev, &uacce->dev); And so is this. So what is wrong here? > } > EXPORT_SYMBOL_GPL(uacce_register); > > @@ -568,8 +564,7 @@ void uacce_remove(struct uacce_device *uacce) > unmap_mapping_range(q->mapping, 0, 0, 1); > } > > - if (uacce->cdev) > - cdev_device_del(uacce->cdev, &uacce->dev); > + cdev_device_del(&uacce->cdev, &uacce->dev); > xa_erase(&uacce_xa, uacce->dev_id); > /* > * uacce exists as long as there are open fds, but ops will be freed > diff --git a/include/linux/uacce.h b/include/linux/uacce.h > index e290c0269944..98b896192a44 100644 > --- a/include/linux/uacce.h > +++ b/include/linux/uacce.h > @@ -126,7 +126,7 @@ struct uacce_device { > bool is_vf; > u32 flags; > u32 dev_id; > - struct cdev *cdev; > + struct cdev cdev; > struct device dev; You can not do this, you now have 2 different reference counts controlling the lifespan of this one structure. That is just going to cause so many more bugs... How was this tested? What is currently failing that requires this change? thanks, greg k-h ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/4] uacce: fix for cdev memory leak 2025-09-16 15:14 ` Greg KH @ 2025-09-17 9:56 ` huangchenghai 2025-09-17 10:18 ` Greg KH 0 siblings, 1 reply; 12+ messages in thread From: huangchenghai @ 2025-09-17 9:56 UTC (permalink / raw) To: Greg KH Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39, liulongfang, qianweili On Mon, Sep 16, 2025 at 11:15 PM +0800, Greg KH wrote: > On Tue, Sep 16, 2025 at 10:48:08PM +0800, Chenghai Huang wrote: >> From: Wenkai Lin <linwenkai6@hisilicon.com> >> >> If cdev_device_add failed, it is hard to determine >> whether cdev_del has been executed, which lead to a >> memory leak issue, so we use cdev_init to avoid it. > I do not understand, what is wrong with the current code? It checks if > add fails: > >> Fixes: 015d239ac014 ("uacce: add uacce driver") >> Cc: stable@vger.kernel.org >> Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> >> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> >> --- >> drivers/misc/uacce/uacce.c | 13 ++++--------- >> include/linux/uacce.h | 2 +- >> 2 files changed, 5 insertions(+), 10 deletions(-) >> >> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c >> index 42e7d2a2a90c..12370469f646 100644 >> --- a/drivers/misc/uacce/uacce.c >> +++ b/drivers/misc/uacce/uacce.c >> @@ -522,14 +522,10 @@ int uacce_register(struct uacce_device *uacce) >> if (!uacce) >> return -ENODEV; >> >> - uacce->cdev = cdev_alloc(); >> - if (!uacce->cdev) >> - return -ENOMEM; > This is the check. > > >> - >> - uacce->cdev->ops = &uacce_fops; >> - uacce->cdev->owner = THIS_MODULE; >> + cdev_init(&uacce->cdev, &uacce_fops); >> + uacce->cdev.owner = THIS_MODULE; >> >> - return cdev_device_add(uacce->cdev, &uacce->dev); >> + return cdev_device_add(&uacce->cdev, &uacce->dev); > And so is this. So what is wrong here? > > >> } >> EXPORT_SYMBOL_GPL(uacce_register); >> >> @@ -568,8 +564,7 @@ void uacce_remove(struct uacce_device *uacce) >> unmap_mapping_range(q->mapping, 0, 0, 1); >> } >> >> - if (uacce->cdev) >> - cdev_device_del(uacce->cdev, &uacce->dev); >> + cdev_device_del(&uacce->cdev, &uacce->dev); >> xa_erase(&uacce_xa, uacce->dev_id); >> /* >> * uacce exists as long as there are open fds, but ops will be freed >> diff --git a/include/linux/uacce.h b/include/linux/uacce.h >> index e290c0269944..98b896192a44 100644 >> --- a/include/linux/uacce.h >> +++ b/include/linux/uacce.h >> @@ -126,7 +126,7 @@ struct uacce_device { >> bool is_vf; >> u32 flags; >> u32 dev_id; >> - struct cdev *cdev; >> + struct cdev cdev; >> struct device dev; > You can not do this, you now have 2 different reference counts > controlling the lifespan of this one structure. That is just going to > cause so many more bugs... > > How was this tested? What is currently failing that requires this > change? > > thanks, > > greg k-h We analyze it theoretically there may be a memory leak issue here, if the cdev_device_add returns a failure, the uacce_remove will not be executed, which results in the uacce cdev memory not being released. Therefore, we have decided to align with the design of other drivers by making cdev a static member of uacce_device and releasing the memory through uacce_device. found one example in drivers/watchdog/watchdog_dev.h. struct watchdog_core_data { struct device dev; struct cdev cdev; struct watchdog_device *wdd; struct mutex lock; ktime_t last_keepalive; ktime_t last_hw_keepalive; ktime_t open_deadline; ... }; static int watchdog_cdev_register(struct watchdog_device *wdd) { struct watchdog_core_data *wd_data; int err; ... cdev_init(&wd_data->cdev, &watchdog_fops); wd_data->cdev.owner = wdd->ops->owner; /* Add the device */ err = cdev_device_add(&wd_data->cdev, &wd_data->dev); ... } static void watchdog_cdev_unregister(struct watchdog_device *wdd) { struct watchdog_core_data *wd_data = wdd->wd_data; ... cdev_device_del(&wd_data->cdev, &wd_data->dev); if (wdd->id == 0) { misc_deregister(&watchdog_miscdev); old_wd_data = NULL; } ... } Thanks, ChengHai ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/4] uacce: fix for cdev memory leak 2025-09-17 9:56 ` huangchenghai @ 2025-09-17 10:18 ` Greg KH 2025-09-26 8:47 ` huangchenghai 0 siblings, 1 reply; 12+ messages in thread From: Greg KH @ 2025-09-17 10:18 UTC (permalink / raw) To: huangchenghai Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39, liulongfang, qianweili On Wed, Sep 17, 2025 at 05:56:16PM +0800, huangchenghai wrote: > > On Mon, Sep 16, 2025 at 11:15 PM +0800, Greg KH wrote: > > On Tue, Sep 16, 2025 at 10:48:08PM +0800, Chenghai Huang wrote: > > > From: Wenkai Lin <linwenkai6@hisilicon.com> > > > > > > If cdev_device_add failed, it is hard to determine > > > whether cdev_del has been executed, which lead to a > > > memory leak issue, so we use cdev_init to avoid it. > > I do not understand, what is wrong with the current code? It checks if > > add fails: > > > > > Fixes: 015d239ac014 ("uacce: add uacce driver") > > > Cc: stable@vger.kernel.org > > > Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> > > > Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> > > > --- > > > drivers/misc/uacce/uacce.c | 13 ++++--------- > > > include/linux/uacce.h | 2 +- > > > 2 files changed, 5 insertions(+), 10 deletions(-) > > > > > > diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c > > > index 42e7d2a2a90c..12370469f646 100644 > > > --- a/drivers/misc/uacce/uacce.c > > > +++ b/drivers/misc/uacce/uacce.c > > > @@ -522,14 +522,10 @@ int uacce_register(struct uacce_device *uacce) > > > if (!uacce) > > > return -ENODEV; > > > - uacce->cdev = cdev_alloc(); > > > - if (!uacce->cdev) > > > - return -ENOMEM; > > This is the check. > > > > > > > - > > > - uacce->cdev->ops = &uacce_fops; > > > - uacce->cdev->owner = THIS_MODULE; > > > + cdev_init(&uacce->cdev, &uacce_fops); > > > + uacce->cdev.owner = THIS_MODULE; > > > - return cdev_device_add(uacce->cdev, &uacce->dev); > > > + return cdev_device_add(&uacce->cdev, &uacce->dev); > > And so is this. So what is wrong here? > > > > > > > } > > > EXPORT_SYMBOL_GPL(uacce_register); > > > @@ -568,8 +564,7 @@ void uacce_remove(struct uacce_device *uacce) > > > unmap_mapping_range(q->mapping, 0, 0, 1); > > > } > > > - if (uacce->cdev) > > > - cdev_device_del(uacce->cdev, &uacce->dev); > > > + cdev_device_del(&uacce->cdev, &uacce->dev); > > > xa_erase(&uacce_xa, uacce->dev_id); > > > /* > > > * uacce exists as long as there are open fds, but ops will be freed > > > diff --git a/include/linux/uacce.h b/include/linux/uacce.h > > > index e290c0269944..98b896192a44 100644 > > > --- a/include/linux/uacce.h > > > +++ b/include/linux/uacce.h > > > @@ -126,7 +126,7 @@ struct uacce_device { > > > bool is_vf; > > > u32 flags; > > > u32 dev_id; > > > - struct cdev *cdev; > > > + struct cdev cdev; > > > struct device dev; > > You can not do this, you now have 2 different reference counts > > controlling the lifespan of this one structure. That is just going to > > cause so many more bugs... > > > > How was this tested? What is currently failing that requires this > > change? > > > > thanks, > > > > greg k-h > We analyze it theoretically there may be a memory leak > issue here, if the cdev_device_add returns a failure, > the uacce_remove will not be executed, which results in the > uacce cdev memory not being released. Then properly clean up if that happens. > Therefore, we have decided to align with the design of other > drivers by making cdev a static member of uacce_device and > releasing the memory through uacce_device. But again, this is wrong to do. > found one example in drivers/watchdog/watchdog_dev.h. > struct watchdog_core_data { > struct device dev; > struct cdev cdev; This is also wrong and needs to be fixed. Please send a patch to resolve it as well, as it should not be copied as a valid example. thanks, greg k-h ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/4] uacce: fix for cdev memory leak 2025-09-17 10:18 ` Greg KH @ 2025-09-26 8:47 ` huangchenghai 2025-09-26 9:37 ` linwenkai (C) 0 siblings, 1 reply; 12+ messages in thread From: huangchenghai @ 2025-09-26 8:47 UTC (permalink / raw) To: Greg KH Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39, liulongfang, qianweili, linwenkai (C) On Wed, Sep 17, 2025 at 06:18 PM +0800, Greg KH wrote: > On Wed, Sep 17, 2025 at 05:56:16PM +0800, huangchenghai wrote: >> On Mon, Sep 16, 2025 at 11:15 PM +0800, Greg KH wrote: >>> On Tue, Sep 16, 2025 at 10:48:08PM +0800, Chenghai Huang wrote: >>>> From: Wenkai Lin <linwenkai6@hisilicon.com> >>>> >>>> If cdev_device_add failed, it is hard to determine >>>> whether cdev_del has been executed, which lead to a >>>> memory leak issue, so we use cdev_init to avoid it. >>> I do not understand, what is wrong with the current code? It checks if >>> add fails: >>> >>>> Fixes: 015d239ac014 ("uacce: add uacce driver") >>>> Cc: stable@vger.kernel.org >>>> Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> >>>> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> >>>> --- >>>> drivers/misc/uacce/uacce.c | 13 ++++--------- >>>> include/linux/uacce.h | 2 +- >>>> 2 files changed, 5 insertions(+), 10 deletions(-) >>>> >>>> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c >>>> index 42e7d2a2a90c..12370469f646 100644 >>>> --- a/drivers/misc/uacce/uacce.c >>>> +++ b/drivers/misc/uacce/uacce.c >>>> @@ -522,14 +522,10 @@ int uacce_register(struct uacce_device *uacce) >>>> if (!uacce) >>>> return -ENODEV; >>>> - uacce->cdev = cdev_alloc(); >>>> - if (!uacce->cdev) >>>> - return -ENOMEM; >>> This is the check. >>> >>> >>>> - >>>> - uacce->cdev->ops = &uacce_fops; >>>> - uacce->cdev->owner = THIS_MODULE; >>>> + cdev_init(&uacce->cdev, &uacce_fops); >>>> + uacce->cdev.owner = THIS_MODULE; >>>> - return cdev_device_add(uacce->cdev, &uacce->dev); >>>> + return cdev_device_add(&uacce->cdev, &uacce->dev); >>> And so is this. So what is wrong here? >>> >>> >>>> } >>>> EXPORT_SYMBOL_GPL(uacce_register); >>>> @@ -568,8 +564,7 @@ void uacce_remove(struct uacce_device *uacce) >>>> unmap_mapping_range(q->mapping, 0, 0, 1); >>>> } >>>> - if (uacce->cdev) >>>> - cdev_device_del(uacce->cdev, &uacce->dev); >>>> + cdev_device_del(&uacce->cdev, &uacce->dev); >>>> xa_erase(&uacce_xa, uacce->dev_id); >>>> /* >>>> * uacce exists as long as there are open fds, but ops will be freed >>>> diff --git a/include/linux/uacce.h b/include/linux/uacce.h >>>> index e290c0269944..98b896192a44 100644 >>>> --- a/include/linux/uacce.h >>>> +++ b/include/linux/uacce.h >>>> @@ -126,7 +126,7 @@ struct uacce_device { >>>> bool is_vf; >>>> u32 flags; >>>> u32 dev_id; >>>> - struct cdev *cdev; >>>> + struct cdev cdev; >>>> struct device dev; >>> You can not do this, you now have 2 different reference counts >>> controlling the lifespan of this one structure. That is just going to >>> cause so many more bugs... >>> >>> How was this tested? What is currently failing that requires this >>> change? >>> >>> thanks, >>> >>> greg k-h >> We analyze it theoretically there may be a memory leak >> issue here, if the cdev_device_add returns a failure, >> the uacce_remove will not be executed, which results in the >> uacce cdev memory not being released. > Then properly clean up if that happens. > >> Therefore, we have decided to align with the design of other >> drivers by making cdev a static member of uacce_device and >> releasing the memory through uacce_device. > But again, this is wrong to do. > >> found one example in drivers/watchdog/watchdog_dev.h. >> struct watchdog_core_data { >> struct device dev; >> struct cdev cdev; > This is also wrong and needs to be fixed. Please send a patch to > resolve it as well, as it should not be copied as a valid example. > > thanks, > > greg k-h Very sorry for the delayed response. In v1, our first thought was that if cdev_device_add returns a failure, we could release the resources allocated by cdev_alloc using cdev_del. For this, we attempted the following modification: @@ -519,6 +519,8 @@ EXPORT_SYMBOL_GPL(uacce_alloc); */ int uacce_register(struct uacce_device *uacce) { + int ret; + if (!uacce) return -ENODEV; @@ -529,7 +531,14 @@ int uacce_register(struct uacce_device *uacce) uacce->cdev->ops = &uacce_fops; uacce->cdev->owner = THIS_MODULE; - return cdev_device_add(uacce->cdev, &uacce->dev); + ret = cdev_device_add(uacce->cdev, &uacce->dev); + if (ret) { + cdev_del(uacce->cdev); + uacce->cdev = NULL; + return ret; + } + + return 0; } However, after further analysis, we found that cdev_device_add does not increment the reference count when it fails. Therefore, in this case, cdev_del is not necessary. This means that the resources allocated by cdev_alloc will not cause a memory leak in the failure path. Thus, I believe this patch modification is unnecessary. In the upcoming v3 version, I will remove this modification. Thank you for your patient guidance! Best regards, Chenghai > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/4] uacce: fix for cdev memory leak 2025-09-26 8:47 ` huangchenghai @ 2025-09-26 9:37 ` linwenkai (C) 0 siblings, 0 replies; 12+ messages in thread From: linwenkai (C) @ 2025-09-26 9:37 UTC (permalink / raw) To: huangchenghai, Greg KH Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39, liulongfang, qianweili 在 2025/9/26 16:47, huangchenghai 写道: > > On Wed, Sep 17, 2025 at 06:18 PM +0800, Greg KH wrote: >> On Wed, Sep 17, 2025 at 05:56:16PM +0800, huangchenghai wrote: >>> On Mon, Sep 16, 2025 at 11:15 PM +0800, Greg KH wrote: >>>> On Tue, Sep 16, 2025 at 10:48:08PM +0800, Chenghai Huang wrote: >>>>> From: Wenkai Lin <linwenkai6@hisilicon.com> >>>>> >>>>> If cdev_device_add failed, it is hard to determine >>>>> whether cdev_del has been executed, which lead to a >>>>> memory leak issue, so we use cdev_init to avoid it. >>>> I do not understand, what is wrong with the current code? It checks if >>>> add fails: >>>> >>>>> Fixes: 015d239ac014 ("uacce: add uacce driver") >>>>> Cc: stable@vger.kernel.org >>>>> Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> >>>>> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> >>>>> --- >>>>> drivers/misc/uacce/uacce.c | 13 ++++--------- >>>>> include/linux/uacce.h | 2 +- >>>>> 2 files changed, 5 insertions(+), 10 deletions(-) >>>>> >>>>> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c >>>>> index 42e7d2a2a90c..12370469f646 100644 >>>>> --- a/drivers/misc/uacce/uacce.c >>>>> +++ b/drivers/misc/uacce/uacce.c >>>>> @@ -522,14 +522,10 @@ int uacce_register(struct uacce_device *uacce) >>>>> if (!uacce) >>>>> return -ENODEV; >>>>> - uacce->cdev = cdev_alloc(); >>>>> - if (!uacce->cdev) >>>>> - return -ENOMEM; >>>> This is the check. >>>> >>>> >>>>> - >>>>> - uacce->cdev->ops = &uacce_fops; >>>>> - uacce->cdev->owner = THIS_MODULE; >>>>> + cdev_init(&uacce->cdev, &uacce_fops); >>>>> + uacce->cdev.owner = THIS_MODULE; >>>>> - return cdev_device_add(uacce->cdev, &uacce->dev); >>>>> + return cdev_device_add(&uacce->cdev, &uacce->dev); >>>> And so is this. So what is wrong here? >>>> >>>> >>>>> } >>>>> EXPORT_SYMBOL_GPL(uacce_register); >>>>> @@ -568,8 +564,7 @@ void uacce_remove(struct uacce_device *uacce) >>>>> unmap_mapping_range(q->mapping, 0, 0, 1); >>>>> } >>>>> - if (uacce->cdev) >>>>> - cdev_device_del(uacce->cdev, &uacce->dev); >>>>> + cdev_device_del(&uacce->cdev, &uacce->dev); >>>>> xa_erase(&uacce_xa, uacce->dev_id); >>>>> /* >>>>> * uacce exists as long as there are open fds, but ops will >>>>> be freed >>>>> diff --git a/include/linux/uacce.h b/include/linux/uacce.h >>>>> index e290c0269944..98b896192a44 100644 >>>>> --- a/include/linux/uacce.h >>>>> +++ b/include/linux/uacce.h >>>>> @@ -126,7 +126,7 @@ struct uacce_device { >>>>> bool is_vf; >>>>> u32 flags; >>>>> u32 dev_id; >>>>> - struct cdev *cdev; >>>>> + struct cdev cdev; >>>>> struct device dev; >>>> You can not do this, you now have 2 different reference counts >>>> controlling the lifespan of this one structure. That is just going to >>>> cause so many more bugs... >>>> >>>> How was this tested? What is currently failing that requires this >>>> change? >>>> >>>> thanks, >>>> >>>> greg k-h >>> We analyze it theoretically there may be a memory leak >>> issue here, if the cdev_device_add returns a failure, >>> the uacce_remove will not be executed, which results in the >>> uacce cdev memory not being released. >> Then properly clean up if that happens. >> >>> Therefore, we have decided to align with the design of other >>> drivers by making cdev a static member of uacce_device and >>> releasing the memory through uacce_device. >> But again, this is wrong to do. >> >>> found one example in drivers/watchdog/watchdog_dev.h. >>> struct watchdog_core_data { >>> struct device dev; >>> struct cdev cdev; >> This is also wrong and needs to be fixed. Please send a patch to >> resolve it as well, as it should not be copied as a valid example. >> >> thanks, >> >> greg k-h > Very sorry for the delayed response. > > In v1, our first thought was that if cdev_device_add returns a > failure, we could release the resources allocated by cdev_alloc > using cdev_del. For this, we attempted the following modification: > > @@ -519,6 +519,8 @@ EXPORT_SYMBOL_GPL(uacce_alloc); > */ > int uacce_register(struct uacce_device *uacce) > { > + int ret; > + > if (!uacce) > return -ENODEV; > > @@ -529,7 +531,14 @@ int uacce_register(struct uacce_device *uacce) > uacce->cdev->ops = &uacce_fops; > uacce->cdev->owner = THIS_MODULE; > > - return cdev_device_add(uacce->cdev, &uacce->dev); > + ret = cdev_device_add(uacce->cdev, &uacce->dev); > + if (ret) { > + cdev_del(uacce->cdev); > + uacce->cdev = NULL; > + return ret; > + } > + > + return 0; > } > > However, after further analysis, we found that cdev_device_add does > not increment the reference count when it fails. Therefore, in this > case, cdev_del is not necessary. This means that the resources > allocated by cdev_alloc will not cause a memory leak in the failure > path. > > Thus, I believe this patch modification is unnecessary. In the > upcoming v3 version, I will remove this modification. > > Thank you for your patient guidance! > > Best regards, > Chenghai >> After further analysis, it was found that during the cdev_alloc process, the reference count of the cdev is initialized to 1. If kobject_put is not actively executed, the release function of the cdev cannot be called, and the memory will never be released, leading to a leak, so kobject_put is still necessary here? Here is the new solution: @@ -519,6 +519,8 @@ EXPORT_SYMBOL_GPL(uacce_alloc); */ int uacce_register(struct uacce_device *uacce) { + int ret; + if (!uacce) return -ENODEV; @@ -529,7 +531,14 @@ int uacce_register(struct uacce_device *uacce) uacce->cdev->ops = &uacce_fops; uacce->cdev->owner = THIS_MODULE; - return cdev_device_add(uacce->cdev, &uacce->dev); + ret = cdev_device_add(uacce->cdev, &uacce->dev); + if (ret) { + kobject_put(&uacce->cdev->kobject); + uacce->cdev = NULL; + return ret; + } + + return 0; } Thanks, WenKai ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 2/4] uacce: fix isolate sysfs check condition 2025-09-16 14:48 [PATCH v2 0/4] uacce: driver fixes for memory leaks and state management Chenghai Huang 2025-09-16 14:48 ` [PATCH v2 1/4] uacce: fix for cdev memory leak Chenghai Huang @ 2025-09-16 14:48 ` Chenghai Huang 2025-09-16 15:15 ` Greg KH 2025-09-16 14:48 ` [PATCH v2 3/4] uacce: implement mremap in uacce_vm_ops to return -EPERM Chenghai Huang 2025-09-16 14:48 ` [PATCH v2 4/4] uacce: ensure safe queue release with state management Chenghai Huang 3 siblings, 1 reply; 12+ messages in thread From: Chenghai Huang @ 2025-09-16 14:48 UTC (permalink / raw) To: gregkh, zhangfei.gao, wangzhou1 Cc: linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39, liulongfang, qianweili The uacce supports device isolation feature. If the driver implements the isolate_err_threshold_read and isolate_err_threshold_write callbacks, the uacce will create sysfs files. Users can read and configure isolation policies through sysfs. Currently, if either isolate_err_threshold_read or isolate_err_threshold_write callback exists, sysfs files are created. However, accessing a non-existent callback may cause a system panic. Therefore, sysfs files are only created when both isolate_err_threshold_read and isolate_err_threshold_write are present. Fixes: e3e289fbc0b5 ("uacce: supports device isolation feature") Cc: stable@vger.kernel.org Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> --- drivers/misc/uacce/uacce.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c index 12370469f646..770a931ef68d 100644 --- a/drivers/misc/uacce/uacce.c +++ b/drivers/misc/uacce/uacce.c @@ -441,7 +441,7 @@ static umode_t uacce_dev_is_visible(struct kobject *kobj, return 0; if (attr == &dev_attr_isolate_strategy.attr && - (!uacce->ops->isolate_err_threshold_read && + (!uacce->ops->isolate_err_threshold_read || !uacce->ops->isolate_err_threshold_write)) return 0; -- 2.33.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/4] uacce: fix isolate sysfs check condition 2025-09-16 14:48 ` [PATCH v2 2/4] uacce: fix isolate sysfs check condition Chenghai Huang @ 2025-09-16 15:15 ` Greg KH 2025-09-17 9:54 ` huangchenghai 0 siblings, 1 reply; 12+ messages in thread From: Greg KH @ 2025-09-16 15:15 UTC (permalink / raw) To: Chenghai Huang Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39, liulongfang, qianweili On Tue, Sep 16, 2025 at 10:48:09PM +0800, Chenghai Huang wrote: > The uacce supports device isolation feature. If the driver > implements the isolate_err_threshold_read and > isolate_err_threshold_write callbacks, the uacce will create sysfs > files. Users can read and configure isolation policies through > sysfs. Currently, if either isolate_err_threshold_read or > isolate_err_threshold_write callback exists, sysfs files are > created. > > However, accessing a non-existent callback may cause a system panic. Where is the callback happening that fails? Shouldn't that be checked instead of doing this change? > Therefore, sysfs files are only created when both > isolate_err_threshold_read and isolate_err_threshold_write are > present. What if a device only has 1? That should still work properly? And why not just create the file if it is going to be used, that is the real solution here. thanks, greg k-h ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/4] uacce: fix isolate sysfs check condition 2025-09-16 15:15 ` Greg KH @ 2025-09-17 9:54 ` huangchenghai 0 siblings, 0 replies; 12+ messages in thread From: huangchenghai @ 2025-09-17 9:54 UTC (permalink / raw) To: Greg KH Cc: zhangfei.gao, wangzhou1, linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39, liulongfang, qianweili On Mon, Sep 16, 2025 at 11:15 PM +0800, Greg KH wrote: > On Tue, Sep 16, 2025 at 10:48:09PM +0800, Chenghai Huang wrote: >> The uacce supports device isolation feature. If the driver >> implements the isolate_err_threshold_read and >> isolate_err_threshold_write callbacks, the uacce will create sysfs >> files. Users can read and configure isolation policies through >> sysfs. Currently, if either isolate_err_threshold_read or >> isolate_err_threshold_write callback exists, sysfs files are >> created. >> >> However, accessing a non-existent callback may cause a system panic. > Where is the callback happening that fails? Shouldn't that be checked > instead of doing this change? > >> Therefore, sysfs files are only created when both >> isolate_err_threshold_read and isolate_err_threshold_write are >> present. > What if a device only has 1? That should still work properly? > > And why not just create the file if it is going to be used, that is the > real solution here. > > thanks, > > greg k-h Thank you for your feedback.I agree that the check should be done in the corresponding `isolate_strategy_show()` and `isolate_strategy_store()` functions. How about the updated: @@ -402,6 +402,9 @@ static ssize_t isolate_strategy_show(struct device *dev, struct device_attribute struct uacce_device *uacce = to_uacce_device(dev); u32 val; + if (!uacce->ops->isolate_err_threshold_read) + return -ENOENT; + val = uacce->ops->isolate_err_threshold_read(uacce); return sysfs_emit(buf, "%u\n", val); @@ -414,6 +417,9 @@ static ssize_t isolate_strategy_store(struct device *dev, struct device_attribut unsigned long val; int ret; + if (!uacce->ops->isolate_err_threshold_write) + return -ENOENT; + if (kstrtoul(buf, 0, &val) < 0) return -EINVAL; @@ -460,9 +466,7 @@ static umode_t uacce_dev_is_visible(struct kobject *kobj, (!uacce->qf_pg_num[UACCE_QFRT_DUS]))) return 0; - if (attr == &dev_attr_isolate_strategy.attr && - (!uacce->ops->isolate_err_threshold_read || - !uacce->ops->isolate_err_threshold_write)) + if (attr == &dev_attr_isolate_strategy.attr) return 0; This way, the sysfs files will only be created if they are going to be used, and the checks are done at the appropriate places. Thanks, Chenghai ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 3/4] uacce: implement mremap in uacce_vm_ops to return -EPERM 2025-09-16 14:48 [PATCH v2 0/4] uacce: driver fixes for memory leaks and state management Chenghai Huang 2025-09-16 14:48 ` [PATCH v2 1/4] uacce: fix for cdev memory leak Chenghai Huang 2025-09-16 14:48 ` [PATCH v2 2/4] uacce: fix isolate sysfs check condition Chenghai Huang @ 2025-09-16 14:48 ` Chenghai Huang 2025-09-16 14:48 ` [PATCH v2 4/4] uacce: ensure safe queue release with state management Chenghai Huang 3 siblings, 0 replies; 12+ messages in thread From: Chenghai Huang @ 2025-09-16 14:48 UTC (permalink / raw) To: gregkh, zhangfei.gao, wangzhou1 Cc: linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39, liulongfang, qianweili From: Yang Shen <shenyang39@huawei.com> The current uacce_vm_ops does not support the mremap operation of vm_operations_struct. Implement .mremap to return -EPERM to remind users. The reason we need to explicitly disable mremap is that when the driver does not implement .mremap, it uses the default mremap method. This could lead to a risk scenario: An application might first mmap address p1, then mremap to p2, followed by munmap(p1), and finally munmap(p2). Since the default mremap copies the original vma's vm_private_data (i.e., q) to the new vma, both munmap operations would trigger vma_close, causing q->qfr to be freed twice(qfr will be set to null here, so repeated release is ok). Fixes: 015d239ac014 ("uacce: add uacce driver") Cc: stable@vger.kernel.org Signed-off-by: Yang Shen <shenyang39@huawei.com> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> --- drivers/misc/uacce/uacce.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c index 770a931ef68d..dda71492874d 100644 --- a/drivers/misc/uacce/uacce.c +++ b/drivers/misc/uacce/uacce.c @@ -214,8 +214,14 @@ static void uacce_vma_close(struct vm_area_struct *vma) } } +static int uacce_vma_mremap(struct vm_area_struct *area) +{ + return -EPERM; +} + static const struct vm_operations_struct uacce_vm_ops = { .close = uacce_vma_close, + .mremap = uacce_vma_mremap, }; static int uacce_fops_mmap(struct file *filep, struct vm_area_struct *vma) -- 2.33.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 4/4] uacce: ensure safe queue release with state management 2025-09-16 14:48 [PATCH v2 0/4] uacce: driver fixes for memory leaks and state management Chenghai Huang ` (2 preceding siblings ...) 2025-09-16 14:48 ` [PATCH v2 3/4] uacce: implement mremap in uacce_vm_ops to return -EPERM Chenghai Huang @ 2025-09-16 14:48 ` Chenghai Huang 3 siblings, 0 replies; 12+ messages in thread From: Chenghai Huang @ 2025-09-16 14:48 UTC (permalink / raw) To: gregkh, zhangfei.gao, wangzhou1 Cc: linux-kernel, linux-crypto, linuxarm, fanghao11, shenyang39, liulongfang, qianweili Directly calling `put_queue` carries risks since it cannot guarantee that resources of `uacce_queue` have been fully released beforehand. So adding a `stop_queue` operation for the UACCE_CMD_PUT_Q command and leaving the `put_queue` operation to the final resource release ensures safety. Queue states are defined as follows: - UACCE_Q_ZOMBIE: Initial state - UACCE_Q_INIT: After opening `uacce` - UACCE_Q_STARTED: After `start` is issued via `ioctl` When executing `poweroff -f` in virt while accelerator are still working, `uacce_fops_release` and `uacce_remove` may execute concurrently. This can cause `uacce_put_queue` within `uacce_fops_release` to access a NULL `ops` pointer. Therefore, add state checks to prevent accessing freed pointers. Fixes: 015d239ac014 ("uacce: add uacce driver") Cc: stable@vger.kernel.org Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Yang Shen <shenyang39@huawei.com> --- drivers/misc/uacce/uacce.c | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c index dda71492874d..ff9c515ac289 100644 --- a/drivers/misc/uacce/uacce.c +++ b/drivers/misc/uacce/uacce.c @@ -40,20 +40,34 @@ static int uacce_start_queue(struct uacce_queue *q) return 0; } -static int uacce_put_queue(struct uacce_queue *q) +static int uacce_stop_queue(struct uacce_queue *q) { struct uacce_device *uacce = q->uacce; - if ((q->state == UACCE_Q_STARTED) && uacce->ops->stop_queue) + if (q->state != UACCE_Q_STARTED) + return 0; + + if (uacce->ops->stop_queue) uacce->ops->stop_queue(q); - if ((q->state == UACCE_Q_INIT || q->state == UACCE_Q_STARTED) && - uacce->ops->put_queue) + q->state = UACCE_Q_INIT; + + return 0; +} + +static void uacce_put_queue(struct uacce_queue *q) +{ + struct uacce_device *uacce = q->uacce; + + uacce_stop_queue(q); + + if (q->state != UACCE_Q_INIT) + return; + + if (uacce->ops->put_queue) uacce->ops->put_queue(q); q->state = UACCE_Q_ZOMBIE; - - return 0; } static long uacce_fops_unl_ioctl(struct file *filep, @@ -80,7 +94,7 @@ static long uacce_fops_unl_ioctl(struct file *filep, ret = uacce_start_queue(q); break; case UACCE_CMD_PUT_Q: - ret = uacce_put_queue(q); + ret = uacce_stop_queue(q); break; default: if (uacce->ops->ioctl) -- 2.33.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-09-26 9:37 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-09-16 14:48 [PATCH v2 0/4] uacce: driver fixes for memory leaks and state management Chenghai Huang 2025-09-16 14:48 ` [PATCH v2 1/4] uacce: fix for cdev memory leak Chenghai Huang 2025-09-16 15:14 ` Greg KH 2025-09-17 9:56 ` huangchenghai 2025-09-17 10:18 ` Greg KH 2025-09-26 8:47 ` huangchenghai 2025-09-26 9:37 ` linwenkai (C) 2025-09-16 14:48 ` [PATCH v2 2/4] uacce: fix isolate sysfs check condition Chenghai Huang 2025-09-16 15:15 ` Greg KH 2025-09-17 9:54 ` huangchenghai 2025-09-16 14:48 ` [PATCH v2 3/4] uacce: implement mremap in uacce_vm_ops to return -EPERM Chenghai Huang 2025-09-16 14:48 ` [PATCH v2 4/4] uacce: ensure safe queue release with state management Chenghai Huang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox