* [PATCH v2 0/4] mm: clarify nofail memory allocation
@ 2024-07-31 0:01 Barry Song
2024-07-31 0:01 ` [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL Barry Song
` (3 more replies)
0 siblings, 4 replies; 28+ messages in thread
From: Barry Song @ 2024-07-31 0:01 UTC (permalink / raw)
To: akpm, linux-mm
Cc: 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko,
penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua,
vbabka, virtualization
From: Barry Song <v-songbaohua@oppo.com>
-v2:
* adjust the vdpa fix according to Jason's and Michal's feedback; I would
expect Jason to test it, thanks!
* split the BUG_ON for unavoidable failure and the GFP_ATOMIC |
__GFP_NOFAIL case into two patches, per Vlastimil's and Michal's feedback.
* collect Michal's Acked-by for patch 2 (the documentation patch);
* remove the new GFP_NOFAIL from this series; it will be sent as a
separate enhancement patchset later on.
-v1:
https://lore.kernel.org/linux-mm/20240724085544.299090-1-21cnbao@gmail.com/
__GFP_NOFAIL carries the semantics of never failing, so its callers
do not check the return value:
%__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
cannot handle allocation failures. The allocation could block
indefinitely but will never return with failure. Testing for
failure is pointless.
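For illustration, a minimal sketch of a typical caller (hypothetical code,
not from this series) shows why the NULL check is omitted:

	/* __GFP_NOFAIL promises the allocation cannot fail, so the
	 * result is dereferenced directly, with no NULL check. */
	struct foo *f = kmalloc(sizeof(*f), GFP_KERNEL | __GFP_NOFAIL);

	f->bar = 1;	/* a hidden NULL here means a crash, or worse */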
However, a __GFP_NOFAIL allocation can still fail in practice, either
because the request exceeds the allocator's size limits or because the
flag is combined with GFP_ATOMIC/GFP_NOWAIT in a non-sleepable context.
Since callers do not check the return value, this can expose security
vulnerabilities through NULL dereferences.
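The vdpa case below is an instance of the second failure mode; reduced to
a sketch, the problematic pattern looks like this:

	/* Non-sleepable context: direct reclaim is impossible, so the
	 * "nofail" promise cannot be kept and NULL can come back. */
	write_lock(&lock);
	page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL);
	memcpy_from_page(page_address(page),	/* potential NULL deref */
			 bounce_page, 0, PAGE_SIZE);
	write_unlock(&lock);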
Since __GFP_NOFAIL does not support non-blocking allocation, v1 introduced
GFP_NOFAIL with inclusive blocking semantics and encouraged it as a
replacement for __GFP_NOFAIL in non-mm code; as noted in the changelog
above, that addition has been dropped from v2 and will be a separate
enhancement patchset.
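For reference, the rough shape such a flag would take (a guess for
illustration only; the dropped v1 patch is the authority on the actual
definition):

	#define GFP_NOFAIL	(__GFP_NOFAIL | __GFP_DIRECT_RECLAIM)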
If a nofail allocation must still fail, we should trigger a BUG rather
than expose a NULL dereference to callers that do not check the return
value.
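A minimal sketch of what patches 3 and 4 aim for (the placement is
illustrative, not the posted diff):

	fail:
		if (gfp_mask & __GFP_NOFAIL)
			BUG();	/* never hand NULL to a caller that won't check */
		warn_alloc(gfp_mask, ac->nodemask,
			   "page allocation failure: order:%u", order);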
* The discussion started from this topic:
[PATCH RFC] mm: warn potential return NULL for kmalloc_array and
kvmalloc_array with __GFP_NOFAIL
https://lore.kernel.org/linux-mm/20240717230025.77361-1-21cnbao@gmail.com/
Thank you to Michal, Christoph, Vlastimil, and Hailong for all the
comments.
Barry Song (4):
vpda: try to fix the potential crash due to misusing __GFP_NOFAIL
mm: Document __GFP_NOFAIL must be blockable
mm: BUG_ON to avoid NULL dereference while __GFP_NOFAIL fails
mm: prohibit NULL dereference exposed for unsupported non-blockable
__GFP_NOFAIL
drivers/vdpa/vdpa_user/iova_domain.c | 31 +++++++++++++++++++++++-----
drivers/vdpa/vdpa_user/iova_domain.h | 5 ++++-
drivers/vdpa/vdpa_user/vduse_dev.c | 4 +++-
include/linux/gfp_types.h | 5 ++++-
include/linux/slab.h | 4 +++-
mm/page_alloc.c | 14 +++++++------
mm/util.c | 1 +
7 files changed, 49 insertions(+), 15 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 28+ messages in thread* [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL 2024-07-31 0:01 [PATCH v2 0/4] mm: clarify nofail memory allocation Barry Song @ 2024-07-31 0:01 ` Barry Song 2024-07-31 3:09 ` Jason Wang 2024-07-31 0:01 ` [PATCH v2 2/4] mm: Document __GFP_NOFAIL must be blockable Barry Song ` (2 subsequent siblings) 3 siblings, 1 reply; 28+ messages in thread From: Barry Song @ 2024-07-31 0:01 UTC (permalink / raw) To: akpm, linux-mm Cc: 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, vbabka, virtualization, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Maxime Coquelin From: Barry Song <v-songbaohua@oppo.com> mm doesn't support non-blockable __GFP_NOFAIL allocation. Because __GFP_NOFAIL without direct reclamation may just result in a busy loop within non-sleepable contexts. static inline struct page * __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, struct alloc_context *ac) { ... /* * Make sure that __GFP_NOFAIL request doesn't leak out and make sure * we always retry */ if (gfp_mask & __GFP_NOFAIL) { /* * All existing users of the __GFP_NOFAIL are blockable, so warn * of any new users that actually require GFP_NOWAIT */ if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) goto fail; ... } ... fail: warn_alloc(gfp_mask, ac->nodemask, "page allocation failure: order:%u", order); got_pg: return page; } Let's move the memory allocation out of the atomic context and use the normal sleepable context to get pages. [RFT]: This has only been compile-tested; I'd prefer if the VDPA maintainers handles it. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: "Eugenio Pérez" <eperezma@redhat.com> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Barry Song <v-songbaohua@oppo.com> --- drivers/vdpa/vdpa_user/iova_domain.c | 31 +++++++++++++++++++++++----- drivers/vdpa/vdpa_user/iova_domain.h | 5 ++++- drivers/vdpa/vdpa_user/vduse_dev.c | 4 +++- 3 files changed, 33 insertions(+), 7 deletions(-) diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c index 791d38d6284c..9318f059a8b5 100644 --- a/drivers/vdpa/vdpa_user/iova_domain.c +++ b/drivers/vdpa/vdpa_user/iova_domain.c @@ -283,7 +283,23 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, return ret; } -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain) +{ + struct page **pages; + unsigned long count, i; + + if (!domain->user_bounce_pages) + return NULL; + + count = domain->bounce_size >> PAGE_SHIFT; + pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL); + for (i = 0; i < count; i++) + pages[i] = alloc_page(GFP_KERNEL | __GFP_NOFAIL); + + return pages; +} + +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, struct page **pages) { struct vduse_bounce_map *map; unsigned long i, count; @@ -294,15 +310,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) count = domain->bounce_size >> PAGE_SHIFT; for (i = 0; i < count; i++) { - struct page *page = NULL; + struct page *page = pages[i]; map = &domain->bounce_maps[i]; - if (WARN_ON(!map->bounce_page)) + if (WARN_ON(!map->bounce_page)) { + put_page(page); continue; + } /* Copy user page to 
kernel page if it's in use */ if (map->orig_phys != INVALID_PHYS_ADDR) { - page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL); memcpy_from_page(page_address(page), map->bounce_page, 0, PAGE_SIZE); } @@ -310,6 +327,7 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) map->bounce_page = page; } domain->user_bounce_pages = false; + kfree(pages); out: write_unlock(&domain->bounce_lock); } @@ -543,10 +561,13 @@ static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma) static int vduse_domain_release(struct inode *inode, struct file *file) { struct vduse_iova_domain *domain = file->private_data; + struct page **pages; + + pages = vduse_domain_alloc_pages_to_remove_bounce(domain); spin_lock(&domain->iotlb_lock); vduse_iotlb_del_range(domain, 0, ULLONG_MAX); - vduse_domain_remove_user_bounce_pages(domain); + vduse_domain_remove_user_bounce_pages(domain, pages); vduse_domain_free_kernel_bounce_pages(domain); spin_unlock(&domain->iotlb_lock); put_iova_domain(&domain->stream_iovad); diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h index f92f22a7267d..17efa5555b3f 100644 --- a/drivers/vdpa/vdpa_user/iova_domain.h +++ b/drivers/vdpa/vdpa_user/iova_domain.h @@ -74,7 +74,10 @@ void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain); int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, struct page **pages, int count); -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain); +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, + struct page **pages); + +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain); void vduse_domain_destroy(struct vduse_iova_domain *domain); diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 7ae99691efdf..5d8d5810df57 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1030,6 +1030,7 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev, static int vduse_dev_dereg_umem(struct vduse_dev *dev, u64 iova, u64 size) { + struct page **pages; int ret; mutex_lock(&dev->mem_lock); @@ -1044,7 +1045,8 @@ static int vduse_dev_dereg_umem(struct vduse_dev *dev, if (dev->umem->iova != iova || size != dev->domain->bounce_size) goto unlock; - vduse_domain_remove_user_bounce_pages(dev->domain); + pages = vduse_domain_alloc_pages_to_remove_bounce(dev->domain); + vduse_domain_remove_user_bounce_pages(dev->domain, pages); unpin_user_pages_dirty_lock(dev->umem->pages, dev->umem->npages, true); atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm); -- 2.34.1 ^ permalink raw reply related [flat|nested] 28+ messages in thread
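In essence, the patch above converts an allocate-under-lock pattern into
preallocate-then-consume (a condensed sketch of the diff, not the diff
itself):

	/* before: allocation inside the write_lock'd, non-sleepable section */
	write_lock(&domain->bounce_lock);
	page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL);	/* may fail or busy-loop */
	...
	write_unlock(&domain->bounce_lock);

	/* after: allocate first in a sleepable context, then consume */
	pages = vduse_domain_alloc_pages_to_remove_bounce(domain); /* GFP_KERNEL */
	vduse_domain_remove_user_bounce_pages(domain, pages);	/* takes bounce_lock */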
* Re: [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL 2024-07-31 0:01 ` [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL Barry Song @ 2024-07-31 3:09 ` Jason Wang 2024-07-31 3:15 ` Barry Song 0 siblings, 1 reply; 28+ messages in thread From: Jason Wang @ 2024-07-31 3:09 UTC (permalink / raw) To: Barry Song Cc: akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, vbabka, virtualization, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez, Maxime Coquelin On Wed, Jul 31, 2024 at 8:03 AM Barry Song <21cnbao@gmail.com> wrote: > > From: Barry Song <v-songbaohua@oppo.com> > > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because > __GFP_NOFAIL without direct reclamation may just result in a busy > loop within non-sleepable contexts. > > static inline struct page * > __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > struct alloc_context *ac) > { > ... > /* > * Make sure that __GFP_NOFAIL request doesn't leak out and make sure > * we always retry > */ > if (gfp_mask & __GFP_NOFAIL) { > /* > * All existing users of the __GFP_NOFAIL are blockable, so warn > * of any new users that actually require GFP_NOWAIT > */ > if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > goto fail; > ... > } > ... > fail: > warn_alloc(gfp_mask, ac->nodemask, > "page allocation failure: order:%u", order); > got_pg: > return page; > } > > Let's move the memory allocation out of the atomic context and use > the normal sleepable context to get pages. > > [RFT]: This has only been compile-tested; I'd prefer if the VDPA maintainers > handles it. > > Cc: "Michael S. Tsirkin" <mst@redhat.com> > Cc: Jason Wang <jasowang@redhat.com> > Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > Cc: "Eugenio Pérez" <eperezma@redhat.com> > Cc: Maxime Coquelin <maxime.coquelin@redhat.com> > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > --- > drivers/vdpa/vdpa_user/iova_domain.c | 31 +++++++++++++++++++++++----- > drivers/vdpa/vdpa_user/iova_domain.h | 5 ++++- > drivers/vdpa/vdpa_user/vduse_dev.c | 4 +++- > 3 files changed, 33 insertions(+), 7 deletions(-) > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c > index 791d38d6284c..9318f059a8b5 100644 > --- a/drivers/vdpa/vdpa_user/iova_domain.c > +++ b/drivers/vdpa/vdpa_user/iova_domain.c > @@ -283,7 +283,23 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, > return ret; > } > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain) > +{ > + struct page **pages; > + unsigned long count, i; > + > + if (!domain->user_bounce_pages) > + return NULL; > + > + count = domain->bounce_size >> PAGE_SHIFT; > + pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL); > + for (i = 0; i < count; i++) > + pages[i] = alloc_page(GFP_KERNEL | __GFP_NOFAIL); > + > + return pages; > +} > + > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, struct page **pages) > { > struct vduse_bounce_map *map; > unsigned long i, count; > @@ -294,15 +310,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > count = domain->bounce_size >> PAGE_SHIFT; > for (i = 0; i < count; i++) { > - struct page *page = NULL; > + struct page *page = pages[i]; > > map = &domain->bounce_maps[i]; > - if 
(WARN_ON(!map->bounce_page)) > + if (WARN_ON(!map->bounce_page)) { > + put_page(page); > continue; > + } > > /* Copy user page to kernel page if it's in use */ > if (map->orig_phys != INVALID_PHYS_ADDR) { > - page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL); > memcpy_from_page(page_address(page), > map->bounce_page, 0, PAGE_SIZE); > } > @@ -310,6 +327,7 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > map->bounce_page = page; > } > domain->user_bounce_pages = false; > + kfree(pages); > out: > write_unlock(&domain->bounce_lock); > } > @@ -543,10 +561,13 @@ static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma) > static int vduse_domain_release(struct inode *inode, struct file *file) > { > struct vduse_iova_domain *domain = file->private_data; > + struct page **pages; > + > + pages = vduse_domain_alloc_pages_to_remove_bounce(domain); > > spin_lock(&domain->iotlb_lock); > vduse_iotlb_del_range(domain, 0, ULLONG_MAX); > - vduse_domain_remove_user_bounce_pages(domain); > + vduse_domain_remove_user_bounce_pages(domain, pages); > vduse_domain_free_kernel_bounce_pages(domain); > spin_unlock(&domain->iotlb_lock); > put_iova_domain(&domain->stream_iovad); > diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h > index f92f22a7267d..17efa5555b3f 100644 > --- a/drivers/vdpa/vdpa_user/iova_domain.h > +++ b/drivers/vdpa/vdpa_user/iova_domain.h > @@ -74,7 +74,10 @@ void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain); > int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, > struct page **pages, int count); > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain); > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, > + struct page **pages); > + > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain); > > void vduse_domain_destroy(struct vduse_iova_domain *domain); > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c > index 7ae99691efdf..5d8d5810df57 100644 > --- a/drivers/vdpa/vdpa_user/vduse_dev.c > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > @@ -1030,6 +1030,7 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev, > static int vduse_dev_dereg_umem(struct vduse_dev *dev, > u64 iova, u64 size) > { > + struct page **pages; > int ret; > > mutex_lock(&dev->mem_lock); > @@ -1044,7 +1045,8 @@ static int vduse_dev_dereg_umem(struct vduse_dev *dev, > if (dev->umem->iova != iova || size != dev->domain->bounce_size) > goto unlock; > > - vduse_domain_remove_user_bounce_pages(dev->domain); > + pages = vduse_domain_alloc_pages_to_remove_bounce(dev->domain); > + vduse_domain_remove_user_bounce_pages(dev->domain, pages); > unpin_user_pages_dirty_lock(dev->umem->pages, > dev->umem->npages, true); > atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm); We miss a kfree(pages); here? Thanks > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL 2024-07-31 3:09 ` Jason Wang @ 2024-07-31 3:15 ` Barry Song 2024-07-31 3:58 ` Jason Wang 0 siblings, 1 reply; 28+ messages in thread From: Barry Song @ 2024-07-31 3:15 UTC (permalink / raw) To: Jason Wang Cc: akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, vbabka, virtualization, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez, Maxime Coquelin On Wed, Jul 31, 2024 at 11:10 AM Jason Wang <jasowang@redhat.com> wrote: > > On Wed, Jul 31, 2024 at 8:03 AM Barry Song <21cnbao@gmail.com> wrote: > > > > From: Barry Song <v-songbaohua@oppo.com> > > > > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because > > __GFP_NOFAIL without direct reclamation may just result in a busy > > loop within non-sleepable contexts. > > > > static inline struct page * > > __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > struct alloc_context *ac) > > { > > ... > > /* > > * Make sure that __GFP_NOFAIL request doesn't leak out and make sure > > * we always retry > > */ > > if (gfp_mask & __GFP_NOFAIL) { > > /* > > * All existing users of the __GFP_NOFAIL are blockable, so warn > > * of any new users that actually require GFP_NOWAIT > > */ > > if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > > goto fail; > > ... > > } > > ... > > fail: > > warn_alloc(gfp_mask, ac->nodemask, > > "page allocation failure: order:%u", order); > > got_pg: > > return page; > > } > > > > Let's move the memory allocation out of the atomic context and use > > the normal sleepable context to get pages. > > > > [RFT]: This has only been compile-tested; I'd prefer if the VDPA maintainers > > handles it. > > > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > Cc: Jason Wang <jasowang@redhat.com> > > Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > Cc: "Eugenio Pérez" <eperezma@redhat.com> > > Cc: Maxime Coquelin <maxime.coquelin@redhat.com> > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > --- > > drivers/vdpa/vdpa_user/iova_domain.c | 31 +++++++++++++++++++++++----- > > drivers/vdpa/vdpa_user/iova_domain.h | 5 ++++- > > drivers/vdpa/vdpa_user/vduse_dev.c | 4 +++- > > 3 files changed, 33 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c > > index 791d38d6284c..9318f059a8b5 100644 > > --- a/drivers/vdpa/vdpa_user/iova_domain.c > > +++ b/drivers/vdpa/vdpa_user/iova_domain.c > > @@ -283,7 +283,23 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, > > return ret; > > } > > > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain) > > +{ > > + struct page **pages; > > + unsigned long count, i; > > + > > + if (!domain->user_bounce_pages) > > + return NULL; > > + > > + count = domain->bounce_size >> PAGE_SHIFT; > > + pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL); > > + for (i = 0; i < count; i++) > > + pages[i] = alloc_page(GFP_KERNEL | __GFP_NOFAIL); > > + > > + return pages; > > +} > > + > > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, struct page **pages) > > { > > struct vduse_bounce_map *map; > > unsigned long i, count; > > @@ -294,15 +310,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > > count = domain->bounce_size >> PAGE_SHIFT; > > for (i = 0; i < count; i++) { > > - struct page *page = NULL; > > + struct page *page = pages[i]; > > > > map = &domain->bounce_maps[i]; > > - if (WARN_ON(!map->bounce_page)) > > + if (WARN_ON(!map->bounce_page)) { > > + put_page(page); > > continue; > > + } > > > > /* Copy user page to kernel page if it's in use */ > > if (map->orig_phys != INVALID_PHYS_ADDR) { > > - page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL); > > memcpy_from_page(page_address(page), > > map->bounce_page, 0, PAGE_SIZE); > > } > > @@ -310,6 +327,7 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > map->bounce_page = page; > > } > > domain->user_bounce_pages = false; > > + kfree(pages); > > out: > > write_unlock(&domain->bounce_lock); > > } > > @@ -543,10 +561,13 @@ static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma) > > static int vduse_domain_release(struct inode *inode, struct file *file) > > { > > struct vduse_iova_domain *domain = file->private_data; > > + struct page **pages; > > + > > + pages = vduse_domain_alloc_pages_to_remove_bounce(domain); > > > > spin_lock(&domain->iotlb_lock); > > vduse_iotlb_del_range(domain, 0, ULLONG_MAX); > > - vduse_domain_remove_user_bounce_pages(domain); > > + vduse_domain_remove_user_bounce_pages(domain, pages); > > vduse_domain_free_kernel_bounce_pages(domain); > > spin_unlock(&domain->iotlb_lock); > > put_iova_domain(&domain->stream_iovad); > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h > > index f92f22a7267d..17efa5555b3f 100644 > > --- a/drivers/vdpa/vdpa_user/iova_domain.h > > +++ b/drivers/vdpa/vdpa_user/iova_domain.h > > @@ -74,7 +74,10 @@ void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain); > > int 
vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, > > struct page **pages, int count); > > > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain); > > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, > > + struct page **pages); > > + > > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain); > > > > void vduse_domain_destroy(struct vduse_iova_domain *domain); > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c > > index 7ae99691efdf..5d8d5810df57 100644 > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > > @@ -1030,6 +1030,7 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev, > > static int vduse_dev_dereg_umem(struct vduse_dev *dev, > > u64 iova, u64 size) > > { > > + struct page **pages; > > int ret; > > > > mutex_lock(&dev->mem_lock); > > @@ -1044,7 +1045,8 @@ static int vduse_dev_dereg_umem(struct vduse_dev *dev, > > if (dev->umem->iova != iova || size != dev->domain->bounce_size) > > goto unlock; > > > > - vduse_domain_remove_user_bounce_pages(dev->domain); > > + pages = vduse_domain_alloc_pages_to_remove_bounce(dev->domain); > > + vduse_domain_remove_user_bounce_pages(dev->domain, pages); > > unpin_user_pages_dirty_lock(dev->umem->pages, > > dev->umem->npages, true); > > atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm); > > We miss a kfree(pages); here? no. i've moved it into vduse_domain_remove_user_bounce_pages. > > Thanks > > > -- > > 2.34.1 > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL 2024-07-31 3:15 ` Barry Song @ 2024-07-31 3:58 ` Jason Wang 2024-07-31 4:11 ` Barry Song 0 siblings, 1 reply; 28+ messages in thread From: Jason Wang @ 2024-07-31 3:58 UTC (permalink / raw) To: Barry Song Cc: akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, vbabka, virtualization, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez, Maxime Coquelin On Wed, Jul 31, 2024 at 11:15 AM Barry Song <21cnbao@gmail.com> wrote: > > On Wed, Jul 31, 2024 at 11:10 AM Jason Wang <jasowang@redhat.com> wrote: > > > > On Wed, Jul 31, 2024 at 8:03 AM Barry Song <21cnbao@gmail.com> wrote: > > > > > > From: Barry Song <v-songbaohua@oppo.com> > > > > > > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because > > > __GFP_NOFAIL without direct reclamation may just result in a busy > > > loop within non-sleepable contexts. > > > > > > static inline struct page * > > > __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > > struct alloc_context *ac) > > > { > > > ... > > > /* > > > * Make sure that __GFP_NOFAIL request doesn't leak out and make sure > > > * we always retry > > > */ > > > if (gfp_mask & __GFP_NOFAIL) { > > > /* > > > * All existing users of the __GFP_NOFAIL are blockable, so warn > > > * of any new users that actually require GFP_NOWAIT > > > */ > > > if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > > > goto fail; > > > ... > > > } > > > ... > > > fail: > > > warn_alloc(gfp_mask, ac->nodemask, > > > "page allocation failure: order:%u", order); > > > got_pg: > > > return page; > > > } > > > > > > Let's move the memory allocation out of the atomic context and use > > > the normal sleepable context to get pages. > > > > > > [RFT]: This has only been compile-tested; I'd prefer if the VDPA maintainers > > > handles it. > > > > > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > Cc: Jason Wang <jasowang@redhat.com> > > > Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > Cc: "Eugenio Pérez" <eperezma@redhat.com> > > > Cc: Maxime Coquelin <maxime.coquelin@redhat.com> > > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > > --- > > > drivers/vdpa/vdpa_user/iova_domain.c | 31 +++++++++++++++++++++++----- > > > drivers/vdpa/vdpa_user/iova_domain.h | 5 ++++- > > > drivers/vdpa/vdpa_user/vduse_dev.c | 4 +++- > > > 3 files changed, 33 insertions(+), 7 deletions(-) > > > > > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c > > > index 791d38d6284c..9318f059a8b5 100644 > > > --- a/drivers/vdpa/vdpa_user/iova_domain.c > > > +++ b/drivers/vdpa/vdpa_user/iova_domain.c > > > @@ -283,7 +283,23 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, > > > return ret; > > > } > > > > > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain) > > > +{ > > > + struct page **pages; > > > + unsigned long count, i; > > > + > > > + if (!domain->user_bounce_pages) > > > + return NULL; > > > + > > > + count = domain->bounce_size >> PAGE_SHIFT; > > > + pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL); > > > + for (i = 0; i < count; i++) > > > + pages[i] = alloc_page(GFP_KERNEL | __GFP_NOFAIL); > > > + > > > + return pages; > > > +} > > > + > > > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, struct page **pages) > > > { > > > struct vduse_bounce_map *map; > > > unsigned long i, count; > > > @@ -294,15 +310,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > > > > count = domain->bounce_size >> PAGE_SHIFT; > > > for (i = 0; i < count; i++) { > > > - struct page *page = NULL; > > > + struct page *page = pages[i]; > > > > > > map = &domain->bounce_maps[i]; > > > - if (WARN_ON(!map->bounce_page)) > > > + if (WARN_ON(!map->bounce_page)) { > > > + put_page(page); > > > continue; > > > + } > > > > > > /* Copy user page to kernel page if it's in use */ > > > if (map->orig_phys != INVALID_PHYS_ADDR) { > > > - page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL); > > > memcpy_from_page(page_address(page), > > > map->bounce_page, 0, PAGE_SIZE); > > > } > > > @@ -310,6 +327,7 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > map->bounce_page = page; > > > } > > > domain->user_bounce_pages = false; > > > + kfree(pages); > > > out: > > > write_unlock(&domain->bounce_lock); > > > } > > > @@ -543,10 +561,13 @@ static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma) > > > static int vduse_domain_release(struct inode *inode, struct file *file) > > > { > > > struct vduse_iova_domain *domain = file->private_data; > > > + struct page **pages; > > > + > > > + pages = vduse_domain_alloc_pages_to_remove_bounce(domain); > > > > > > spin_lock(&domain->iotlb_lock); > > > vduse_iotlb_del_range(domain, 0, ULLONG_MAX); > > > - vduse_domain_remove_user_bounce_pages(domain); > > > + vduse_domain_remove_user_bounce_pages(domain, pages); > > > vduse_domain_free_kernel_bounce_pages(domain); > > > spin_unlock(&domain->iotlb_lock); > > > put_iova_domain(&domain->stream_iovad); > > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h > > > index f92f22a7267d..17efa5555b3f 100644 > > > --- a/drivers/vdpa/vdpa_user/iova_domain.h > > > +++ 
b/drivers/vdpa/vdpa_user/iova_domain.h > > > @@ -74,7 +74,10 @@ void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain); > > > int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, > > > struct page **pages, int count); > > > > > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain); > > > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, > > > + struct page **pages); > > > + > > > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain); > > > > > > void vduse_domain_destroy(struct vduse_iova_domain *domain); > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c > > > index 7ae99691efdf..5d8d5810df57 100644 > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > > > @@ -1030,6 +1030,7 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev, > > > static int vduse_dev_dereg_umem(struct vduse_dev *dev, > > > u64 iova, u64 size) > > > { > > > + struct page **pages; > > > int ret; > > > > > > mutex_lock(&dev->mem_lock); > > > @@ -1044,7 +1045,8 @@ static int vduse_dev_dereg_umem(struct vduse_dev *dev, > > > if (dev->umem->iova != iova || size != dev->domain->bounce_size) > > > goto unlock; > > > > > > - vduse_domain_remove_user_bounce_pages(dev->domain); > > > + pages = vduse_domain_alloc_pages_to_remove_bounce(dev->domain); > > > + vduse_domain_remove_user_bounce_pages(dev->domain, pages); > > > unpin_user_pages_dirty_lock(dev->umem->pages, > > > dev->umem->npages, true); > > > atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm); > > > > We miss a kfree(pages); here? > no. > i've moved it into vduse_domain_remove_user_bounce_pages. Ok, but it seems tricky e.g allocated by the caller but freed in callee. And I think I missed some important issues in the previous review: The check of user_bounce_pages must be done under the bounce_lock, otherwise it might race with umem_reg. So in the case of release(), we know the device is gone, so there's no need to allocate pages that will be released soon. So we can pass NULL as a hint and just assign bounce_page to NULL in vduse_domain_remove_user_bounce_pages(). And in the case of vduse_dev_dereg_umem(), we need to allocate the pages without checking user_bounce_pages. So in vduse_domain_remove_user_bounce_pages() if we can free the allocated pages as well as the pages in the following check if (!domain->user_bounce_pages) goto out; What do you think? Thanks > > > > > Thanks > > > > > -- > > > 2.34.1 > > > > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
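Reduced to a sketch, the restructuring proposed in this reply would look
roughly like this (hypothetical code, not a posted patch):

	void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain,
						   struct page **pages)
	{
		...
		write_lock(&domain->bounce_lock);
		/* tested under bounce_lock to avoid racing with umem_reg */
		if (!domain->user_bounce_pages)
			goto out;
		/* pages == NULL means the release() path: the device is
		 * gone, so just drop the user pages instead of copying */
		...
	out:
		write_unlock(&domain->bounce_lock);
		kfree(pages);	/* frees the array on the early-out path too */
	}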
* Re: [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL 2024-07-31 3:58 ` Jason Wang @ 2024-07-31 4:11 ` Barry Song 2024-07-31 4:13 ` Jason Wang 0 siblings, 1 reply; 28+ messages in thread From: Barry Song @ 2024-07-31 4:11 UTC (permalink / raw) To: Jason Wang Cc: akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, vbabka, virtualization, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez, Maxime Coquelin On Wed, Jul 31, 2024 at 11:58 AM Jason Wang <jasowang@redhat.com> wrote: > > On Wed, Jul 31, 2024 at 11:15 AM Barry Song <21cnbao@gmail.com> wrote: > > > > On Wed, Jul 31, 2024 at 11:10 AM Jason Wang <jasowang@redhat.com> wrote: > > > > > > On Wed, Jul 31, 2024 at 8:03 AM Barry Song <21cnbao@gmail.com> wrote: > > > > > > > > From: Barry Song <v-songbaohua@oppo.com> > > > > > > > > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because > > > > __GFP_NOFAIL without direct reclamation may just result in a busy > > > > loop within non-sleepable contexts. > > > > > > > > static inline struct page * > > > > __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > > > struct alloc_context *ac) > > > > { > > > > ... > > > > /* > > > > * Make sure that __GFP_NOFAIL request doesn't leak out and make sure > > > > * we always retry > > > > */ > > > > if (gfp_mask & __GFP_NOFAIL) { > > > > /* > > > > * All existing users of the __GFP_NOFAIL are blockable, so warn > > > > * of any new users that actually require GFP_NOWAIT > > > > */ > > > > if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > > > > goto fail; > > > > ... > > > > } > > > > ... > > > > fail: > > > > warn_alloc(gfp_mask, ac->nodemask, > > > > "page allocation failure: order:%u", order); > > > > got_pg: > > > > return page; > > > > } > > > > > > > > Let's move the memory allocation out of the atomic context and use > > > > the normal sleepable context to get pages. > > > > > > > > [RFT]: This has only been compile-tested; I'd prefer if the VDPA maintainers > > > > handles it. > > > > > > > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > > Cc: Jason Wang <jasowang@redhat.com> > > > > Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > Cc: "Eugenio Pérez" <eperezma@redhat.com> > > > > Cc: Maxime Coquelin <maxime.coquelin@redhat.com> > > > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > > > --- > > > > drivers/vdpa/vdpa_user/iova_domain.c | 31 +++++++++++++++++++++++----- > > > > drivers/vdpa/vdpa_user/iova_domain.h | 5 ++++- > > > > drivers/vdpa/vdpa_user/vduse_dev.c | 4 +++- > > > > 3 files changed, 33 insertions(+), 7 deletions(-) > > > > > > > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c > > > > index 791d38d6284c..9318f059a8b5 100644 > > > > --- a/drivers/vdpa/vdpa_user/iova_domain.c > > > > +++ b/drivers/vdpa/vdpa_user/iova_domain.c > > > > @@ -283,7 +283,23 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, > > > > return ret; > > > > } > > > > > > > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain) > > > > +{ > > > > + struct page **pages; > > > > + unsigned long count, i; > > > > + > > > > + if (!domain->user_bounce_pages) > > > > + return NULL; > > > > + > > > > + count = domain->bounce_size >> PAGE_SHIFT; > > > > + pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL); > > > > + for (i = 0; i < count; i++) > > > > + pages[i] = alloc_page(GFP_KERNEL | __GFP_NOFAIL); > > > > + > > > > + return pages; > > > > +} > > > > + > > > > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, struct page **pages) > > > > { > > > > struct vduse_bounce_map *map; > > > > unsigned long i, count; > > > > @@ -294,15 +310,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > > > > > > count = domain->bounce_size >> PAGE_SHIFT; > > > > for (i = 0; i < count; i++) { > > > > - struct page *page = NULL; > > > > + struct page *page = pages[i]; > > > > > > > > map = &domain->bounce_maps[i]; > > > > - if (WARN_ON(!map->bounce_page)) > > > > + if (WARN_ON(!map->bounce_page)) { > > > > + put_page(page); > > > > continue; > > > > + } > > > > > > > > /* Copy user page to kernel page if it's in use */ > > > > if (map->orig_phys != INVALID_PHYS_ADDR) { > > > > - page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL); > > > > memcpy_from_page(page_address(page), > > > > map->bounce_page, 0, PAGE_SIZE); > > > > } > > > > @@ -310,6 +327,7 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > > map->bounce_page = page; > > > > } > > > > domain->user_bounce_pages = false; > > > > + kfree(pages); > > > > out: > > > > write_unlock(&domain->bounce_lock); > > > > } > > > > @@ -543,10 +561,13 @@ static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma) > > > > static int vduse_domain_release(struct inode *inode, struct file *file) > > > > { > > > > struct vduse_iova_domain *domain = file->private_data; > > > > + struct page **pages; > > > > + > > > > + pages = vduse_domain_alloc_pages_to_remove_bounce(domain); > > > > > > > > spin_lock(&domain->iotlb_lock); > > > > vduse_iotlb_del_range(domain, 0, ULLONG_MAX); > > > > - vduse_domain_remove_user_bounce_pages(domain); > > > > + vduse_domain_remove_user_bounce_pages(domain, pages); > > > > vduse_domain_free_kernel_bounce_pages(domain); > > > > spin_unlock(&domain->iotlb_lock); > > > > put_iova_domain(&domain->stream_iovad); > > > > diff --git 
a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h > > > > index f92f22a7267d..17efa5555b3f 100644 > > > > --- a/drivers/vdpa/vdpa_user/iova_domain.h > > > > +++ b/drivers/vdpa/vdpa_user/iova_domain.h > > > > @@ -74,7 +74,10 @@ void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain); > > > > int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, > > > > struct page **pages, int count); > > > > > > > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain); > > > > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, > > > > + struct page **pages); > > > > + > > > > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain); > > > > > > > > void vduse_domain_destroy(struct vduse_iova_domain *domain); > > > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c > > > > index 7ae99691efdf..5d8d5810df57 100644 > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > > > > @@ -1030,6 +1030,7 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev, > > > > static int vduse_dev_dereg_umem(struct vduse_dev *dev, > > > > u64 iova, u64 size) > > > > { > > > > + struct page **pages; > > > > int ret; > > > > > > > > mutex_lock(&dev->mem_lock); > > > > @@ -1044,7 +1045,8 @@ static int vduse_dev_dereg_umem(struct vduse_dev *dev, > > > > if (dev->umem->iova != iova || size != dev->domain->bounce_size) > > > > goto unlock; > > > > > > > > - vduse_domain_remove_user_bounce_pages(dev->domain); > > > > + pages = vduse_domain_alloc_pages_to_remove_bounce(dev->domain); > > > > + vduse_domain_remove_user_bounce_pages(dev->domain, pages); > > > > unpin_user_pages_dirty_lock(dev->umem->pages, > > > > dev->umem->npages, true); > > > > atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm); > > > > > > We miss a kfree(pages); here? > > no. > > i've moved it into vduse_domain_remove_user_bounce_pages. > > Ok, but it seems tricky e.g allocated by the caller but freed in > callee. And I think I missed some important issues in the previous > review: The check of user_bounce_pages must be done under the > bounce_lock, otherwise it might race with umem_reg. > > So in the case of release(), we know the device is gone, so there's no > need to allocate pages that will be released soon. So we can pass NULL > as a hint and just assign bounce_page to NULL in > vduse_domain_remove_user_bounce_pages(). > > And in the case of vduse_dev_dereg_umem(), we need to allocate the > pages without checking user_bounce_pages. So in > vduse_domain_remove_user_bounce_pages() if we can free the allocated > pages as well as the pages in the following check > > if (!domain->user_bounce_pages) > goto out; > > What do you think? I am not a vdpa guy, but changing the current logic is another patch. From mm perspective, I can only address the __GFP_NOFAIL issue. I actually prefer you guys handle it directly:-) I'd rather report a BUG instead. TBH, I know nothing about vpda. > > Thanks > > > > > > > > > Thanks > > > > > > > -- > > > > 2.34.1 > > > > > > > Thanks Barry ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL 2024-07-31 4:11 ` Barry Song @ 2024-07-31 4:13 ` Jason Wang 2024-07-31 5:05 ` Barry Song 0 siblings, 1 reply; 28+ messages in thread From: Jason Wang @ 2024-07-31 4:13 UTC (permalink / raw) To: Barry Song Cc: akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, vbabka, virtualization, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez, Maxime Coquelin On Wed, Jul 31, 2024 at 12:12 PM Barry Song <21cnbao@gmail.com> wrote: > > On Wed, Jul 31, 2024 at 11:58 AM Jason Wang <jasowang@redhat.com> wrote: > > > > On Wed, Jul 31, 2024 at 11:15 AM Barry Song <21cnbao@gmail.com> wrote: > > > > > > On Wed, Jul 31, 2024 at 11:10 AM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > On Wed, Jul 31, 2024 at 8:03 AM Barry Song <21cnbao@gmail.com> wrote: > > > > > > > > > > From: Barry Song <v-songbaohua@oppo.com> > > > > > > > > > > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because > > > > > __GFP_NOFAIL without direct reclamation may just result in a busy > > > > > loop within non-sleepable contexts. > > > > > > > > > > static inline struct page * > > > > > __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > > > > struct alloc_context *ac) > > > > > { > > > > > ... > > > > > /* > > > > > * Make sure that __GFP_NOFAIL request doesn't leak out and make sure > > > > > * we always retry > > > > > */ > > > > > if (gfp_mask & __GFP_NOFAIL) { > > > > > /* > > > > > * All existing users of the __GFP_NOFAIL are blockable, so warn > > > > > * of any new users that actually require GFP_NOWAIT > > > > > */ > > > > > if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > > > > > goto fail; > > > > > ... > > > > > } > > > > > ... > > > > > fail: > > > > > warn_alloc(gfp_mask, ac->nodemask, > > > > > "page allocation failure: order:%u", order); > > > > > got_pg: > > > > > return page; > > > > > } > > > > > > > > > > Let's move the memory allocation out of the atomic context and use > > > > > the normal sleepable context to get pages. > > > > > > > > > > [RFT]: This has only been compile-tested; I'd prefer if the VDPA maintainers > > > > > handles it. > > > > > > > > > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > > > Cc: Jason Wang <jasowang@redhat.com> > > > > > Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > Cc: "Eugenio Pérez" <eperezma@redhat.com> > > > > > Cc: Maxime Coquelin <maxime.coquelin@redhat.com> > > > > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > > > > --- > > > > > drivers/vdpa/vdpa_user/iova_domain.c | 31 +++++++++++++++++++++++----- > > > > > drivers/vdpa/vdpa_user/iova_domain.h | 5 ++++- > > > > > drivers/vdpa/vdpa_user/vduse_dev.c | 4 +++- > > > > > 3 files changed, 33 insertions(+), 7 deletions(-) > > > > > > > > > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c > > > > > index 791d38d6284c..9318f059a8b5 100644 > > > > > --- a/drivers/vdpa/vdpa_user/iova_domain.c > > > > > +++ b/drivers/vdpa/vdpa_user/iova_domain.c > > > > > @@ -283,7 +283,23 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, > > > > > return ret; > > > > > } > > > > > > > > > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > > > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain) > > > > > +{ > > > > > + struct page **pages; > > > > > + unsigned long count, i; > > > > > + > > > > > + if (!domain->user_bounce_pages) > > > > > + return NULL; > > > > > + > > > > > + count = domain->bounce_size >> PAGE_SHIFT; > > > > > + pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL); > > > > > + for (i = 0; i < count; i++) > > > > > + pages[i] = alloc_page(GFP_KERNEL | __GFP_NOFAIL); > > > > > + > > > > > + return pages; > > > > > +} > > > > > + > > > > > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, struct page **pages) > > > > > { > > > > > struct vduse_bounce_map *map; > > > > > unsigned long i, count; > > > > > @@ -294,15 +310,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > > > > > > > > count = domain->bounce_size >> PAGE_SHIFT; > > > > > for (i = 0; i < count; i++) { > > > > > - struct page *page = NULL; > > > > > + struct page *page = pages[i]; > > > > > > > > > > map = &domain->bounce_maps[i]; > > > > > - if (WARN_ON(!map->bounce_page)) > > > > > + if (WARN_ON(!map->bounce_page)) { > > > > > + put_page(page); > > > > > continue; > > > > > + } > > > > > > > > > > /* Copy user page to kernel page if it's in use */ > > > > > if (map->orig_phys != INVALID_PHYS_ADDR) { > > > > > - page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL); > > > > > memcpy_from_page(page_address(page), > > > > > map->bounce_page, 0, PAGE_SIZE); > > > > > } > > > > > @@ -310,6 +327,7 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > > > map->bounce_page = page; > > > > > } > > > > > domain->user_bounce_pages = false; > > > > > + kfree(pages); > > > > > out: > > > > > write_unlock(&domain->bounce_lock); > > > > > } > > > > > @@ -543,10 +561,13 @@ static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma) > > > > > static int vduse_domain_release(struct inode *inode, struct file *file) > > > > > { > > > > > struct vduse_iova_domain *domain = file->private_data; > > > > > + struct page **pages; > > > > > + > > > > > + pages = vduse_domain_alloc_pages_to_remove_bounce(domain); > > > > > > > > > > spin_lock(&domain->iotlb_lock); > > > > > vduse_iotlb_del_range(domain, 0, ULLONG_MAX); > > > > > - vduse_domain_remove_user_bounce_pages(domain); > > > > > + vduse_domain_remove_user_bounce_pages(domain, pages); > > > > > 
vduse_domain_free_kernel_bounce_pages(domain); > > > > > spin_unlock(&domain->iotlb_lock); > > > > > put_iova_domain(&domain->stream_iovad); > > > > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h > > > > > index f92f22a7267d..17efa5555b3f 100644 > > > > > --- a/drivers/vdpa/vdpa_user/iova_domain.h > > > > > +++ b/drivers/vdpa/vdpa_user/iova_domain.h > > > > > @@ -74,7 +74,10 @@ void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain); > > > > > int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, > > > > > struct page **pages, int count); > > > > > > > > > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain); > > > > > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, > > > > > + struct page **pages); > > > > > + > > > > > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain); > > > > > > > > > > void vduse_domain_destroy(struct vduse_iova_domain *domain); > > > > > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c > > > > > index 7ae99691efdf..5d8d5810df57 100644 > > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c > > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > > > > > @@ -1030,6 +1030,7 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev, > > > > > static int vduse_dev_dereg_umem(struct vduse_dev *dev, > > > > > u64 iova, u64 size) > > > > > { > > > > > + struct page **pages; > > > > > int ret; > > > > > > > > > > mutex_lock(&dev->mem_lock); > > > > > @@ -1044,7 +1045,8 @@ static int vduse_dev_dereg_umem(struct vduse_dev *dev, > > > > > if (dev->umem->iova != iova || size != dev->domain->bounce_size) > > > > > goto unlock; > > > > > > > > > > - vduse_domain_remove_user_bounce_pages(dev->domain); > > > > > + pages = vduse_domain_alloc_pages_to_remove_bounce(dev->domain); > > > > > + vduse_domain_remove_user_bounce_pages(dev->domain, pages); > > > > > unpin_user_pages_dirty_lock(dev->umem->pages, > > > > > dev->umem->npages, true); > > > > > atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm); > > > > > > > > We miss a kfree(pages); here? > > > no. > > > i've moved it into vduse_domain_remove_user_bounce_pages. > > > > Ok, but it seems tricky e.g allocated by the caller but freed in > > callee. And I think I missed some important issues in the previous > > review: The check of user_bounce_pages must be done under the > > bounce_lock, otherwise it might race with umem_reg. > > > > So in the case of release(), we know the device is gone, so there's no > > need to allocate pages that will be released soon. So we can pass NULL > > as a hint and just assign bounce_page to NULL in > > vduse_domain_remove_user_bounce_pages(). > > > > And in the case of vduse_dev_dereg_umem(), we need to allocate the > > pages without checking user_bounce_pages. So in > > vduse_domain_remove_user_bounce_pages() if we can free the allocated > > pages as well as the pages in the following check > > > > if (!domain->user_bounce_pages) > > goto out; > > > > What do you think? > > I am not a vdpa guy, but changing the current logic is another patch. > From mm perspective, I can only address the __GFP_NOFAIL issue. > > I actually prefer you guys handle it directly:-) I'd rather report a BUG > instead. TBH, I know nothing about vpda. Fine, let me post a patch for this (no later than the end of this week). 
Thanks > > > > > Thanks > > > > > > > > > > > > > Thanks > > > > > > > > > -- > > > > > 2.34.1 > > > > > > > > > > Thanks > Barry > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL 2024-07-31 4:13 ` Jason Wang @ 2024-07-31 5:05 ` Barry Song 2024-07-31 10:20 ` Tetsuo Handa 2024-08-01 2:30 ` Jason Wang 0 siblings, 2 replies; 28+ messages in thread From: Barry Song @ 2024-07-31 5:05 UTC (permalink / raw) To: Jason Wang, penguin-kernel Cc: akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, vbabka, virtualization, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez, Maxime Coquelin On Wed, Jul 31, 2024 at 12:13 PM Jason Wang <jasowang@redhat.com> wrote: > > On Wed, Jul 31, 2024 at 12:12 PM Barry Song <21cnbao@gmail.com> wrote: > > > > On Wed, Jul 31, 2024 at 11:58 AM Jason Wang <jasowang@redhat.com> wrote: > > > > > > On Wed, Jul 31, 2024 at 11:15 AM Barry Song <21cnbao@gmail.com> wrote: > > > > > > > > On Wed, Jul 31, 2024 at 11:10 AM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > On Wed, Jul 31, 2024 at 8:03 AM Barry Song <21cnbao@gmail.com> wrote: > > > > > > > > > > > > From: Barry Song <v-songbaohua@oppo.com> > > > > > > > > > > > > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because > > > > > > __GFP_NOFAIL without direct reclamation may just result in a busy > > > > > > loop within non-sleepable contexts. > > > > > > > > > > > > static inline struct page * > > > > > > __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > > > > > struct alloc_context *ac) > > > > > > { > > > > > > ... > > > > > > /* > > > > > > * Make sure that __GFP_NOFAIL request doesn't leak out and make sure > > > > > > * we always retry > > > > > > */ > > > > > > if (gfp_mask & __GFP_NOFAIL) { > > > > > > /* > > > > > > * All existing users of the __GFP_NOFAIL are blockable, so warn > > > > > > * of any new users that actually require GFP_NOWAIT > > > > > > */ > > > > > > if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > > > > > > goto fail; > > > > > > ... > > > > > > } > > > > > > ... > > > > > > fail: > > > > > > warn_alloc(gfp_mask, ac->nodemask, > > > > > > "page allocation failure: order:%u", order); > > > > > > got_pg: > > > > > > return page; > > > > > > } > > > > > > > > > > > > Let's move the memory allocation out of the atomic context and use > > > > > > the normal sleepable context to get pages. > > > > > > > > > > > > [RFT]: This has only been compile-tested; I'd prefer if the VDPA maintainers > > > > > > handles it. > > > > > > > > > > > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > > > > Cc: Jason Wang <jasowang@redhat.com> > > > > > > Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > Cc: "Eugenio Pérez" <eperezma@redhat.com> > > > > > > Cc: Maxime Coquelin <maxime.coquelin@redhat.com> > > > > > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > > > > > --- > > > > > > drivers/vdpa/vdpa_user/iova_domain.c | 31 +++++++++++++++++++++++----- > > > > > > drivers/vdpa/vdpa_user/iova_domain.h | 5 ++++- > > > > > > drivers/vdpa/vdpa_user/vduse_dev.c | 4 +++- > > > > > > 3 files changed, 33 insertions(+), 7 deletions(-) > > > > > > > > > > > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c > > > > > > index 791d38d6284c..9318f059a8b5 100644 > > > > > > --- a/drivers/vdpa/vdpa_user/iova_domain.c > > > > > > +++ b/drivers/vdpa/vdpa_user/iova_domain.c > > > > > > @@ -283,7 +283,23 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, > > > > > > return ret; > > > > > > } > > > > > > > > > > > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > > > > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain) > > > > > > +{ > > > > > > + struct page **pages; > > > > > > + unsigned long count, i; > > > > > > + > > > > > > + if (!domain->user_bounce_pages) > > > > > > + return NULL; > > > > > > + > > > > > > + count = domain->bounce_size >> PAGE_SHIFT; > > > > > > + pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL); > > > > > > + for (i = 0; i < count; i++) > > > > > > + pages[i] = alloc_page(GFP_KERNEL | __GFP_NOFAIL); > > > > > > + > > > > > > + return pages; > > > > > > +} > > > > > > + > > > > > > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, struct page **pages) > > > > > > { > > > > > > struct vduse_bounce_map *map; > > > > > > unsigned long i, count; > > > > > > @@ -294,15 +310,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > > > > > > > > > > count = domain->bounce_size >> PAGE_SHIFT; > > > > > > for (i = 0; i < count; i++) { > > > > > > - struct page *page = NULL; > > > > > > + struct page *page = pages[i]; > > > > > > > > > > > > map = &domain->bounce_maps[i]; > > > > > > - if (WARN_ON(!map->bounce_page)) > > > > > > + if (WARN_ON(!map->bounce_page)) { > > > > > > + put_page(page); > > > > > > continue; > > > > > > + } > > > > > > > > > > > > /* Copy user page to kernel page if it's in use */ > > > > > > if (map->orig_phys != INVALID_PHYS_ADDR) { > > > > > > - page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL); > > > > > > memcpy_from_page(page_address(page), > > > > > > map->bounce_page, 0, PAGE_SIZE); > > > > > > } > > > > > > @@ -310,6 +327,7 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain) > > > > > > map->bounce_page = page; > > > > > > } > > > > > > domain->user_bounce_pages = false; > > > > > > + kfree(pages); > > > > > > out: > > > > > > write_unlock(&domain->bounce_lock); > > > > > > } > > > > > > @@ -543,10 +561,13 @@ static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma) > > > > > > static int vduse_domain_release(struct inode *inode, struct file *file) > > > > > > { > > > > > > struct vduse_iova_domain *domain = file->private_data; > > > > > > + struct page **pages; > > > > > > + > > > > > > + pages = vduse_domain_alloc_pages_to_remove_bounce(domain); > > > > > > > > > > > > spin_lock(&domain->iotlb_lock); > > > > > > vduse_iotlb_del_range(domain, 0, 
ULLONG_MAX); > > > > > > - vduse_domain_remove_user_bounce_pages(domain); > > > > > > + vduse_domain_remove_user_bounce_pages(domain, pages); > > > > > > vduse_domain_free_kernel_bounce_pages(domain); > > > > > > spin_unlock(&domain->iotlb_lock); > > > > > > put_iova_domain(&domain->stream_iovad); > > > > > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h > > > > > > index f92f22a7267d..17efa5555b3f 100644 > > > > > > --- a/drivers/vdpa/vdpa_user/iova_domain.h > > > > > > +++ b/drivers/vdpa/vdpa_user/iova_domain.h > > > > > > @@ -74,7 +74,10 @@ void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain); > > > > > > int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain, > > > > > > struct page **pages, int count); > > > > > > > > > > > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain); > > > > > > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, > > > > > > + struct page **pages); > > > > > > + > > > > > > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain); > > > > > > > > > > > > void vduse_domain_destroy(struct vduse_iova_domain *domain); > > > > > > > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c > > > > > > index 7ae99691efdf..5d8d5810df57 100644 > > > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c > > > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > > > > > > @@ -1030,6 +1030,7 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev, > > > > > > static int vduse_dev_dereg_umem(struct vduse_dev *dev, > > > > > > u64 iova, u64 size) > > > > > > { > > > > > > + struct page **pages; > > > > > > int ret; > > > > > > > > > > > > mutex_lock(&dev->mem_lock); > > > > > > @@ -1044,7 +1045,8 @@ static int vduse_dev_dereg_umem(struct vduse_dev *dev, > > > > > > if (dev->umem->iova != iova || size != dev->domain->bounce_size) > > > > > > goto unlock; > > > > > > > > > > > > - vduse_domain_remove_user_bounce_pages(dev->domain); > > > > > > + pages = vduse_domain_alloc_pages_to_remove_bounce(dev->domain); > > > > > > + vduse_domain_remove_user_bounce_pages(dev->domain, pages); > > > > > > unpin_user_pages_dirty_lock(dev->umem->pages, > > > > > > dev->umem->npages, true); > > > > > > atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm); > > > > > > > > > > We miss a kfree(pages); here? > > > > no. > > > > i've moved it into vduse_domain_remove_user_bounce_pages. > > > > > > Ok, but it seems tricky e.g allocated by the caller but freed in > > > callee. And I think I missed some important issues in the previous > > > review: The check of user_bounce_pages must be done under the > > > bounce_lock, otherwise it might race with umem_reg. > > > > > > So in the case of release(), we know the device is gone, so there's no > > > need to allocate pages that will be released soon. So we can pass NULL > > > as a hint and just assign bounce_page to NULL in > > > vduse_domain_remove_user_bounce_pages(). > > > > > > And in the case of vduse_dev_dereg_umem(), we need to allocate the > > > pages without checking user_bounce_pages. So in > > > vduse_domain_remove_user_bounce_pages() if we can free the allocated > > > pages as well as the pages in the following check > > > > > > if (!domain->user_bounce_pages) > > > goto out; > > > > > > What do you think? > > > > I am not a vdpa guy, but changing the current logic is another patch. 
> > From mm perspective, I can only address the __GFP_NOFAIL issue.
> >
> > I actually prefer you guys handle it directly:-) I'd rather report a BUG
> > instead. TBH, I know nothing about vdpa.
>
> Fine, let me post a patch for this (no later than the end of this week).
>
Jason,
Thank you very much. Also, Tetsuo reminded me that kmalloc_array() might be
problematic if the count is too large:

pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL);

You might want to consider using vmalloc_array() or kvmalloc_array() instead
when you send a new version.

> Thanks
[...]

Thanks
Barry

^ permalink raw reply	[flat|nested] 28+ messages in thread
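For reference, a minimal sketch of the kvmalloc_array() route suggested above
(untested; it assumes the free side is switched from kfree() to kvfree() so a
vmalloc-backed array is released correctly):

	pages = kvmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL);
	for (i = 0; i < count; i++)
		pages[i] = alloc_page(GFP_KERNEL | __GFP_NOFAIL);
	...
	/* and on the free side, in vduse_domain_remove_user_bounce_pages(): */
	kvfree(pages);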
* Re: [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL
2024-07-31 5:05 ` Barry Song
@ 2024-07-31 10:20 ` Tetsuo Handa
2024-08-01 2:37 ` Jason Wang
2024-08-01 2:30 ` Jason Wang
1 sibling, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2024-07-31 10:20 UTC (permalink / raw)
To: Barry Song, Jason Wang
Cc: akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim,
lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds,
urezki, v-songbaohua, vbabka, virtualization, Michael S. Tsirkin,
Xuan Zhuo, Eugenio Pérez, Maxime Coquelin

On 2024/07/31 14:05, Barry Song wrote:
> Jason,
> Thank you very much. Also, Tetsuo reminded me that kmalloc_array() might be
> problematic if the count is too large:
> pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL);

If "count" is guaranteed to be count <= 16, this might be tolerable.

Consider a situation where the current thread was chosen as a global OOM victim.
Trying to allocate "count" pages using

	for (i = 0; i < count; i++)
		pages[i] = alloc_page(GFP_KERNEL | __GFP_NOFAIL);

is not good.

>
> You might want to consider using vmalloc_array() or kvmalloc_array() instead
> when you send a new version.

There is a limitation at https://elixir.bootlin.com/linux/v6.11-rc1/source/mm/page_alloc.c#L3033
that you must satisfy count <= PAGE_SIZE * 2 / sizeof(*pages) if you use __GFP_NOFAIL.

But as already explained above, allocating 1024 pages (assuming PAGE_SIZE is 4096 and
pointer size is 8) when the current thread was chosen as an OOM victim is not recommended.
You should implement proper error handling instead of using __GFP_NOFAIL if count can
become large.

^ permalink raw reply	[flat|nested] 28+ messages in thread
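As a rough illustration of the "proper error handling" Tetsuo recommends (a
sketch only, not tied to the vdpa code; the helper name is hypothetical):

static struct page **alloc_page_array(unsigned long count)
{
	struct page **pages;
	unsigned long i;

	pages = kvmalloc_array(count, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return NULL;

	for (i = 0; i < count; i++) {
		pages[i] = alloc_page(GFP_KERNEL);
		if (!pages[i])
			goto err;	/* unwind instead of retrying forever */
	}
	return pages;

err:
	while (i--)
		__free_page(pages[i]);
	kvfree(pages);
	return NULL;
}

The caller then has to cope with a NULL return, which is exactly the failure
policy that __GFP_NOFAIL was papering over.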
* Re: [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL
2024-07-31 10:20 ` Tetsuo Handa
@ 2024-08-01 2:37 ` Jason Wang
2024-08-05 1:32 ` Barry Song
0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2024-08-01 2:37 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Barry Song, akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch,
iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes,
roman.gushchin, torvalds, urezki, v-songbaohua, vbabka,
virtualization, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez,
Maxime Coquelin

On Wed, Jul 31, 2024 at 6:21 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2024/07/31 14:05, Barry Song wrote:
> > Jason,
> > Thank you very much. Also, Tetsuo reminded me that kmalloc_array() might be
> > problematic if the count is too large:
> > pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL);
>
> If "count" is guaranteed to be count <= 16, this might be tolerable.

It's not, unfortunately; the maximum bounce buffer size is:

#define VDUSE_MAX_BOUNCE_SIZE (1024 * 1024 * 1024)

>
> Consider a situation where the current thread was chosen as a global OOM victim.
> Trying to allocate "count" pages using
>
>	for (i = 0; i < count; i++)
>		pages[i] = alloc_page(GFP_KERNEL | __GFP_NOFAIL);
>
> is not good.

Right, I wonder if we need to add a shrinker to reclaim the pages that
belong to VDUSE bounce pages.

>
> >
> > You might want to consider using vmalloc_array() or kvmalloc_array() instead
> > when you send a new version.
>
> There is a limitation at https://elixir.bootlin.com/linux/v6.11-rc1/source/mm/page_alloc.c#L3033
> that you must satisfy count <= PAGE_SIZE * 2 / sizeof(*pages) if you use __GFP_NOFAIL.
>
> But as already explained above, allocating 1024 pages (assuming PAGE_SIZE is 4096 and
> pointer size is 8) when the current thread was chosen as an OOM victim is not recommended.
> You should implement proper error handling instead of using __GFP_NOFAIL if count can
> become large.

I think I need to consider a way to avoid __GFP_NOFAIL. An easy way is
not to free the kernel bounce pages, then we don't need to allocate
them again.

Thanks

>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread
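To put numbers on that: with the maximum bounce size above, the page-pointer
array alone blows well past the __GFP_NOFAIL limit Tetsuo cites (a
back-of-the-envelope check, assuming 4 KiB pages and 8-byte pointers):

	count = VDUSE_MAX_BOUNCE_SIZE >> PAGE_SHIFT;	/* 1 GiB / 4 KiB = 262144 */
	bytes = count * sizeof(struct page *);		/* 262144 * 8 = 2 MiB   */
	/* 2 MiB far exceeds the PAGE_SIZE * 2 (8 KiB) cap for __GFP_NOFAIL */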
* Re: [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL
2024-08-01 2:37 ` Jason Wang
@ 2024-08-05 1:32 ` Barry Song
2024-08-05 8:19 ` Jason Wang
0 siblings, 1 reply; 28+ messages in thread
From: Barry Song @ 2024-08-05 1:32 UTC (permalink / raw)
To: Jason Wang
Cc: Tetsuo Handa, akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch,
iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes,
roman.gushchin, torvalds, urezki, v-songbaohua, vbabka,
virtualization, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez,
Maxime Coquelin

On Thu, Aug 1, 2024 at 2:37 PM Jason Wang <jasowang@redhat.com> wrote:
>
[... quote of the earlier exchange trimmed ...]
>
> I think I need to consider a way to avoid __GFP_NOFAIL. An easy way is
> not to free the kernel bounce pages, then we don't need to allocate
> them again.

Let's try to do a fix for this patch as we are waiting for your official patch
in mm.

I guess further optimization can be a separate patch later in the driver's
tree :-)

>
> Thanks
>

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL
2024-08-05 1:32 ` Barry Song
@ 2024-08-05 8:19 ` Jason Wang
0 siblings, 0 replies; 28+ messages in thread
From: Jason Wang @ 2024-08-05 8:19 UTC (permalink / raw)
To: Barry Song
Cc: Tetsuo Handa, akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch,
iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes,
roman.gushchin, torvalds, urezki, v-songbaohua, vbabka,
virtualization, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez,
Maxime Coquelin, Yongji Xie

On Mon, Aug 5, 2024 at 9:32 AM Barry Song <21cnbao@gmail.com> wrote:
>
[... quote of the earlier exchange trimmed ...]
>
> Let's try to do a fix for this patch as we are waiting for your official patch
> in mm.

Will post this soon.

One note here is that I don't have handy userspace that uses userspace
bounce pages (neither libvduse nor DPDK does that). I hope YongJi can
review and give it a test.

Thanks

>
> I guess further optimization can be a separate patch later in the driver's
> tree :-)
>

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL
2024-07-31 5:05 ` Barry Song
2024-07-31 10:20 ` Tetsuo Handa
@ 2024-08-01 2:30 ` Jason Wang
1 sibling, 0 replies; 28+ messages in thread
From: Jason Wang @ 2024-08-01 2:30 UTC (permalink / raw)
To: Barry Song
Cc: penguin-kernel, akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch,
iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes,
roman.gushchin, torvalds, urezki, v-songbaohua, vbabka,
virtualization, Michael S. Tsirkin, Xuan Zhuo, Eugenio Pérez,
Maxime Coquelin

On Wed, Jul 31, 2024 at 1:05 PM Barry Song <21cnbao@gmail.com> wrote:
>
[... full quote of the patch and the earlier discussion trimmed ...]
>
> Jason,
> Thank you very much. Also, Tetsuo reminded me that kmalloc_array() might be
> problematic if the count is too large:
> pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL);
>

I think it's a side effect of __GFP_NOFAIL?

> You might want to consider using vmalloc_array() or kvmalloc_array() instead
> when you send a new version.

Will consider that.

Thanks

>
> Thanks
> Barry

^ permalink raw reply	[flat|nested] 28+ messages in thread
* [PATCH v2 2/4] mm: Document __GFP_NOFAIL must be blockable
2024-07-31 0:01 [PATCH v2 0/4] mm: clarify nofail memory allocation Barry Song
2024-07-31 0:01 ` [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL Barry Song
@ 2024-07-31 0:01 ` Barry Song
2024-07-31 10:18 ` Vlastimil Babka
2024-07-31 16:26 ` Christoph Hellwig
2024-07-31 0:01 ` [PATCH v2 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails Barry Song
2024-07-31 0:01 ` [PATCH v2 4/4] mm: prohibit NULL deference exposed for unsupported non-blockable __GFP_NOFAIL Barry Song
3 siblings, 2 replies; 28+ messages in thread
From: Barry Song @ 2024-07-31 0:01 UTC (permalink / raw)
To: akpm, linux-mm
Cc: 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko,
penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua,
vbabka, virtualization

From: Barry Song <v-songbaohua@oppo.com>

Non-blocking allocation with __GFP_NOFAIL is not supported and may
still result in NULL pointers (if we don't return NULL, we end up in
a busy loop within non-sleepable contexts):

static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
						struct alloc_context *ac)
{
	...
	/*
	 * Make sure that __GFP_NOFAIL request doesn't leak out and make sure
	 * we always retry
	 */
	if (gfp_mask & __GFP_NOFAIL) {
		/*
		 * All existing users of the __GFP_NOFAIL are blockable, so warn
		 * of any new users that actually require GFP_NOWAIT
		 */
		if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
			goto fail;
	...
	}
	...
fail:
	warn_alloc(gfp_mask, ac->nodemask,
			"page allocation failure: order:%u", order);
got_pg:
	return page;
}

Highlight this in the documentation of __GFP_NOFAIL so that non-mm
subsystems can reject any illegal usage of __GFP_NOFAIL with
GFP_ATOMIC, GFP_NOWAIT, etc.

Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 include/linux/gfp_types.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 313be4ad79fd..4a1fa7706b0c 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -215,7 +215,8 @@ enum {
  * the caller still has to check for failures) while costly requests try to be
  * not disruptive and back off even without invoking the OOM killer.
  * The following three modifiers might be used to override some of these
- * implicit rules.
+ * implicit rules. Please note that all of them must be used along with
+ * %__GFP_DIRECT_RECLAIM flag.
  *
  * %__GFP_NORETRY: The VM implementation will try only very lightweight
  * memory direct reclaim to get some memory under memory pressure (thus
@@ -246,6 +247,8 @@ enum {
  * cannot handle allocation failures. The allocation could block
  * indefinitely but will never return with failure. Testing for
  * failure is pointless.
+ * It _must_ be blockable and used together with __GFP_DIRECT_RECLAIM.
+ * It should _never_ be used in non-sleepable contexts.
  * New users should be evaluated carefully (and the flag should be
  * used only when there is no reasonable failure policy) but it is
  * definitely preferable to use the flag rather than opencode endless
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread
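Concretely, the rule being documented separates usages like these (an
illustration, not taken from the patch itself):

	/* OK: GFP_KERNEL implies __GFP_DIRECT_RECLAIM, so the allocator
	 * may sleep and retry for as long as it takes. */
	p = kmalloc(size, GFP_KERNEL | __GFP_NOFAIL);

	/* Broken: GFP_ATOMIC does not include __GFP_DIRECT_RECLAIM, so the
	 * "never fail" promise cannot be kept in a non-sleepable context. */
	p = kmalloc(size, GFP_ATOMIC | __GFP_NOFAIL);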
* Re: [PATCH v2 2/4] mm: Document __GFP_NOFAIL must be blockable
2024-07-31 0:01 ` [PATCH v2 2/4] mm: Document __GFP_NOFAIL must be blockable Barry Song
@ 2024-07-31 10:18 ` Vlastimil Babka
0 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2024-07-31 10:18 UTC (permalink / raw)
To: Barry Song, akpm, linux-mm
Cc: 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko,
penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua,
virtualization

On 7/31/24 2:01 AM, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
[... commit message trimmed ...]
>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

[... unchanged diff context trimmed ...]
> @@ -246,6 +247,8 @@ enum {
>  * cannot handle allocation failures. The allocation could block
>  * indefinitely but will never return with failure. Testing for
>  * failure is pointless.
> + * It _must_ be blockable and used together with __GFP_DIRECT_RECLAIM.
> + * It should _never_ be used in non-sleepable contexts.

So this second line is a bit redundant as you can't use
__GFP_DIRECT_RECLAIM in non-sleepable contexts. But can't hurt so not an
objection.

>  * New users should be evaluated carefully (and the flag should be
>  * used only when there is no reasonable failure policy) but it is
>  * definitely preferable to use the flag rather than opencode endless

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH v2 2/4] mm: Document __GFP_NOFAIL must be blockable 2024-07-31 0:01 ` [PATCH v2 2/4] mm: Document __GFP_NOFAIL must be blockable Barry Song 2024-07-31 10:18 ` Vlastimil Babka @ 2024-07-31 16:26 ` Christoph Hellwig 1 sibling, 0 replies; 28+ messages in thread From: Christoph Hellwig @ 2024-07-31 16:26 UTC (permalink / raw) To: Barry Song Cc: akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, vbabka, virtualization Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails
2024-07-31 0:01 [PATCH v2 0/4] mm: clarify nofail memory allocation Barry Song
2024-07-31 0:01 ` [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL Barry Song
2024-07-31 0:01 ` [PATCH v2 2/4] mm: Document __GFP_NOFAIL must be blockable Barry Song
@ 2024-07-31 0:01 ` Barry Song
2024-07-31 7:11 ` Michal Hocko
` (2 more replies)
2024-07-31 0:01 ` [PATCH v2 4/4] mm: prohibit NULL deference exposed for unsupported non-blockable __GFP_NOFAIL Barry Song
3 siblings, 3 replies; 28+ messages in thread
From: Barry Song @ 2024-07-31 0:01 UTC (permalink / raw)
To: akpm, linux-mm
Cc: 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko,
penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua,
vbabka, virtualization, Kees Cook

From: Barry Song <v-songbaohua@oppo.com>

We have cases where we can still fail even though the caller passed
__GFP_NOFAIL. Since such callers don't check the return value, we are
exposed to security risks from NULL dereferences.

Though BUG_ON() is not encouraged by Linus, this is an unrecoverable
situation.

Christoph Hellwig:
The whole freaking point of __GFP_NOFAIL is that callers don't handle
allocation failures. So in fact a straight BUG is the right thing
here.

Vlastimil Babka:
It's just not a recoverable situation (WARN_ON is for recoverable
situations). The caller cannot handle allocation failure and at the same
time asked for an impossible allocation. BUG_ON() is a guaranteed oops
with stacktrace etc. We don't need to hope for the later NULL pointer
dereference (which might, if really unlucky, happen from a different
context where it's no longer obvious what led to the allocation failing).

Michal Hocko:
Linus tends to be against adding new BUG() calls unless the failure is
absolutely unrecoverable (e.g. corrupted data structures etc.). I am
not sure how he would look at simply incorrect memory allocator usage to
blow up the kernel. Now the argument could be made that those failures
could cause subtle memory corruptions or even be exploitable which might
be a sufficient reason to stop them early.

Cc: Michal Hocko <mhocko@suse.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 include/linux/slab.h | 4 +++-
 mm/page_alloc.c      | 4 +++-
 mm/util.c            | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index c9cb42203183..4a4d1fdc2afe 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -827,8 +827,10 @@ kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node)
 {
 	size_t bytes;
 
-	if (unlikely(check_mul_overflow(n, size, &bytes)))
+	if (unlikely(check_mul_overflow(n, size, &bytes))) {
+		BUG_ON(flags & __GFP_NOFAIL);
 		return NULL;
+	}
 
 	return kvmalloc_node_noprof(bytes, flags, node);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c700d2598a26..cc179c3e68df 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4708,8 +4708,10 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
 	 * There are several places where we assume that the order value is sane
 	 * so bail out early if the request is out of bound.
 	 */
-	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
+	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)) {
+		BUG_ON(gfp & __GFP_NOFAIL);
 		return NULL;
+	}
 
 	gfp &= gfp_allowed_mask;
 	/*
diff --git a/mm/util.c b/mm/util.c
index 0ff5898cc6de..bad3258523b6 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -667,6 +667,7 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
 
 	/* Don't even allow crazy sizes */
 	if (unlikely(size > INT_MAX)) {
+		BUG_ON(flags & __GFP_NOFAIL);
 		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
 		return NULL;
 	}
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread
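For illustration, one way a buggy caller could now hit the new BUG_ON() in
kvmalloc_array_node_noprof() (hypothetical values, not taken from the patch):

	/* n * size overflows size_t, so check_mul_overflow() trips; with
	 * __GFP_NOFAIL set, the kernel now BUGs immediately instead of
	 * returning NULL to a caller that, by definition, won't check. */
	ptrs = kvmalloc_array(SIZE_MAX / 4, sizeof(u64),
			      GFP_KERNEL | __GFP_NOFAIL);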
* Re: [PATCH v2 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails 2024-07-31 0:01 ` [PATCH v2 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails Barry Song @ 2024-07-31 7:11 ` Michal Hocko 2024-07-31 10:29 ` Vlastimil Babka 2024-07-31 16:28 ` Christoph Hellwig 2 siblings, 0 replies; 28+ messages in thread From: Michal Hocko @ 2024-07-31 7:11 UTC (permalink / raw) To: Barry Song Cc: akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, vbabka, virtualization, Kees Cook On Wed 31-07-24 12:01:54, Barry Song wrote: > From: Barry Song <v-songbaohua@oppo.com> > > We have cases we still fail though callers might have __GFP_NOFAIL. > Since they don't check the return, we are exposed to the security > risks for NULL deference. > > Though BUG_ON() is not encouraged by Linus, this is an unrecoverable > situation. > > Christoph Hellwig: > The whole freaking point of __GFP_NOFAIL is that callers don't handle > allocation failures. So in fact a straight BUG is the right thing > here. > > Vlastimil Babka: > It's just not a recoverable situation (WARN_ON is for recoverable > situations). The caller cannot handle allocation failure and at the same > time asked for an impossible allocation. BUG_ON() is a guaranteed oops > with stracktrace etc. We don't need to hope for the later NULL pointer > dereference (which might if really unlucky happen from a different > context where it's no longer obvious what lead to the allocation failing). > > Michal Hocko: > Linus tends to be against adding new BUG() calls unless the failure is > absolutely unrecoverable (e.g. corrupted data structures etc.). I am > not sure how he would look at simply incorrect memory allocator usage to > blow up the kernel. Now the argument could be made that those failures > could cause subtle memory corruptions or even be exploitable which might > be a sufficient reason to stop them early. > > Cc: Michal Hocko <mhocko@suse.com> > Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: Lorenzo Stoakes <lstoakes@gmail.com> > Cc: Christoph Lameter <cl@linux.com> > Cc: Pekka Enberg <penberg@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Roman Gushchin <roman.gushchin@linux.dev> > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Kees Cook <kees@kernel.org> > Signed-off-by: Barry Song <v-songbaohua@oppo.com> Thanks for separating overflow/out of bounds checks. 
Acked-by: Michal Hocko <mhocko@suse.com> > --- > include/linux/slab.h | 4 +++- > mm/page_alloc.c | 4 +++- > mm/util.c | 1 + > 3 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/include/linux/slab.h b/include/linux/slab.h > index c9cb42203183..4a4d1fdc2afe 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -827,8 +827,10 @@ kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node) > { > size_t bytes; > > - if (unlikely(check_mul_overflow(n, size, &bytes))) > + if (unlikely(check_mul_overflow(n, size, &bytes))) { > + BUG_ON(flags & __GFP_NOFAIL); > return NULL; > + } > > return kvmalloc_node_noprof(bytes, flags, node); > } > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index c700d2598a26..cc179c3e68df 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4708,8 +4708,10 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order, > * There are several places where we assume that the order value is sane > * so bail out early if the request is out of bound. > */ > - if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)) > + if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)) { > + BUG_ON(gfp & __GFP_NOFAIL); > return NULL; > + } > > gfp &= gfp_allowed_mask; > /* > diff --git a/mm/util.c b/mm/util.c > index 0ff5898cc6de..bad3258523b6 100644 > --- a/mm/util.c > +++ b/mm/util.c > @@ -667,6 +667,7 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node) > > /* Don't even allow crazy sizes */ > if (unlikely(size > INT_MAX)) { > + BUG_ON(flags & __GFP_NOFAIL); > WARN_ON_ONCE(!(flags & __GFP_NOWARN)); > return NULL; > } > -- > 2.34.1 -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails 2024-07-31 0:01 ` [PATCH v2 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails Barry Song 2024-07-31 7:11 ` Michal Hocko @ 2024-07-31 10:29 ` Vlastimil Babka 2024-07-31 10:44 ` Tetsuo Handa 2024-07-31 16:28 ` Christoph Hellwig 2 siblings, 1 reply; 28+ messages in thread From: Vlastimil Babka @ 2024-07-31 10:29 UTC (permalink / raw) To: Barry Song, akpm, linux-mm Cc: 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, virtualization, Kees Cook On 7/31/24 2:01 AM, Barry Song wrote: > From: Barry Song <v-songbaohua@oppo.com> > > We have cases we still fail though callers might have __GFP_NOFAIL. > Since they don't check the return, we are exposed to the security > risks for NULL deference. > > Though BUG_ON() is not encouraged by Linus, this is an unrecoverable > situation. > > Christoph Hellwig: > The whole freaking point of __GFP_NOFAIL is that callers don't handle > allocation failures. So in fact a straight BUG is the right thing > here. > > Vlastimil Babka: > It's just not a recoverable situation (WARN_ON is for recoverable > situations). The caller cannot handle allocation failure and at the same > time asked for an impossible allocation. BUG_ON() is a guaranteed oops > with stracktrace etc. We don't need to hope for the later NULL pointer > dereference (which might if really unlucky happen from a different > context where it's no longer obvious what lead to the allocation failing). > > Michal Hocko: > Linus tends to be against adding new BUG() calls unless the failure is > absolutely unrecoverable (e.g. corrupted data structures etc.). I am > not sure how he would look at simply incorrect memory allocator usage to > blow up the kernel. Now the argument could be made that those failures > could cause subtle memory corruptions or even be exploitable which might > be a sufficient reason to stop them early. > > Cc: Michal Hocko <mhocko@suse.com> > Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: Lorenzo Stoakes <lstoakes@gmail.com> > Cc: Christoph Lameter <cl@linux.com> > Cc: Pekka Enberg <penberg@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Roman Gushchin <roman.gushchin@linux.dev> > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Kees Cook <kees@kernel.org> > Signed-off-by: Barry Song <v-songbaohua@oppo.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> > --- > include/linux/slab.h | 4 +++- > mm/page_alloc.c | 4 +++- > mm/util.c | 1 + > 3 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/include/linux/slab.h b/include/linux/slab.h > index c9cb42203183..4a4d1fdc2afe 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -827,8 +827,10 @@ kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node) > { > size_t bytes; > > - if (unlikely(check_mul_overflow(n, size, &bytes))) > + if (unlikely(check_mul_overflow(n, size, &bytes))) { > + BUG_ON(flags & __GFP_NOFAIL); Shouldn't we produce some kind of warning also in the no-NOFAIL case? Returning a NULL completely silently feels wrong. 
>  		return NULL;
> +	}
>
>  	return kvmalloc_node_noprof(bytes, flags, node);
>  }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c700d2598a26..cc179c3e68df 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4708,8 +4708,10 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
>  	 * There are several places where we assume that the order value is sane
>  	 * so bail out early if the request is out of bound.
>  	 */
> -	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
> +	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)) {

Because here we do warn (although I'd argue even __GFP_NOWARN shouldn't
suppress a warning for such a reason).

> +		BUG_ON(gfp & __GFP_NOFAIL);
>  		return NULL;
> +	}
>
>  	gfp &= gfp_allowed_mask;
>  	/*
> diff --git a/mm/util.c b/mm/util.c
> index 0ff5898cc6de..bad3258523b6 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -667,6 +667,7 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
>
>  	/* Don't even allow crazy sizes */
>  	if (unlikely(size > INT_MAX)) {
> +		BUG_ON(flags & __GFP_NOFAIL);
>  		WARN_ON_ONCE(!(flags & __GFP_NOWARN));

And here, I would argue the same. But maybe there's code that relies on
this, so dunno. In that case we could at least convert to
WARN_ON_ONCE_GFP.

>  		return NULL;
>  	}

^ permalink raw reply	[flat|nested] 28+ messages in thread
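A sketch of that WARN_ON_ONCE_GFP conversion might look like this (untested;
WARN_ON_ONCE_GFP is defined in mm/internal.h, which mm/util.c already pulls
in):

	/* Don't even allow crazy sizes */
	if (unlikely(size > INT_MAX)) {
		BUG_ON(flags & __GFP_NOFAIL);
		/* warns unless the caller passed __GFP_NOWARN */
		WARN_ON_ONCE_GFP(1, flags);
		return NULL;
	}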
* Re: [PATCH v2 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails 2024-07-31 10:29 ` Vlastimil Babka @ 2024-07-31 10:44 ` Tetsuo Handa 2024-07-31 10:48 ` Vlastimil Babka 0 siblings, 1 reply; 28+ messages in thread From: Tetsuo Handa @ 2024-07-31 10:44 UTC (permalink / raw) To: Vlastimil Babka, Barry Song, akpm, linux-mm Cc: 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, virtualization, Kees Cook On 2024/07/31 19:29, Vlastimil Babka wrote: >> @@ -827,8 +827,10 @@ kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node) >> { >> size_t bytes; >> >> - if (unlikely(check_mul_overflow(n, size, &bytes))) >> + if (unlikely(check_mul_overflow(n, size, &bytes))) { >> + BUG_ON(flags & __GFP_NOFAIL); > > Shouldn't we produce some kind of warning also in the no-NOFAIL case? > Returning a NULL completely silently feels wrong. Doesn't embedding macros like BUG_ON() into inline functions needlessly bloat the kernel size? I think we can call a non-inlined function (that is marked as EXPORT_SYMBOL()) for emitting warning. ^ permalink raw reply [flat|nested] 28+ messages in thread
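What Tetsuo proposes could look roughly like the following (the helper name is
hypothetical; a sketch only, untested):

/* mm/util.c: out of line, so the inline caller stays small */
void kvmalloc_array_overflow(gfp_t flags)
{
	BUG_ON(flags & __GFP_NOFAIL);
	WARN_ON_ONCE(!(flags & __GFP_NOWARN));
}
EXPORT_SYMBOL(kvmalloc_array_overflow);

/* include/linux/slab.h: the inline function then just calls it */
	if (unlikely(check_mul_overflow(n, size, &bytes))) {
		kvmalloc_array_overflow(flags);
		return NULL;
	}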
* Re: [PATCH v2 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails 2024-07-31 10:44 ` Tetsuo Handa @ 2024-07-31 10:48 ` Vlastimil Babka 2024-07-31 10:57 ` Barry Song 0 siblings, 1 reply; 28+ messages in thread From: Vlastimil Babka @ 2024-07-31 10:48 UTC (permalink / raw) To: Tetsuo Handa, Barry Song, akpm, linux-mm Cc: 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, virtualization, Kees Cook On 7/31/24 12:44 PM, Tetsuo Handa wrote: > On 2024/07/31 19:29, Vlastimil Babka wrote: >>> @@ -827,8 +827,10 @@ kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node) >>> { >>> size_t bytes; >>> >>> - if (unlikely(check_mul_overflow(n, size, &bytes))) >>> + if (unlikely(check_mul_overflow(n, size, &bytes))) { >>> + BUG_ON(flags & __GFP_NOFAIL); >> >> Shouldn't we produce some kind of warning also in the no-NOFAIL case? >> Returning a NULL completely silently feels wrong. > > Doesn't embedding macros like BUG_ON() into inline functions needlessly bloat > the kernel size? I think we can call a non-inlined function (that is marked > as EXPORT_SYMBOL()) for emitting warning. Hm yeah we could probably make the whole function non-inline as it results in a call anyway. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails
2024-07-31 10:48 ` Vlastimil Babka
@ 2024-07-31 10:57 ` Barry Song
0 siblings, 0 replies; 28+ messages in thread
From: Barry Song @ 2024-07-31 10:57 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Tetsuo Handa, akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch,
iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes,
roman.gushchin, torvalds, urezki, v-songbaohua, virtualization,
Kees Cook

On Wed, Jul 31, 2024 at 6:48 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 7/31/24 12:44 PM, Tetsuo Handa wrote:
> > On 2024/07/31 19:29, Vlastimil Babka wrote:
> >>> @@ -827,8 +827,10 @@ kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node)
> >>> {
> >>>	size_t bytes;
> >>>
> >>> -	if (unlikely(check_mul_overflow(n, size, &bytes)))
> >>> +	if (unlikely(check_mul_overflow(n, size, &bytes))) {
> >>> +		BUG_ON(flags & __GFP_NOFAIL);
> >>
> >> Shouldn't we produce some kind of warning also in the no-NOFAIL case?
> >> Returning a NULL completely silently feels wrong.
> >
> > Doesn't embedding macros like BUG_ON() into inline functions needlessly bloat
> > the kernel size? I think we can call a non-inlined function (that is marked
> > as EXPORT_SYMBOL()) for emitting warning.
>
> Hm yeah we could probably make the whole function non-inline as it results
> in a call anyway.

Although this might save some code footprint, won't we end up with two
calls for users of kvmalloc_array_node_noprof() with both Tetsuo's
proposal and yours?

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH v2 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails 2024-07-31 0:01 ` [PATCH v2 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails Barry Song 2024-07-31 7:11 ` Michal Hocko 2024-07-31 10:29 ` Vlastimil Babka @ 2024-07-31 16:28 ` Christoph Hellwig 2 siblings, 0 replies; 28+ messages in thread From: Christoph Hellwig @ 2024-07-31 16:28 UTC (permalink / raw) To: Barry Song Cc: akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, vbabka, virtualization, Kees Cook Looks good (inline or not): Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2 4/4] mm: prohibit NULL deference exposed for unsupported non-blockable __GFP_NOFAIL 2024-07-31 0:01 [PATCH v2 0/4] mm: clarify nofail memory allocation Barry Song ` (2 preceding siblings ...) 2024-07-31 0:01 ` [PATCH v2 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails Barry Song @ 2024-07-31 0:01 ` Barry Song 2024-07-31 7:15 ` Michal Hocko 2024-07-31 10:55 ` Vlastimil Babka 3 siblings, 2 replies; 28+ messages in thread From: Barry Song @ 2024-07-31 0:01 UTC (permalink / raw) To: akpm, linux-mm Cc: 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, vbabka, virtualization, Kees Cook From: Barry Song <v-songbaohua@oppo.com> When users allocate memory with the __GFP_NOFAIL flag, they might incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This kind of non-blockable __GFP_NOFAIL is not supported and is pointless. If we attempt and still fail to allocate memory for these users, we have two choices: 1. We could busy-loop and hope that some other direct reclamation or kswapd rescues the current process. However, this is unreliable and could ultimately lead to hard or soft lockups, which might not be well supported by some architectures. 2. We could use BUG_ON to trigger a reliable system crash, avoiding exposing NULL dereference. This patch chooses the second option because the first is unreliable. Even if the process incorrectly using __GFP_NOFAIL is sometimes rescued, the long latency might be unacceptable, especially considering that misusing GFP_ATOMIC and __GFP_NOFAIL is likely to occur in atomic contexts with strict timing requirements. Cc: Michal Hocko <mhocko@suse.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Barry Song <v-songbaohua@oppo.com> --- mm/page_alloc.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cc179c3e68df..ed1bd8f595bd 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4439,11 +4439,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, */ if (gfp_mask & __GFP_NOFAIL) { /* - * All existing users of the __GFP_NOFAIL are blockable, so warn - * of any new users that actually require GFP_NOWAIT + * All existing users of the __GFP_NOFAIL are blockable + * otherwise we introduce a busy loop with inside the page + * allocator from non-sleepable contexts */ - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) - goto fail; + BUG_ON(!can_direct_reclaim); /* * PF_MEMALLOC request from this context is rather bizarre @@ -4474,7 +4474,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, cond_resched(); goto retry; } -fail: + warn_alloc(gfp_mask, ac->nodemask, "page allocation failure: order:%u", order); got_pg: -- 2.34.1 ^ permalink raw reply related [flat|nested] 28+ messages in thread
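The kind of caller this BUG_ON() is aimed at looks like the following
(illustrative only):

	spin_lock(&lock);
	/* Non-sleepable context, yet the caller demands "never fail": if
	 * the fast path cannot satisfy it, the allocator can neither
	 * reclaim nor honestly return NULL, so with this patch it BUGs
	 * instead of busy-looping or handing NULL to code that will not
	 * check it. */
	obj = kmalloc(size, GFP_NOWAIT | __GFP_NOFAIL);
	spin_unlock(&lock);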
* Re: [PATCH v2 4/4] mm: prohibit NULL deference exposed for unsupported non-blockable __GFP_NOFAIL
2024-07-31 0:01 ` [PATCH v2 4/4] mm: prohibit NULL deference exposed for unsupported non-blockable __GFP_NOFAIL Barry Song
@ 2024-07-31 7:15 ` Michal Hocko
2024-07-31 10:55 ` Vlastimil Babka
0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2024-07-31 7:15 UTC (permalink / raw)
To: Barry Song
Cc: akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim,
lstoakes, penberg, rientjes, roman.gushchin, torvalds, urezki,
v-songbaohua, vbabka, virtualization, Kees Cook

On Wed 31-07-24 12:01:55, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> When users allocate memory with the __GFP_NOFAIL flag, they might
> incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This kind
> of non-blockable __GFP_NOFAIL is not supported and is pointless. If
> we attempt and still fail to allocate memory for these users, we have
> two choices:
>
> 1. We could busy-loop and hope that some other direct reclamation or
>    kswapd rescues the current process. However, this is unreliable
>    and could ultimately lead to hard or soft lockups, which might not
>    be well supported by some architectures.
>
> 2. We could use BUG_ON to trigger a reliable system crash, avoiding
>    exposing NULL dereference.
>
> This patch chooses the second option because the first is unreliable. Even
> if the process incorrectly using __GFP_NOFAIL is sometimes rescued, the
> long latency might be unacceptable, especially considering that misusing
> GFP_ATOMIC and __GFP_NOFAIL is likely to occur in atomic contexts with
> strict timing requirements.

Well, any latency arguments are off the table with BUG_ON crashing the
system. So this is not about reliability but rather making those
incorrect uses more obvious. With your GFP_NOFAIL follow-up this should
be simply impossible to trigger though.

I am still not sure which of the bad solutions is more appropriate so I
am not giving this an ack. Either of them is better than allowing the
allocation to fail though.
> Cc: Michal Hocko <mhocko@suse.com> > Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: Lorenzo Stoakes <lstoakes@gmail.com> > Cc: Christoph Lameter <cl@linux.com> > Cc: Pekka Enberg <penberg@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Roman Gushchin <roman.gushchin@linux.dev> > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Kees Cook <kees@kernel.org> > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > --- > mm/page_alloc.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cc179c3e68df..ed1bd8f595bd 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4439,11 +4439,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > */ > if (gfp_mask & __GFP_NOFAIL) { > /* > - * All existing users of the __GFP_NOFAIL are blockable, so warn > - * of any new users that actually require GFP_NOWAIT > + * All existing users of the __GFP_NOFAIL are blockable > + * otherwise we introduce a busy loop with inside the page > + * allocator from non-sleepable contexts > */ > - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > - goto fail; > + BUG_ON(!can_direct_reclaim); > > /* > * PF_MEMALLOC request from this context is rather bizarre > @@ -4474,7 +4474,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > cond_resched(); > goto retry; > } > -fail: > + > warn_alloc(gfp_mask, ac->nodemask, > "page allocation failure: order:%u", order); > got_pg: > -- > 2.34.1 -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 4/4] mm: prohibit NULL deference exposed for unsupported non-blockable __GFP_NOFAIL 2024-07-31 0:01 ` [PATCH v2 4/4] mm: prohibit NULL deference exposed for unsupported non-blockable __GFP_NOFAIL Barry Song 2024-07-31 7:15 ` Michal Hocko @ 2024-07-31 10:55 ` Vlastimil Babka 2024-07-31 11:08 ` Barry Song 1 sibling, 1 reply; 28+ messages in thread From: Vlastimil Babka @ 2024-07-31 10:55 UTC (permalink / raw) To: Barry Song, akpm, linux-mm Cc: 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim, lstoakes, mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki, v-songbaohua, virtualization, Kees Cook On 7/31/24 2:01 AM, Barry Song wrote: > From: Barry Song <v-songbaohua@oppo.com> > > When users allocate memory with the __GFP_NOFAIL flag, they might > incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This kind > of non-blockable __GFP_NOFAIL is not supported and is pointless. If > we attempt and still fail to allocate memory for these users, we have > two choices: > > 1. We could busy-loop and hope that some other direct reclamation or > kswapd rescues the current process. However, this is unreliable > and could ultimately lead to hard or soft lockups, which might not > be well supported by some architectures. > > 2. We could use BUG_ON to trigger a reliable system crash, avoiding > exposing NULL dereference. > > This patch chooses the second option because the first is unreliable. Even > if the process incorrectly using __GFP_NOFAIL is sometimes rescued, the > long latency might be unacceptable, especially considering that misusing > GFP_ATOMIC and __GFP_NOFAIL is likely to occur in atomic contexts with > strict timing requirements. > > Cc: Michal Hocko <mhocko@suse.com> > Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: Lorenzo Stoakes <lstoakes@gmail.com> > Cc: Christoph Lameter <cl@linux.com> > Cc: Pekka Enberg <penberg@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Roman Gushchin <roman.gushchin@linux.dev> > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Kees Cook <kees@kernel.org> > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > --- > mm/page_alloc.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cc179c3e68df..ed1bd8f595bd 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4439,11 +4439,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > */ > if (gfp_mask & __GFP_NOFAIL) { > /* > - * All existing users of the __GFP_NOFAIL are blockable, so warn > - * of any new users that actually require GFP_NOWAIT > + * All existing users of the __GFP_NOFAIL are blockable > + * otherwise we introduce a busy loop with inside the page > + * allocator from non-sleepable contexts > */ > - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > - goto fail; > + BUG_ON(!can_direct_reclaim); We might get more useful output if here we did just "if (!can_direct_reclaim) goto fail; and let warn_alloc() print it, and then there would be a BUG_ON(gfp_mask & __GFP_NOFAIL)? Additionally we could mask out __GFP_NOWARN from gfp_mask before the goto, as a __GFP_NOWARN would suppress the output in a non-recoverable situation so it would be wrong. 
>
>  		/*
>  		 * PF_MEMALLOC request from this context is rather bizarre
> @@ -4474,7 +4474,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		cond_resched();
>  		goto retry;
>  	}
> -fail:
> +
>  	warn_alloc(gfp_mask, ac->nodemask,
>  		   "page allocation failure: order:%u", order);
>  got_pg:
* Re: [PATCH v2 4/4] mm: prohibit NULL dereference exposed for unsupported non-blockable __GFP_NOFAIL
  2024-07-31 10:55   ` Vlastimil Babka
@ 2024-07-31 11:08     ` Barry Song
  2024-07-31 11:31       ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Barry Song @ 2024-07-31 11:08 UTC (permalink / raw)
To: Vlastimil Babka
Cc: akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch, iamjoonsoo.kim,
	mhocko, penberg, rientjes, roman.gushchin, torvalds, urezki,
	v-songbaohua, virtualization, Kees Cook, lorenzo.stoakes

On Wed, Jul 31, 2024 at 6:55 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 7/31/24 2:01 AM, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > When users allocate memory with the __GFP_NOFAIL flag, they might
> > incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This kind
> > of non-blockable __GFP_NOFAIL is not supported and is pointless. If
> > we attempt and still fail to allocate memory for these users, we have
> > two choices:
> >
> > 1. We could busy-loop and hope that some other direct reclamation or
> >    kswapd rescues the current process. However, this is unreliable
> >    and could ultimately lead to hard or soft lockups, which might not
> >    be well supported by some architectures.
> >
> > 2. We could use BUG_ON to trigger a reliable system crash, avoiding
> >    exposing a NULL dereference.
> >
> > This patch chooses the second option because the first is unreliable. Even
> > if the process incorrectly using __GFP_NOFAIL is sometimes rescued, the
> > long latency might be unacceptable, especially considering that misusing
> > GFP_ATOMIC and __GFP_NOFAIL is likely to occur in atomic contexts with
> > strict timing requirements.
> >
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Kees Cook <kees@kernel.org>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > ---
> >  mm/page_alloc.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index cc179c3e68df..ed1bd8f595bd 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4439,11 +4439,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  	 */
> >  	if (gfp_mask & __GFP_NOFAIL) {
> >  		/*
> > -		 * All existing users of the __GFP_NOFAIL are blockable, so warn
> > -		 * of any new users that actually require GFP_NOWAIT
> > +		 * All existing users of __GFP_NOFAIL are blockable;
> > +		 * otherwise we would introduce a busy loop inside the
> > +		 * page allocator from non-sleepable contexts
> >  		 */
> > -		if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> > -			goto fail;
> > +		BUG_ON(!can_direct_reclaim);
>
> We might get more useful output if here we did just
> "if (!can_direct_reclaim) goto fail;" and let warn_alloc() print it, and
> then there would be a BUG_ON(gfp_mask & __GFP_NOFAIL) after it?
> Additionally, we could mask out __GFP_NOWARN from gfp_mask before the
> goto, as __GFP_NOWARN would suppress the output in a non-recoverable
> situation, so it would be wrong.
If we use BUG_ON, it seems we don't need to do anything else, as the
BUG_ON report already gives developers all the information they need.
(A sketch of the kind of misuse it would catch follows below.) If we
went with approach 1, busy-looping until rescued or until a lockup
occurs, I agree it would be better to add more warnings.

>
> >
> >  		/*
> >  		 * PF_MEMALLOC request from this context is rather bizarre
> > @@ -4474,7 +4474,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  		cond_resched();
> >  		goto retry;
> >  	}
> > -fail:
> > +
> >  	warn_alloc(gfp_mask, ac->nodemask,
> >  		   "page allocation failure: order:%u", order);
> >  got_pg:
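For illustration only, the kind of caller such a BUG_ON is aimed at
would look roughly like the hypothetical below (dev, lock, buf, and
size are made-up names, not an actual in-tree user):

	/*
	 * Hypothetical misuse: a non-blockable nofail allocation from
	 * atomic context. If this request ever reaches the slowpath,
	 * BUG_ON(!can_direct_reclaim) fires and the oops points at this
	 * callsite, instead of busy-looping or returning NULL.
	 */
	spin_lock(&dev->lock);
	buf = kmalloc(size, GFP_ATOMIC | __GFP_NOFAIL);
	spin_unlock(&dev->lock);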
* Re: [PATCH v2 4/4] mm: prohibit NULL dereference exposed for unsupported non-blockable __GFP_NOFAIL
  2024-07-31 11:08     ` Barry Song
@ 2024-07-31 11:31       ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2024-07-31 11:31 UTC (permalink / raw)
To: Barry Song
Cc: Vlastimil Babka, akpm, linux-mm, 42.hyeyoo, cl, hailong.liu, hch,
	iamjoonsoo.kim, penberg, rientjes, roman.gushchin, torvalds,
	urezki, v-songbaohua, virtualization, Kees Cook, lorenzo.stoakes

On Wed 31-07-24 19:08:44, Barry Song wrote:
> On Wed, Jul 31, 2024 at 6:55 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 7/31/24 2:01 AM, Barry Song wrote:
> > > From: Barry Song <v-songbaohua@oppo.com>
> > >
> > > When users allocate memory with the __GFP_NOFAIL flag, they might
> > > incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This kind
> > > of non-blockable __GFP_NOFAIL is not supported and is pointless. If
> > > we attempt and still fail to allocate memory for these users, we have
> > > two choices:
> > >
> > > 1. We could busy-loop and hope that some other direct reclamation or
> > >    kswapd rescues the current process. However, this is unreliable
> > >    and could ultimately lead to hard or soft lockups, which might not
> > >    be well supported by some architectures.
> > >
> > > 2. We could use BUG_ON to trigger a reliable system crash, avoiding
> > >    exposing a NULL dereference.
> > >
> > > This patch chooses the second option because the first is unreliable. Even
> > > if the process incorrectly using __GFP_NOFAIL is sometimes rescued, the
> > > long latency might be unacceptable, especially considering that misusing
> > > GFP_ATOMIC and __GFP_NOFAIL is likely to occur in atomic contexts with
> > > strict timing requirements.
> > >
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > > Cc: Christoph Lameter <cl@linux.com>
> > > Cc: Pekka Enberg <penberg@kernel.org>
> > > Cc: David Rientjes <rientjes@google.com>
> > > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > Cc: Vlastimil Babka <vbabka@suse.cz>
> > > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > > Cc: Kees Cook <kees@kernel.org>
> > > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > > ---
> > >  mm/page_alloc.c | 10 +++++-----
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index cc179c3e68df..ed1bd8f595bd 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -4439,11 +4439,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > >  	 */
> > >  	if (gfp_mask & __GFP_NOFAIL) {
> > >  		/*
> > > -		 * All existing users of the __GFP_NOFAIL are blockable, so warn
> > > -		 * of any new users that actually require GFP_NOWAIT
> > > +		 * All existing users of __GFP_NOFAIL are blockable;
> > > +		 * otherwise we would introduce a busy loop inside the
> > > +		 * page allocator from non-sleepable contexts
> > >  		 */
> > > -		if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> > > -			goto fail;
> > > +		BUG_ON(!can_direct_reclaim);
> >
> > We might get more useful output if here we did just
> > "if (!can_direct_reclaim) goto fail;" and let warn_alloc() print it, and
> > then there would be a BUG_ON(gfp_mask & __GFP_NOFAIL) after it?
> > Additionally, we could mask out __GFP_NOWARN from gfp_mask before the
> > goto, as __GFP_NOWARN would suppress the output in a non-recoverable
> > situation, so it would be wrong.
>
> If we use BUG_ON, it seems we don't need to do anything else, as the
> BUG_ON report already gives developers all the information they need.

It will not give us the warn_alloc() output - aka the state of the page
allocator at the time of failure. Is that really necessary? I don't
know, because this is a "shouldn't ever happen" case rather than a "how
come this allocation has failed" case. So IMHO a simple BUG_ON should be
sufficient to scream out loud that the impossible has happened and needs
fixing.

--
Michal Hocko
SUSE Labs
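For reference, warn_alloc() in mm/page_alloc.c has roughly the following
shape (a simplified sketch; the exact checks vary across kernel
versions). It is both the "state of the page allocator" report discussed
above and the reason Vlastimil suggested clearing __GFP_NOWARN before
the goto:

	void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
	{
		...
		/* __GFP_NOWARN (or ratelimiting) silences the whole report */
		if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
			return;

		pr_warn(...);		/* "page allocation failure: order:%u" */
		dump_stack();		/* the failing callsite */
		warn_alloc_show_mem(gfp_mask, nodemask);	/* free memory and zone state */
	}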