All of lore.kernel.org
 help / color / mirror / Atom feed
* + vpda-try-to-fix-the-potential-crash-due-to-misusing-__gfp_nofail.patch added to mm-unstable branch
@ 2024-07-31  0:39 Andrew Morton
  0 siblings, 0 replies; only message in thread
From: Andrew Morton @ 2024-07-31  0:39 UTC (permalink / raw)
  To: mm-commits, xuanzhuo, vbabka, urezki, torvalds, roman.gushchin,
	rientjes, penberg, mst, mhocko, maxime.coquelin, lstoakes, kees,
	jasowang, iamjoonsoo.kim, hch, hailong.liu, eperezma, cl,
	42.hyeyoo, v-songbaohua, akpm

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 9082 bytes --]


The patch titled
     Subject: vpda: try to fix the potential crash due to misusing __GFP_NOFAIL
has been added to the -mm mm-unstable branch.  Its filename is
     vpda-try-to-fix-the-potential-crash-due-to-misusing-__gfp_nofail.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/vpda-try-to-fix-the-potential-crash-due-to-misusing-__gfp_nofail.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Barry Song <v-songbaohua@oppo.com>
Subject: vpda: try to fix the potential crash due to misusing __GFP_NOFAIL
Date: Wed, 31 Jul 2024 12:01:52 +1200

Patch series "mm: clarify nofail memory allocation", v2.

__GFP_NOFAIL carries the semantics of never failing, so its callers
do not check the return value:
  %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
  cannot handle allocation failures. The allocation could block
  indefinitely but will never return with failure. Testing for
  failure is pointless.

However, __GFP_NOFAIL can sometimes fail if it exceeds size limits or is
used with GFP_ATOMIC/GFP_NOWAIT in a non-sleepable context.  This can
expose security vulnerabilities due to potential NULL dereferences.

Since __GFP_NOFAIL does not support non-blocking allocation, we introduce
GFP_NOFAIL with inclusive blocking semantics and encourage using
GFP_NOFAIL as a replacement for __GFP_NOFAIL in non-mm.

If we must still fail a nofail allocation, we should trigger a BUG rather
than exposing NULL dereferences to callers who do not check the return
value.

* The discussion started from this topic:
 [PATCH RFC] mm: warn potential return NULL for kmalloc_array and
             kvmalloc_array with __GFP_NOFAIL

 https://lore.kernel.org/linux-mm/20240717230025.77361-1-21cnbao@gmail.com/


This patch (of 4):

mm doesn't support non-blockable __GFP_NOFAIL allocation.  Because
__GFP_NOFAIL without direct reclamation may just result in a busy loop
within non-sleepable contexts.

static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
                                                struct alloc_context *ac)
{
        ...
        /*
         * Make sure that __GFP_NOFAIL request doesn't leak out and make sure
         * we always retry
         */
        if (gfp_mask & __GFP_NOFAIL) {
                /*
                 * All existing users of the __GFP_NOFAIL are blockable, so warn
                 * of any new users that actually require GFP_NOWAIT
                 */
                if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
                        goto fail;
                ...
        }
        ...
fail:
        warn_alloc(gfp_mask, ac->nodemask,
                        "page allocation failure: order:%u", order);
got_pg:
        return page;
}

Let's move the memory allocation out of the atomic context and use the
normal sleepable context to get pages.

Link: https://lkml.kernel.org/r/20240731000155.109583-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240731000155.109583-2-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hailong.Liu <hailong.liu@oppo.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/vdpa/vdpa_user/iova_domain.c |   31 ++++++++++++++++++++-----
 drivers/vdpa/vdpa_user/iova_domain.h |    5 +++-
 drivers/vdpa/vdpa_user/vduse_dev.c   |    4 ++-
 3 files changed, 33 insertions(+), 7 deletions(-)

--- a/drivers/vdpa/vdpa_user/iova_domain.c~vpda-try-to-fix-the-potential-crash-due-to-misusing-__gfp_nofail
+++ a/drivers/vdpa/vdpa_user/iova_domain.c
@@ -283,7 +283,23 @@ out:
 	return ret;
 }
 
-void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain)
+struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain)
+{
+	struct page **pages;
+	unsigned long count, i;
+
+	if (!domain->user_bounce_pages)
+		return NULL;
+
+	count = domain->bounce_size >> PAGE_SHIFT;
+	pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL);
+	for (i = 0; i < count; i++)
+		pages[i] = alloc_page(GFP_KERNEL | __GFP_NOFAIL);
+
+	return pages;
+}
+
+void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, struct page **pages)
 {
 	struct vduse_bounce_map *map;
 	unsigned long i, count;
@@ -294,15 +310,16 @@ void vduse_domain_remove_user_bounce_pag
 
 	count = domain->bounce_size >> PAGE_SHIFT;
 	for (i = 0; i < count; i++) {
-		struct page *page = NULL;
+		struct page *page = pages[i];
 
 		map = &domain->bounce_maps[i];
-		if (WARN_ON(!map->bounce_page))
+		if (WARN_ON(!map->bounce_page)) {
+			put_page(page);
 			continue;
+		}
 
 		/* Copy user page to kernel page if it's in use */
 		if (map->orig_phys != INVALID_PHYS_ADDR) {
-			page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL);
 			memcpy_from_page(page_address(page),
 					 map->bounce_page, 0, PAGE_SIZE);
 		}
@@ -310,6 +327,7 @@ void vduse_domain_remove_user_bounce_pag
 		map->bounce_page = page;
 	}
 	domain->user_bounce_pages = false;
+	kfree(pages);
 out:
 	write_unlock(&domain->bounce_lock);
 }
@@ -543,10 +561,13 @@ static int vduse_domain_mmap(struct file
 static int vduse_domain_release(struct inode *inode, struct file *file)
 {
 	struct vduse_iova_domain *domain = file->private_data;
+	struct page **pages;
+
+	pages = vduse_domain_alloc_pages_to_remove_bounce(domain);
 
 	spin_lock(&domain->iotlb_lock);
 	vduse_iotlb_del_range(domain, 0, ULLONG_MAX);
-	vduse_domain_remove_user_bounce_pages(domain);
+	vduse_domain_remove_user_bounce_pages(domain, pages);
 	vduse_domain_free_kernel_bounce_pages(domain);
 	spin_unlock(&domain->iotlb_lock);
 	put_iova_domain(&domain->stream_iovad);
--- a/drivers/vdpa/vdpa_user/iova_domain.h~vpda-try-to-fix-the-potential-crash-due-to-misusing-__gfp_nofail
+++ a/drivers/vdpa/vdpa_user/iova_domain.h
@@ -74,7 +74,10 @@ void vduse_domain_reset_bounce_map(struc
 int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain,
 				       struct page **pages, int count);
 
-void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain);
+void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain,
+					   struct page **pages);
+
+struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain);
 
 void vduse_domain_destroy(struct vduse_iova_domain *domain);
 
--- a/drivers/vdpa/vdpa_user/vduse_dev.c~vpda-try-to-fix-the-potential-crash-due-to-misusing-__gfp_nofail
+++ a/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1030,6 +1030,7 @@ unlock:
 static int vduse_dev_dereg_umem(struct vduse_dev *dev,
 				u64 iova, u64 size)
 {
+	struct page **pages;
 	int ret;
 
 	mutex_lock(&dev->mem_lock);
@@ -1044,7 +1045,8 @@ static int vduse_dev_dereg_umem(struct v
 	if (dev->umem->iova != iova || size != dev->domain->bounce_size)
 		goto unlock;
 
-	vduse_domain_remove_user_bounce_pages(dev->domain);
+	pages = vduse_domain_alloc_pages_to_remove_bounce(dev->domain);
+	vduse_domain_remove_user_bounce_pages(dev->domain, pages);
 	unpin_user_pages_dirty_lock(dev->umem->pages,
 				    dev->umem->npages, true);
 	atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm);
_

Patches currently in -mm which might be from v-songbaohua@oppo.com are

mm-extend-usage-parameter-so-that-cluster_swap_free_nr-can-be-reused.patch
mm-swap-add-nr-argument-in-swapcache_prepare-and-swapcache_clear-to-support-large-folios.patch
vpda-try-to-fix-the-potential-crash-due-to-misusing-__gfp_nofail.patch
mm-document-__gfp_nofail-must-be-blockable.patch
mm-bug_on-to-avoid-null-deference-while-__gfp_nofail-fails.patch
mm-prohibit-null-deference-exposed-for-unsupported-non-blockable-__gfp_nofail.patch


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2024-07-31  0:39 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-31  0:39 + vpda-try-to-fix-the-potential-crash-due-to-misusing-__gfp_nofail.patch added to mm-unstable branch Andrew Morton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.