From mboxrd@z Thu Jan 1 00:00:00 1970
From: Minchan Kim
To: Andrew Morton
Cc: LKML, linux-mm, hyesoo.yu@samsung.com, willy@infradead.org,
 david@redhat.com, iamjoonsoo.kim@lge.com, vbabka@suse.cz,
 surenb@google.com, pullip.cho@samsung.com, joaodias@google.com,
 hridya@google.com, sumit.semwal@linaro.org, john.stultz@linaro.org,
 Brian.Starkey@arm.com, linux-media@vger.kernel.org,
 devicetree@vger.kernel.org, robh@kernel.org, christian.koenig@amd.com,
 linaro-mm-sig@lists.linaro.org, Minchan Kim
Subject: [PATCH v2 2/4] mm: introduce cma_alloc_bulk API
Date: Tue, 1 Dec 2020 09:51:42 -0800
Message-Id: <20201201175144.3996569-3-minchan@kernel.org>
X-Mailer: git-send-email 2.29.2.454.gaff20da3a2-goog
In-Reply-To: <20201201175144.3996569-1-minchan@kernel.org>
References: <20201201175144.3996569-1-minchan@kernel.org>

Some special hardware requires bulk allocation of high-order pages: for
example, 4800 order-4 pages at minimum, and sometimes more. One option
to meet the requirement is to reserve a 300MB CMA area and request the
whole 300MB as a single contiguous allocation. However, that fails if
even one page in the range is long-term pinned, directly or indirectly.
The other option is to request a higher order (e.g., 2MB) than the
order actually needed (64KB) repeatedly until the driver has gathered
the necessary amount of memory. This approach makes the allocation very
slow, since cma_alloc itself is slow, and it can get stuck on a single
pageblock if it encounters an unmigratable page.

To solve the issue, this patch introduces cma_alloc_bulk:

    int cma_alloc_bulk(struct cma *cma, unsigned int align, bool fast,
                       unsigned int order, size_t nr_requests,
                       struct page **page_array, size_t *nr_allocated);

Most parameters are the same as for cma_alloc, but it additionally
takes an array in which to store the allocated pages. Unlike cma_alloc,
it skips pageblocks containing unmovable pages without waiting or
stopping, so the API keeps scanning the remaining pageblocks for pages
of the requested order.

cma_alloc_bulk is a best-effort approach: unlike cma_alloc, it skips
pageblocks that hold unmovable pages, since the allocation does not
need to be perfect from the beginning, and trades success ratio for
performance. To that end, the API takes a "bool fast" parameter, which
is propagated into alloc_contig_range to avoid functions with
significant overhead that exist only to increase the CMA allocation
success ratio (e.g., migration retries and per-pageblock PCP/LRU
draining), at the cost of a lower success ratio. If the caller could
not allocate enough pages in fast mode, it can call the API again with
"false" to increase the success ratio, if it is willing to pay the
extra overhead.

Signed-off-by: Minchan Kim
---
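Illustration for reviewers; this sketch is not part of the patch. A
caller that wants the fast best-effort pass first and then falls back
to the slower mode for the shortfall might look roughly like the below;
the function name, alignment, order and request count are made-up
values:

	static int hw_alloc_chunks(struct cma *cma, struct page **pages)
	{
		size_t done = 0, more = 0;
		int ret;

		/* best-effort pass: skips draining and migration retries */
		ret = cma_alloc_bulk(cma, 4, true, 4, 4800, pages, &done);
		if (ret == -EBUSY && done < 4800) {
			/* retry the shortfall with the thorough mode */
			ret = cma_alloc_bulk(cma, 4, false, 4, 4800 - done,
					     pages + done, &more);
			done += more;
		}
		/*
		 * Whatever ret says, 'done' entries of pages[] are valid
		 * order-4 chunks that must eventually be returned with
		 * cma_release().
		 */
		return done == 4800 ? 0 : -ENOMEM;
	}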
 include/linux/cma.h |   5 ++
 include/linux/gfp.h |   2 +
 mm/cma.c            | 126 ++++++++++++++++++++++++++++++++++++++++++--
 mm/page_alloc.c     |  19 ++++---
 4 files changed, 140 insertions(+), 12 deletions(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index 217999c8a762..7375d3131804 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -46,6 +46,11 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 					struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
 			      bool no_warn);
+
+extern int cma_alloc_bulk(struct cma *cma, unsigned int align, bool fast,
+			unsigned int order, size_t nr_requests,
+			struct page **page_array, size_t *nr_allocated);
+
 extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count);
 
 extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data);
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index ad5872699692..75bfb673d75b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -627,6 +627,8 @@ static inline bool pm_suspended_storage(void)
 enum alloc_contig_mode {
 	/* try several ways to increase success ratio of memory allocation */
 	ALLOC_CONTIG_NORMAL,
+	/* avoid costly functions to make the call fast */
+	ALLOC_CONTIG_FAST,
 };
 
 /* The below functions must be run on a range from a single zone. */
diff --git a/mm/cma.c b/mm/cma.c
index 8010c1ba04b0..4459045fa717 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -32,6 +32,7 @@
 #include <linux/highmem.h>
 #include <linux/io.h>
 #include <linux/kmemleak.h>
+#include <linux/swap.h>
 #include <trace/events/cma.h>
 
 #include "cma.h"
@@ -397,6 +398,14 @@ static void cma_debug_show_areas(struct cma *cma)
 static inline void cma_debug_show_areas(struct cma *cma) { }
 #endif
 
+static void reset_page_kasan_tag(struct page *page, int count)
+{
+	int i;
+
+	for (i = 0; i < count; i++)
+		page_kasan_tag_reset(page + i);
+}
+
 /**
  * cma_alloc() - allocate pages from contiguous area
  * @cma:   Contiguous memory region for which the allocation is performed.
@@ -414,7 +423,6 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
 	unsigned long pfn = -1;
 	unsigned long start = 0;
 	unsigned long bitmap_maxno, bitmap_no, bitmap_count;
-	size_t i;
 	struct page *page = NULL;
 	int ret = -ENOMEM;
 
@@ -479,10 +487,8 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
 	 * blocks being marked with different tags. Reset the tags to ignore
 	 * those page blocks.
 	 */
-	if (page) {
-		for (i = 0; i < count; i++)
-			page_kasan_tag_reset(page + i);
-	}
+	if (page)
+		reset_page_kasan_tag(page, count);
 
 	if (ret && !no_warn) {
 		pr_err("%s: alloc failed, req-size: %zu pages, ret: %d\n",
@@ -494,6 +500,116 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
 	return page;
 }
 
+/*
+ * cma_alloc_bulk() - allocate high order bulk pages from contiguous area with
+ *		best effort. It will usually be used for private @cma
+ *
+ * @cma:	contiguous memory region for which the allocation is performed.
+ * @align:	requested alignment of pages (in PAGE_SIZE order).
+ * @fast:	will skip costly operations if it's true.
+ * @order:	requested page order
+ * @nr_requests: the number of 2^order pages requested to be allocated as input,
+ * @page_array:	page_array pointer to store allocated pages (must have space
+ *		for at least nr_requests)
+ * @nr_allocated: the number of 2^order pages allocated as output
+ *
+ * This function tries to allocate up to @nr_requests @order pages on specific
+ * contiguous memory area. If @fast is true, it will avoid costly functions
+ * to increase allocation success ratio, so it will be faster but might return
+ * less than the requested number of pages. The user could retry with @fast set
+ * to false if needed.
+ *
+ * Return: it will return 0 only if all pages requested by @nr_requests are
+ * allocated. Otherwise, it returns a negative error code.
+ *
+ * Note: Regardless of success/failure, the user should check @nr_allocated to
+ * see how many @order pages were allocated and free those pages when they are
+ * not needed.
+ */
+int cma_alloc_bulk(struct cma *cma, unsigned int align, bool fast,
+			unsigned int order, size_t nr_requests,
+			struct page **page_array, size_t *nr_allocated)
+{
+	int ret = 0;
+	size_t i = 0;
+	unsigned long nr_pages_needed = nr_requests * (1 << order);
+	unsigned long nr_chunk_pages, nr_pages;
+	unsigned long mask, offset;
+	unsigned long pfn = -1;
+	unsigned long start = 0;
+	unsigned long bitmap_maxno, bitmap_no, bitmap_count;
+	struct page *page = NULL;
+	enum alloc_contig_mode mode = fast ? ALLOC_CONTIG_FAST :
+						ALLOC_CONTIG_NORMAL;
+	*nr_allocated = 0;
+	if (!cma || !cma->count || !cma->bitmap || !page_array)
+		return -EINVAL;
+
+	if (!nr_pages_needed)
+		return 0;
+
+	nr_chunk_pages = 1 << max_t(unsigned int, order, pageblock_order);
+
+	mask = cma_bitmap_aligned_mask(cma, align);
+	offset = cma_bitmap_aligned_offset(cma, align);
+	bitmap_maxno = cma_bitmap_maxno(cma);
+
+	lru_add_drain_all();
+	drain_all_pages(NULL);
+
+	while (nr_pages_needed) {
+		nr_pages = min(nr_chunk_pages, nr_pages_needed);
+
+		bitmap_count = cma_bitmap_pages_to_bits(cma, nr_pages);
+		mutex_lock(&cma->lock);
+		bitmap_no = bitmap_find_next_zero_area_off(cma->bitmap,
+				bitmap_maxno, start, bitmap_count, mask,
+				offset);
+		if (bitmap_no >= bitmap_maxno) {
+			mutex_unlock(&cma->lock);
+			break;
+		}
+		bitmap_set(cma->bitmap, bitmap_no, bitmap_count);
+		/*
+		 * It's safe to drop the lock here. If the migration fails
+		 * cma_clear_bitmap will take the lock again and unmark it.
+		 */
+		mutex_unlock(&cma->lock);
+
+		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
+		ret = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_CMA,
+				GFP_KERNEL|__GFP_NOWARN, mode);
+		if (ret) {
+			cma_clear_bitmap(cma, pfn, nr_pages);
+			if (ret != -EBUSY)
+				break;
+
+			/* continue to search next block */
+			start = (pfn + nr_pages - cma->base_pfn) >>
+						cma->order_per_bit;
+			continue;
+		}
+
+		page = pfn_to_page(pfn);
+		while (nr_pages) {
+			page_array[i++] = page;
+			reset_page_kasan_tag(page, 1 << order);
+			page += 1 << order;
+			nr_pages -= 1 << order;
+			nr_pages_needed -= 1 << order;
+		}
+
+		start = bitmap_no + bitmap_count;
+	}
+
+	*nr_allocated = i;
+
+	if (!ret && nr_pages_needed)
+		ret = -EBUSY;
+
+	return ret;
+}
+
 /**
  * cma_release() - release allocated pages
  * @cma:   Contiguous memory region for which the allocation is performed.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index adfbfd95fbc3..2a1799ff14fc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8463,7 +8463,8 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
 
 /* [start, end) must belong to a single zone. */
 static int __alloc_contig_migrate_range(struct compact_control *cc,
-					unsigned long start, unsigned long end)
+					unsigned long start, unsigned long end,
+					unsigned int max_tries)
 {
 	/* This function is based on compact_zone() from compaction.c. */
 	unsigned int nr_reclaimed;
@@ -8491,7 +8492,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 			break;
 		}
 		tries = 0;
-	} else if (++tries == 5) {
+	} else if (++tries == max_tries) {
 		ret = ret < 0 ? ret : -EBUSY;
 		break;
 	}
@@ -8553,6 +8554,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	unsigned long outer_start, outer_end;
 	unsigned int order;
 	int ret = 0;
+	bool fast_mode = mode == ALLOC_CONTIG_FAST;
 
 	struct compact_control cc = {
 		.nr_migratepages = 0,
@@ -8595,7 +8597,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	if (ret)
 		return ret;
 
-	drain_all_pages(cc.zone);
+	if (!fast_mode)
+		drain_all_pages(cc.zone);
 
 	/*
 	 * In case of -EBUSY, we'd like to know which page causes problem.
@@ -8607,7 +8610,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * allocated.  So, if we fall through be sure to clear ret so that
 	 * -EBUSY is not accidentally used or returned to caller.
 	 */
-	ret = __alloc_contig_migrate_range(&cc, start, end);
+	ret = __alloc_contig_migrate_range(&cc, start, end, fast_mode ? 1 : 5);
 	if (ret && ret != -EBUSY)
 		goto done;
 	ret = 0;
@@ -8629,7 +8632,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * isolated thus they won't get removed from buddy.
 	 */
 
-	lru_add_drain_all();
+	if (!fast_mode)
+		lru_add_drain_all();
 
 	order = 0;
 	outer_start = start;
@@ -8656,8 +8660,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	/* Make sure the range is really isolated. */
 	if (test_pages_isolated(outer_start, end, 0)) {
-		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
-			__func__, outer_start, end);
+		if (!fast_mode)
+			pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
+				__func__, outer_start, end);
 		ret = -EBUSY;
 		goto done;
 	}
-- 
2.29.2.454.gaff20da3a2-goog