public inbox for linux-kernel@vger.kernel.org
* [PATCH] Add alloc_pages_exact() and free_pages_exact()
@ 2008-06-24 16:40 Timur Tabi
  2008-06-24 21:33 ` Dave Hansen
  0 siblings, 1 reply; 3+ messages in thread
From: Timur Tabi @ 2008-06-24 16:40 UTC
  To: linux-kernel, andi, randy.dunlap, corbet, torvalds

alloc_pages_exact() is similar to alloc_pages(), except that it allocates
the minimum number of pages to fulfill the request.  This is useful if you
want to allocate a very large buffer that is slightly larger than a
power-of-two number of pages.  In that case, alloc_pages() rounds the
request up to the next power of two and wastes a lot of memory.

Signed-off-by: Timur Tabi <timur@freescale.com>
---

I have a video driver that wants to allocate a 5MB buffer.  alloc_pages()
will waste 3MB of physically-contiguous memory.  Therefore, I would
like to see alloc_pages_exact() added to 2.6.27.

Please note that I am not a Linux VM expert.  I wrote these functions based
on guidance from Andi Kleen.  I have no familiarity with NUMA, so I don't know
how to handle that.  Any and all suggestions are welcome.
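
For reference, the 5MB figure above can be checked with a little arithmetic.
This is only a userspace sketch, not kernel code: it assumes 4 KiB pages, and
the sketch_* names are hypothetical stand-ins (sketch_get_order() mimics what
get_order() computes for a non-zero size).

```c
#include <stddef.h>

/* Assumption: 4 KiB pages. The real PAGE_SIZE depends on the architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size, for size > 0. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Number of whole pages needed to hold size bytes (rounds up). */
static unsigned long sketch_pages_needed(size_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Pages a power-of-two allocation hands out beyond what size needs. */
static unsigned long sketch_surplus_pages(size_t size)
{
	return (1UL << sketch_get_order(size)) - sketch_pages_needed(size);
}
```

Under that assumption a 5MB request needs 1280 pages, the power-of-two
allocation is order 11 (2048 pages, 8MB), and the surplus is 768 pages, which
is the 3MB mentioned above.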

 include/linux/gfp.h |    3 ++
 mm/page_alloc.c     |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b414be3..1054801 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -215,6 +215,9 @@ extern struct page *alloc_page_vma(gfp_t gfp_mask,
 extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
 extern unsigned long get_zeroed_page(gfp_t gfp_mask);
 
+void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
+void free_pages_exact(void *virt, size_t size);
+
 #define __get_free_page(gfp_mask) \
 		__get_free_pages((gfp_mask),0)
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2f55295..08bf9d7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1711,6 +1711,59 @@ void free_pages(unsigned long addr, unsigned int order)
 
 EXPORT_SYMBOL(free_pages);
 
+/**
+ * alloc_pages_exact - allocate an exact number of physically-contiguous pages.
+ * @size: the number of bytes to allocate
+ * @gfp_mask: GFP flags for the allocation
+ *
+ * This function is similar to alloc_pages(), except that it allocates the
+ * minimum number of pages to satisfy the request.  alloc_pages() can only
+ * allocate a power-of-two number of pages.
+ *
+ * This function is also limited by MAX_ORDER.
+ *
+ * Memory allocated by this function must be released by free_pages_exact().
+ */
+void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
+{
+	unsigned int order = get_order(size);
+	unsigned long addr;
+
+	addr = __get_free_pages(gfp_mask, order);
+	if (addr) {
+		unsigned long alloc_end = addr + (PAGE_SIZE << order);
+		unsigned long used = addr + PAGE_ALIGN(size);
+
+		split_page(virt_to_page(addr), order);
+		while (used < alloc_end) {
+			free_page(used);
+			used += PAGE_SIZE;
+		}
+	}
+
+	return (void *)addr;
+}
+EXPORT_SYMBOL(alloc_pages_exact);
+
+/**
+ * free_pages_exact - release memory allocated via alloc_pages_exact()
+ * @virt: the value returned by alloc_pages_exact.
+ * @size: size of allocation, same value as passed to alloc_pages_exact().
+ *
+ * Release the memory allocated by a previous call to alloc_pages_exact.
+ */
+void free_pages_exact(void *virt, size_t size)
+{
+	unsigned long addr = (unsigned long)virt;
+	unsigned long end = addr + PAGE_ALIGN(size);
+
+	while (addr < end) {
+		free_page(addr);
+		addr += PAGE_SIZE;
+	}
+}
+EXPORT_SYMBOL(free_pages_exact);
+
 static unsigned int nr_free_zone_pages(int offset)
 {
 	struct zoneref *z;
-- 
1.5.5



* Re: [PATCH] Add alloc_pages_exact() and free_pages_exact()
  2008-06-24 16:40 [PATCH] Add alloc_pages_exact() and free_pages_exact() Timur Tabi
@ 2008-06-24 21:33 ` Dave Hansen
  2008-06-24 21:44   ` Timur Tabi
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Hansen @ 2008-06-24 21:33 UTC
  To: Timur Tabi; +Cc: linux-kernel, andi, randy.dunlap, corbet, torvalds

On Tue, 2008-06-24 at 11:40 -0500, Timur Tabi wrote:
> +void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
> +{
> +       unsigned int order = get_order(size);
> +       unsigned long addr;
> +
> +       addr = __get_free_pages(gfp_mask, order);
> +       if (addr) {
> +               unsigned long alloc_end = addr + (PAGE_SIZE << order);
> +               unsigned long used = addr + PAGE_ALIGN(size);
> +
> +               split_page(virt_to_page(addr), order);
> +               while (used < alloc_end) {
> +                       free_page(used);
> +                       used += PAGE_SIZE;
> +               }
> +       }
> +
> +       return (void *)addr;
> +}

Hi Timur,

This looks like a really good idea.  It looks pretty good to me, no
functional problems.  My brain had a really hard time parsing that code
for some reason, though.  Could just be a lack of coffee.

I think the thing that confused me was trying to figure out if
'alloc_end' was the end of what we *did* allocate from
__get_free_pages() or if it was the *goal* allocation end.

'used' also seemed like a slightly strange variable name because it
points to the memory which is about to be freed and ends up *unused*.

I'll offer this up just in case you like it better.  For me, it is
easier to parse, and should do the exact same thing.  I also think it's
slightly nicer to do the arithmetic on 'struct page *' rather than
vaddrs in 'unsigned long'.  It is _slightly_ cheaper not having to do a
virt_to_page() on each free_page() call.  The same would go for the free
side as well.

All of the 'struct page *' arithmetic is OK since it is all done inside
one MAX_ORDER area.

void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
{
	unsigned int order = get_order(size);
	void *alloc;
	struct page *surplus_start;
	struct page *surplus_end;
	struct page *page;

	size = PAGE_ALIGN(size);

	alloc = (void *)__get_free_pages(gfp_mask, order);
	if (!alloc)
		return NULL;

	/* Turn the big allocation into a bunch of single pages */
	split_page(virt_to_page(alloc), order);

	surplus_start = virt_to_page(alloc + size);
	/* The block ends (1 << order) pages after its first page */
	surplus_end = virt_to_page(alloc) + (1 << order);

	for (page = surplus_start; page < surplus_end; page++)
		__free_page(page);

	return alloc;
}

-- Dave
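
The page-index bookkeeping discussed above can be modelled in userspace. This
is a sketch under stated assumptions, not the kernel implementation: struct
sketch_block and the sketch_* functions are hypothetical, pages are assumed to
be 4 KiB, and an array of flags stands in for the split pages. The point it
illustrates is that the surplus runs from the first page past the requested
size to the end of the whole block, where the end is measured from the start
of the block (block start plus 1 << order pages).

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */
#define SKETCH_MAX_PAGES 64       /* arbitrary cap for this model */

/* Hypothetical stand-in for one split block: in_use[i] is true while page i is held. */
struct sketch_block {
	bool in_use[SKETCH_MAX_PAGES];
	unsigned long nr_pages;   /* 1 << order, or 0 if the model "allocation" failed */
};

/*
 * Model of the exact allocation: take 1 << order pages, then release every
 * page past the ones that size needs. Returns the number of pages kept,
 * or 0 if the request does not fit the model.
 */
static unsigned long sketch_alloc_exact(struct sketch_block *blk, size_t size,
					unsigned int order)
{
	unsigned long total = 1UL << order;
	unsigned long needed = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	unsigned long i;

	blk->nr_pages = 0;
	if (total > SKETCH_MAX_PAGES || needed == 0 || needed > total)
		return 0;

	/* "Split" the block: every page starts out individually held. */
	blk->nr_pages = total;
	for (i = 0; i < total; i++)
		blk->in_use[i] = true;

	/* Surplus is [needed, total): the end is counted from the block start. */
	for (i = needed; i < total; i++)
		blk->in_use[i] = false;

	return needed;
}

/* Count the pages still held in the block. */
static unsigned long sketch_count_in_use(const struct sketch_block *blk)
{
	unsigned long i, count = 0;

	for (i = 0; i < blk->nr_pages; i++)
		if (blk->in_use[i])
			count++;
	return count;
}
```

For a 5-page request served from an order-3 block (8 pages), the model keeps
pages 0 through 4 and releases pages 5 through 7.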



* Re: [PATCH] Add alloc_pages_exact() and free_pages_exact()
  2008-06-24 21:33 ` Dave Hansen
@ 2008-06-24 21:44   ` Timur Tabi
  0 siblings, 0 replies; 3+ messages in thread
From: Timur Tabi @ 2008-06-24 21:44 UTC
  To: Dave Hansen; +Cc: linux-kernel, andi, randy.dunlap, corbet, torvalds

Dave Hansen wrote:

> This looks like a really good idea.  It looks pretty good to me, no
> functional problems.  

Thanks.

> I think the thing that confused me was trying to figure out if
> 'alloc_end' was the end of what we *did* allocate from
> __get_free_pages() or if it was the *goal* allocation end.
> 
> 'used' also seemed like a slightly strange variable name because it
> points to the memory which is about to be freed and ends up *unused*.

This function is taken almost verbatim from alloc_large_system_hash().  I
figured if the terminology was good for that function, it's good for mine.

> I'll offer this up just in case you like it better.  For me, it is
> easier to parse, and should do the exact same thing.  I also think it's
> slightly nicer to do the arithmetic on 'struct page *' rather than
> vaddrs in 'unsigned long'.  It is _slightly_ cheaper not having to do a
> virt_to_page() on each free_page() call.  The same would go for the free
> side as well.

It does seem to be an improvement, although AKPM just accepted this patch.  I'd
hate to bother him with a replacement patch for something so minor.

-- 
Timur Tabi
Linux kernel developer at Freescale
