From: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
To: Linux Kernel ML <linux-kernel@vger.kernel.org>
Cc: linux-mm <linux-mm@kvack.org>, LHMS <lhms-devel@lists.sourceforge.net>
Subject: [RFC] buddy allocator without bitmap(2) [0/3]
Date: Tue, 31 Aug 2004 19:36:01 +0900 [thread overview]
Message-ID: <41345491.1020209@jp.fujitsu.com> (raw)
Hi,
thanks for the many comments on the previous RFC.
This new patch is against 2.6.9-rc1-mm1.
A new algorithm is implemented.
Considering the advice on the previous patch, I think its problems were:
(1) vagueness of zone->aligned_order
(2) vagueness of zone->nr_mem_map
(3) using pfn_valid() in the core of the buddy allocator only for a rare case
    (and readability ;)
I removed (1)-(3), and this version is not so micro-optimized.
I think the inner loop of free_pages_bulk() becomes simpler.
I added one more PG_xxx flag, PG_buddyend, and reserve one page as a stopper.
PG_buddyend is set at boot time and never cleared. This flag works as the
lower-address stopper of the buddy allocator. The reserved page works as the
higher-address stopper. These two guarantee that no non-existent page is ever accessed.
For the usage of the PG_buddyend flag, please read the patch.
There is a sample illustration in the attached patch.
Advantages:
- information about the buddy system is fully stored in the mem_map.
- no bitmap.
- works well on a discontiguous mem_map.
Disadvantages:
- uses one more PG_xxx flag.
- if mem_map is not aligned, one page is reserved as a victim for the buddy allocator.
How about this approach?
Regards
-- Kame
This patch removes bitmap from buddy allocator,
removes free_area_t's bitmap in include/linux/mmzone.h
and adds some definition in include/linux/mm.h
This is the 1st patch.
Currently, Linux's page allocator uses the buddy algorithm, and the buddy
allocator's code uses bitmaps. What are the bitmaps used for?
(*) for recording "this page is free" and the page's order.
Here, a page's order gives the size of a run of contiguous free pages:
if a free page[X]'s order is Y, then the pages from page[X] to
page[X + 2^Y - 1] are contiguous and free.
If a page is free and is the head of contiguous free pages of order X,
we can record it with
set_bit(free_area[X]->map, index of the page in this order)
For coalescing, when there is a chunk of free pages of order X,
we can test whether it can be coalesced with
test_bit(free_area[X]->bitmap, index_of_buddy)
where index_of_buddy is calculated as (index_of_page ^ (1 << order)).
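The buddy-index arithmetic above can be sketched as a standalone C snippet
(illustrative names of my own, not the kernel's):

```c
#include <assert.h>

/* Sketch of the buddy-index computation described in the text: the buddy
 * of a block whose head is at page_index, at a given order, is found by
 * flipping bit 'order' of the index. */
static unsigned long buddy_index(unsigned long page_index, unsigned int order)
{
	return page_index ^ (1UL << order);
}
```

Flipping a single bit works because a block of order Y is aligned to 2^Y,
so the two order-Y buddies of a pair differ only in bit Y of their index.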
This patch removes the bitmap and instead records a free page's order
in its page structure's page->private field. If a page is free and
is the head of a free contiguous memory chunk, page->private holds
the order of the page; the PG_private bit indicates that this field is valid.
For coalescing, when there is a page heading a chunk of contiguous free pages
of order X, we can test whether the page is to be coalesced with
(page_is_free(buddy) && PagePrivate(buddy) && page_order(buddy) == X)
The address of the buddy is calculated the same way as in the bitmap case.
If a page is free and on the buddy system, the PG_private bit is set
and page->private holds its order. This scheme is safe because:
(a) while a page is being freed, PG_private is not set. (see free_pages_check())
(b) when a page is free and on the buddy system, PG_private is set.
These facts are guaranteed by zone->lock:
only one thread can change a free page's PG_private bit and private field
at any time.
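As a toy model of the scheme above (not the kernel code; the real struct page
and zone->lock are omitted, and all names are mine), a plain flag can stand in
for the PG_private bit:

```c
#include <assert.h>

/* Toy model of the bitmap-free scheme: the head of a free block keeps its
 * order in a 'private' field, and a flag (standing in for the PG_private
 * bit) says whether that field is valid.  In the kernel these updates all
 * happen under zone->lock; this single-threaded sketch omits the locking. */
struct toy_page {
	int private_valid;	/* stands in for the PG_private bit */
	unsigned long private;	/* order, meaningful only when valid */
};

/* Mark a page as the head of a free block of the given order. */
static void toy_set_page_order(struct toy_page *page, unsigned long order)
{
	page->private = order;
	page->private_valid = 1;
}

/* The coalescing test from the text: the candidate buddy must have a
 * valid private field carrying the same order. */
static int toy_page_is_buddy(struct toy_page *buddy, unsigned long order)
{
	return buddy->private_valid && buddy->private == order;
}
```

A page freshly taken off the buddy lists would clear the flag again, which is
why free_pages_check() can rely on PG_private being unset during freeing.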
[ Not MAX_ORDER aligned memory ]
If all memory is aligned to the system's MAX_ORDER, the buddy algorithm works
fine and there is no trouble. But if memory is not aligned, we must check
whether the buddy exists at all.
Checking this in the main loop of free_pages_bulk() is ugly, but the pages
that have no buddy at some order can be computed at boot time.
A new PG_xxx flag, PG_buddyend, is introduced.
This flag works as a stopper for buddy coalescing.
A page that is on the lower-address end of mem_map and cannot have
a lower-address buddy is marked as PG_buddyend. Please see below.
If mem_map's start address is aligned, the pages and their buddies look like this:
order start_pfn -> higher address
-------------------------------------
0 | | | | | | | | | | | | | ..
-------------------------------------
1 | | | | | | |
-------------------------------------
2 | | | |
-------------------------------------
3 | |
-------------------------------------
If mem_map's start address is not aligned, some of the lower-address buddies
disappear:
----------------------------------
0 |X | | | | | | | | | | | ..
----------------------------------
1 |X | | | | |
-------------------------------
2 |X | |
-------------------------
3 |X
-------------
The page marked 'X' in each order never has a lower-address buddy.
The 'X' pages are marked as PG_buddyend.
Now this check is enough to avoid coalescing unaligned pages:
-----------------------
buddy_index = page_index ^ (1 << order)
if ((buddy_index < page_index) &&
PageBuddyend(base + page_index))
stop coalescing.
-----------------------
The above handles the lower-address case; what about the higher address?
If the higher end of mem_map is not aligned to MAX_ORDER, we reserve
the highest-address page and never put it into the buddy system (it is never
passed to free_pages()).
Sacrificing this one page as a victim for the buddy allocator is enough.
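The whole lower-end scheme can be sketched as a standalone toy (indices play
the role of pfns, an int array stands in for the per-page PG_buddyend bit,
and all names are mine): mark_buddyend() does the boot-time marking, and
can_coalesce() is the stopper check quoted above.

```c
#include <assert.h>

/* Boot-time marking: for each order, find the head of the first complete
 * order-sized block at or above start_pfn; if that head's lower-address
 * buddy falls below start_pfn, the buddy does not exist, so mark the head. */
static void mark_buddyend(int *buddyend, unsigned long start_pfn,
			  unsigned int max_order)
{
	for (unsigned int order = 0; order < max_order; order++) {
		unsigned long size = 1UL << order;
		unsigned long head = (start_pfn + size - 1) & ~(size - 1);
		if ((head ^ size) < start_pfn)
			buddyend[head] = 1;
	}
}

/* The stopper check from the text: when coalescing would reach toward
 * lower addresses and the current page is marked, the lower-address
 * buddy does not exist, so coalescing must stop. */
static int can_coalesce(const int *buddyend, unsigned long page_index,
			unsigned int order)
{
	unsigned long buddy = page_index ^ (1UL << order);
	if (buddy < page_index && buddyend[page_index])
		return 0;
	return 1;
}
```

A marked head never triggers a false stop at a smaller order: being aligned
to 2^order, its buddy at every smaller order lies at a higher address, so
the `buddy < page_index` guard never fires for it there.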
-- Kame
---
linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mm.h | 24 ++++++++++++++
linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mmzone.h | 1
linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/page-flags.h | 10 +++++
3 files changed, 34 insertions(+), 1 deletion(-)
diff -puN include/linux/mm.h~eliminate-bitmap-includes include/linux/mm.h
--- linux-2.6.9-rc1-mm1-k/include/linux/mm.h~eliminate-bitmap-includes 2004-08-31 18:37:04.186101664 +0900
+++ linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mm.h 2004-08-31 18:37:04.194100448 +0900
@@ -209,6 +209,9 @@ struct page {
* usually used for buffer_heads
* if PagePrivate set; used for
* swp_entry_t if PageSwapCache
+ * When page is free:
+ * this indicates order of page
+ * in buddy allocator.
*/
struct address_space *mapping; /* If low bit clear, points to
* inode address_space, or NULL.
@@ -322,6 +325,27 @@ static inline void put_page(struct page
#endif /* CONFIG_HUGETLB_PAGE */
/*
+ * These functions are used in alloc_pages()/free_pages(), buddy allocator.
+ * page_order(page) returns an order of a free page in buddy allocator.
+ *
+ * this is used with PG_private flag
+ *
+ * Note : all PG_private operations used in buddy system is done while
+ * zone->lock is acquired. So set and clear PG_private bit operation
+ * does not need to be atomic.
+ */
+
+static inline int page_order(struct page *page)
+{
+ return page->private;
+}
+
+static inline void set_page_order(struct page *page, int order)
+{
+ page->private = order;
+}
+
+/*
* Multiple processes may "see" the same page. E.g. for untouched
* mappings of /dev/null, all processes see the same page full of
* zeroes, and text pages of executables and shared libraries have
diff -puN include/linux/mmzone.h~eliminate-bitmap-includes include/linux/mmzone.h
--- linux-2.6.9-rc1-mm1-k/include/linux/mmzone.h~eliminate-bitmap-includes 2004-08-31 18:37:04.188101360 +0900
+++ linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mmzone.h 2004-08-31 18:37:04.195100296 +0900
@@ -22,7 +22,6 @@
struct free_area {
struct list_head free_list;
- unsigned long *map;
};
struct pglist_data;
diff -puN include/linux/page-flags.h~eliminate-bitmap-includes include/linux/page-flags.h
--- linux-2.6.9-rc1-mm1-k/include/linux/page-flags.h~eliminate-bitmap-includes 2004-08-31 18:37:04.190101056 +0900
+++ linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/page-flags.h 2004-08-31 18:37:04.196100144 +0900
@@ -44,6 +44,12 @@
* space, they need to be kmapped separately for doing IO on the pages. The
* struct page (these bits with information) are always mapped into kernel
* address space...
+ *
+ * PG_buddyend pages don't have its buddy in buddy allocator in some meaning.
+ * If a page is on the lower address end of mem_map and a buddy of it, in some
+ * order, is below the end, a page is marked as PG_buddyend. If a page is on
+ * the higher addres end of mem_map and a buddy of it,in some order, is above
+ * the end, a page is marked as PG_buddyend.
*/
/*
@@ -75,6 +81,7 @@
#define PG_mappedtodisk 17 /* Has blocks allocated on-disk */
#define PG_reclaim 18 /* To be reclaimed asap */
+#define PG_buddyend 19 /* end of the buddy allocator */
/*
* Global page accounting. One instance per CPU. Only unsigned longs are
@@ -290,6 +297,9 @@ extern unsigned long __read_page_state(u
#define SetPageCompound(page) set_bit(PG_compound, &(page)->flags)
#define ClearPageCompound(page) clear_bit(PG_compound, &(page)->flags)
+#define PageBuddyend(page) test_bit(PG_buddyend, &(page)->flags)
+#define SetPageBuddyend(page) set_bit(PG_buddyend, &(page)->flags)
+
#ifdef CONFIG_SWAP
#define PageSwapCache(page) test_bit(PG_swapcache, &(page)->flags)
#define SetPageSwapCache(page) set_bit(PG_swapcache, &(page)->flags)
_
next reply other threads:[~2004-08-31 10:31 UTC|newest]
Thread overview: 10+ messages
2004-08-31 10:36 Hiroyuki KAMEZAWA [this message]
2004-08-31 16:26 ` Dave Hansen
2004-08-31 22:44   ` [Lhms-devel] Hiroyuki KAMEZAWA
2004-08-31 23:24     ` Andrew Morton
2004-08-31 23:53       ` Hiroyuki KAMEZAWA