From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933228AbWFZXSG (ORCPT ); Mon, 26 Jun 2006 19:18:06 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S933236AbWFZXSE (ORCPT ); Mon, 26 Jun 2006 19:18:04 -0400 Received: from pop5-1.us4.outblaze.com ([205.158.62.125]:63155 "HELO pop5-1.us4.outblaze.com") by vger.kernel.org with SMTP id S933257AbWFZXRk (ORCPT ); Mon, 26 Jun 2006 19:17:40 -0400 From: Nigel Cunningham Reply-To: nigel@suspend2.net To: "Rafael J. Wysocki" Subject: Re: [Suspend2][] [Suspend2] Dynamically allocated pageflags Date: Tue, 27 Jun 2006 09:17:32 +1000 User-Agent: KMail/1.9.1 Cc: linux-kernel@vger.kernel.org References: <20060626165209.10918.86526.stgit@nigel.suspend2.net> <20060626165211.10918.58434.stgit@nigel.suspend2.net> <200606262311.50846.rjw@sisk.pl> In-Reply-To: <200606262311.50846.rjw@sisk.pl> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart2357383.hNUFaYYRLM"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200606270917.36229.nigel@suspend2.net> Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org --nextPart2357383.hNUFaYYRLM Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Hi. On Tuesday 27 June 2006 07:11, Rafael J. Wysocki wrote: > On Monday 26 June 2006 18:52, Nigel Cunningham wrote: > > Add support for dynamically allocated pageflags - bitmaps consisting of > > single pages merged together into per zone bitmaps that can then be used > > as temporary page flags. > > IIRC, this has already been discussed on LKML and some people argued it > would be overkill. Could you please say why you think it's not so? Some people did, but IIRC, Dave Jones liked the general idea. The main=20 rationale was that we have a limited number of pageflags, and this provides= a=20 way that we can easily add new ones without expanding struct page. Obviousl= y=20 it's better for usage patterns like suspend2. The other (suspend2 specific) advantage is that having them stored like thi= s=20 makes storing the image data much simpler and while we're calculating the=20 contents of the image, there is zero impact on the size of this metadata=20 storage from changes in the pages that are in each pagedir. > Well, having read the entire patch I think I'd do this in a different way. > Namely, we can always use pfn_to_page() to get the struct page > corresponding to given PFN, so we can enumerate pages from 0 to max_pfn a= nd > create a simple bitmap with one or more bits per PFN. The only difficulty > I see wrt this is to make sure 0-order allocations are used for the > bitmaps. I used to do that, but it arm pfns don't start at zero and it would leave=20 gaps/wastage in the discontig mem case. > Anyway you can find some comments below. > > Greetings, > Rafael > > > Signed-off-by: Nigel Cunningham > > > > include/linux/dyn_pageflags.h | 66 ++++++++ > > lib/Kconfig | 3 > > lib/Makefile | 2 > > lib/dyn_pageflags.c | 330 > > +++++++++++++++++++++++++++++++++++++++++ 4 files changed, 401 > > insertions(+), 0 deletions(-) > > > > diff --git a/include/linux/dyn_pageflags.h > > b/include/linux/dyn_pageflags.h new file mode 100644 > > index 0000000..1a7b5d8 > > --- /dev/null > > +++ b/include/linux/dyn_pageflags.h > > @@ -0,0 +1,66 @@ > > +/* > > + * include/linux/dyn_pageflags.h > > + * > > + * Copyright (C) 2004-2006 Nigel Cunningham > > + * > > + * This file is released under the GPLv2. > > + * > > + * It implements support for dynamically allocated bitmaps that are > > + * used for temporary or infrequently used pageflags, in lieu of > > + * bits in the struct page flags entry. > > + */ > > + > > +#ifndef DYN_PAGEFLAGS_H > > +#define DYN_PAGEFLAGS_H > > + > > +#include > > + > > +typedef unsigned long *** dyn_pageflags_t; > > Is this really necessary to define the dyn_pageflags_t here? This will also be used in Suspend2's kernel/power/pageflags.c, which adds=20 support for saving the pageflags in the image header and reloading and=20 relocating them at resume time. > > + > > +#define BITNUMBER(page) (page_to_pfn(page)) > > I think it would be cleaner to use page_to_pfn(page) verbatim instead of > this. I was going for readability. > > + > > +#if BITS_PER_LONG =3D=3D 32 > > +#define UL_SHIFT 5 > > +#else > > +#if BITS_PER_LONG =3D=3D 64 > > +#define UL_SHIFT 6 > > +#else > > +#error Bits per long not 32 or 64? > > +#endif > > +#endif > > + > > +#define BIT_NUM_MASK (sizeof(unsigned long) * 8 - 1) > > +#define PAGE_NUM_MASK (~((1 << (PAGE_SHIFT + 3)) - 1)) > > +#define UL_NUM_MASK (~(BIT_NUM_MASK | PAGE_NUM_MASK)) > > + > > +#define BITS_PER_PAGE (PAGE_SIZE << 3) > > +#define PAGENUMBER(zone_offset) (zone_offset >> (PAGE_SHIFT + 3)) > > +#define PAGEINDEX(zone_offset) ((zone_offset & UL_NUM_MASK) >> UL_SHIF= T) > > +#define PAGEBIT(zone_offset) (zone_offset & BIT_NUM_MASK) > > + > > +#define PAGE_UL_PTR(bitmap, zone_num, zone_pfn) \ > > + ((bitmap[zone_num][PAGENUMBER(zone_pfn)])+PAGEINDEX(zone_pfn)) > > All of the above definitions seem to be related in an opaque way. Any > chance to make them a bit more clearer? A comment, maybe? Ok. > > + > > +/* With the above macros defined, you can do... > > + > > +#define PagePageset1(page) (test_dynpageflag(&pageset1_map, page)) > > +#define SetPagePageset1(page) (set_dynpageflag(&pageset1_map, page)) > > +#define ClearPagePageset1(page) (clear_dynpageflag(&pageset1_map, page= )) > > +*/ > > I'd choose different names for these. Also the definitions shouldn't go > before test/set/clear_dynpageflag() are defined, IMO. I was seeking to make them similar to the real pageflags semantics. Did you= =20 notice that this is a comment? Will rearrange anyway. > > + > > +#define BITMAP_FOR_EACH_SET(bitmap, counter) \ > > + for (counter =3D get_next_bit_on(bitmap, -1); counter > -1; \ > > + counter =3D get_next_bit_on(bitmap, counter)) > > + > > +extern void clear_dyn_pageflags(dyn_pageflags_t pagemap); > > +extern int allocate_dyn_pageflags(dyn_pageflags_t *pagemap); > > +extern void free_dyn_pageflags(dyn_pageflags_t *pagemap); > > +extern int dyn_pageflags_pages_per_bitmap(void); > > +extern int get_next_bit_on(dyn_pageflags_t bitmap, int counter); > > +extern unsigned long *dyn_pageflags_ul_ptr(dyn_pageflags_t *bitmap, > > + struct page *pg); > > + > > +extern int test_dynpageflag(dyn_pageflags_t *bitmap, struct page *page= ); > > +extern void set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page= ); > > +extern void clear_dynpageflag(dyn_pageflags_t *bitmap, struct page > > *page); +#endif > > diff --git a/lib/Kconfig b/lib/Kconfig > > index 3de9335..5596bd8 100644 > > --- a/lib/Kconfig > > +++ b/lib/Kconfig > > @@ -38,6 +38,9 @@ config LIBCRC32C > > require M here. See Castagnoli93. > > Module will be libcrc32c. > > > > +config DYN_PAGEFLAGS > > + bool > > + > > # > > # compression support is select'ed if needed > > # > > diff --git a/lib/Makefile b/lib/Makefile > > index b830c9a..8290e9b 100644 > > --- a/lib/Makefile > > +++ b/lib/Makefile > > @@ -31,6 +31,8 @@ ifneq ($(CONFIG_HAVE_DEC_LOCK),y) > > lib-y +=3D dec_and_lock.o > > endif > > > > +obj-$(CONFIG_DYN_PAGEFLAGS) +=3D dyn_pageflags.o > > + > > obj-$(CONFIG_CRC_CCITT) +=3D crc-ccitt.o > > obj-$(CONFIG_CRC16) +=3D crc16.o > > obj-$(CONFIG_CRC32) +=3D crc32.o > > diff --git a/lib/dyn_pageflags.c b/lib/dyn_pageflags.c > > new file mode 100644 > > index 0000000..4da52f5 > > --- /dev/null > > +++ b/lib/dyn_pageflags.c > > @@ -0,0 +1,330 @@ > > +/* > > + * lib/dyn_pageflags.c > > + * > > + * Copyright (C) 2004-2006 Nigel Cunningham > > + * > > + * This file is released under the GPLv2. > > + * > > + * Routines for dynamically allocating and releasing bitmaps > > + * used as pseudo-pageflags. > > + * > > + * Arrays are not contiguous. The first sizeof(void *) bytes are > > + * the pointer to the next page in the bitmap. This allows us to > > + * work under low memory conditions where order 0 might be all > > + * that's available. In their original use (suspend2), it also > > + * lets us save the pages at suspend time, reload and relocate them > > + * as necessary at resume time without much effort. > > + * > > + */ > > + > > +#include > > +#include > > +#include > > + > > +#define page_to_zone_offset(pg) (page_to_pfn(pg) - > > page_zone(pg)->zone_start_pfn) + > > +/* > > + * num_zones > > + * > > + * How many zones are there? > > + * > > + */ > > + > > +static int num_zones(void) > > +{ > > + int result =3D 0; > > + struct zone *zone; > > + > > + for_each_zone(zone) > > + result++; > > + > > + return result; > > +} > > Do we really need this? Well, I need to know how many zones there are, and haven't seen a generic=20 function. If you know of one, please point me to it. (Or I could shift this= =20 somewhere else and make it a generic function - I'm aware that I have=20 something similar in pageflags.c). > > + > > +/* > > + * pages_for_zone(struct zone *zone) > > + * > > + * How many pages do we need for a bitmap for this zone? > > + * > > + */ > > + > > +static int pages_for_zone(struct zone *zone) > > +{ > > + return (zone->spanned_pages + (PAGE_SIZE << 3) - 1) >> > > + (PAGE_SHIFT + 3); > > +} > > Could you please explain the maths above? Sure. Page_size << 3 is the number of bits in a page. We are calculating th= e=20 number of pages needed to store spanned_pages bits, which is (spanned_pages= +=20 bits_per_page - 1) / (bits_per_page - 1). (Rounds up). > > + > > +/* > > + * page_zone_number(struct page *page) > > + * > > + * Which zone index does the page match? > > + * > > + */ > > + > > +static int page_zone_number(struct page *page) > > +{ > > + struct zone *zone, *zone_sought =3D page_zone(page); > > + int zone_num =3D 0; > > + > > + for_each_zone(zone) > > + if (zone =3D=3D zone_sought) > > + return zone_num; > > + else > > + zone_num++; > > + > > + printk("Was looking for a zone for page %p.\n", page); > > + BUG_ON(1); > > + > > + return 0; > > +} > > Why can't we use page_zonenum() instead of this? They will only be the same thing in the x86 case. If you have multiple zone= s=20 and some don't have any dma pages or highmem pages, this will give a more=20 compact numbering. > > + > > +/* > > + * dyn_pageflags_pages_per_bitmap > > + * > > + * Number of pages needed for a bitmap covering all zones. > > + * > > + */ > > +int dyn_pageflags_pages_per_bitmap(void) > > +{ > > + int total =3D 0; > > + struct zone *zone; > > + > > + for_each_zone(zone) > > + total +=3D pages_for_zone(zone); > > + > > + return total; > > +} > > Why is a separate function needed for this? Readability. Actually, answering this made me check again, and I couldn't f= ind=20 a user of this function. I'll remove it. > > + > > +/* > > + * clear_dyn_pageflags(dyn_pageflags_t pagemap) > > + * > > + * Clear an array used to store local page flags. > > + * > > + */ > > + > > +void clear_dyn_pageflags(dyn_pageflags_t pagemap) > > +{ > > + int i =3D 0, zone_num =3D 0; > > + struct zone *zone; > > + > > + BUG_ON(!pagemap); > > + > > + for_each_zone(zone) { > > + for (i =3D 0; i < pages_for_zone(zone); i++) > > + memset((pagemap[zone_num][i]), 0, PAGE_SIZE); > > + zone_num++; > > + } > > +} > > I think this could be done in a simpler way too. > > > + > > +/* > > + * allocate_dyn_pageflags(dyn_pageflags_t *pagemap) > > + * > > + * Allocate a bitmap for dynamic page flags. > > + * > > + */ > > +int allocate_dyn_pageflags(dyn_pageflags_t *pagemap) > > +{ > > + int i, zone_num =3D 0; > > + struct zone *zone; > > + > > + BUG_ON(*pagemap); > > + > > + *pagemap =3D kmalloc(sizeof(void *) * num_zones(), GFP_ATOMIC); > > By using kmalloc(), you use slab. I'd rather allocate entire pages. You might only be using 20 bytes out of the page. At resume time, I do use= =20 entire pages if the kmalloc'd value needs relocating, but only if that's=20 needed. > > + > > + if (!*pagemap) > > + return -ENOMEM; > > + > > + for_each_zone(zone) { > > + int zone_pages =3D pages_for_zone(zone); > > + (*pagemap)[zone_num] =3D kmalloc(sizeof(void *) * zone_pages, > > + GFP_ATOMIC); > > + > > + if (!(*pagemap)[zone_num]) { > > + kfree (*pagemap); > > This seems to leak memory, eg. if *pagemap[0] is allocated, but the > allocation of *pagemap[1] fails. You should free *pagemap[j] up to > zone_num-1 and then *pagemap, IMO. Yes, I should. Thanks. > > + return -ENOMEM; > > + } > > + > > + for (i =3D 0; i < zone_pages; i++) { > > + unsigned long address =3D get_zeroed_page(GFP_ATOMIC); > > + (*pagemap)[zone_num][i] =3D (unsigned long *) address; > > + if (!(*pagemap)[zone_num][i]) { > > + printk("Error. Unable to allocate memory for " > > + "dynamic pageflags."); > > + free_dyn_pageflags(pagemap); > > + return -ENOMEM; > > + } > > + } > > + zone_num++; > > + } > > + > > + return 0; > > +} > > + > > +/* > > + * free_dyn_pageflags(dyn_pageflags_t *pagemap) > > + * > > + * Free a dynamically allocated pageflags bitmap. For Suspend2 usage, = we > > + * support data being relocated from slab to pages that don't conflict > > + * with the image that will be copied back. This is the reason for the > > + * PageSlab tests below. > > + * > > + */ > > +void free_dyn_pageflags(dyn_pageflags_t *pagemap) > > +{ > > + int i =3D 0, zone_num =3D 0; > > + struct zone *zone; > > + > > + if (!*pagemap) > > + return; > > + > > + for_each_zone(zone) { > > + int zone_pages =3D pages_for_zone(zone); > > + > > + if (!((*pagemap)[zone_num])) > > + continue; > > + for (i =3D 0; i < zone_pages; i++) > > + if ((*pagemap)[zone_num][i]) > > + free_page((unsigned long) (*pagemap)[zone_num][i]); > > + > > + if (PageSlab(virt_to_page((*pagemap)[zone_num]))) > > + kfree((*pagemap)[zone_num]); > > + else > > + free_page((unsigned long) (*pagemap)[zone_num]); > > + > > + zone_num++; > > + } > > + > > + if (PageSlab(virt_to_page((*pagemap)))) > > + kfree(*pagemap); > > + else > > + free_page((unsigned long) (*pagemap)); > > Why so? See above (Suspend2 relocation at resume time). > > + > > + *pagemap =3D NULL; > > + return; > > +} > > + > > +/* > > + * dyn_pageflags_ul_ptr(dyn_pageflags_t *bitmap, struct page *pg) > > + * > > + * Get a pointer to the unsigned long containing the flag in the bitmap > > + * for the given page. > > + * > > + */ > > + > > +unsigned long *dyn_pageflags_ul_ptr(dyn_pageflags_t *bitmap, struct pa= ge > > *pg) +{ > > + int zone_pfn =3D page_to_zone_offset(pg); > > + int zone_num =3D page_zone_number(pg); > > + int pagenum =3D PAGENUMBER(zone_pfn); > > + int page_offset =3D PAGEINDEX(zone_pfn); > > + return ((*bitmap)[zone_num][pagenum]) + page_offset; > > +} > > It doesn't look very efficient, does it? ;-) Readability again. I'm assuming the compiler is semi intelligent :) > > + > > +/* > > + * test_dynpageflag(dyn_pageflags_t *bitmap, struct page *page) > > + * > > + * Is the page flagged in the given bitmap? > > + * > > + */ > > + > > +int test_dynpageflag(dyn_pageflags_t *bitmap, struct page *page) > > +{ > > + unsigned long *ul =3D dyn_pageflags_ul_ptr(bitmap, page); > > + int zone_offset =3D page_to_zone_offset(page); > > + int bit =3D PAGEBIT(zone_offset); > > + > > + return test_bit(bit, ul); > > +} > > + > > +/* > > + * set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page) > > + * > > + * Set the flag for the page in the given bitmap. > > + * > > + */ > > + > > +void set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page) > > +{ > > + unsigned long *ul =3D dyn_pageflags_ul_ptr(bitmap, page); > > + int zone_offset =3D page_to_zone_offset(page); > > + int bit =3D PAGEBIT(zone_offset); > > + set_bit(bit, ul); > > +} > > + > > +/* > > + * clear_dynpageflags(dyn_pageflags_t *bitmap, struct page *page) > > + * > > + * Clear the flag for the page in the given bitmap. > > + * > > + */ > > + > > +void clear_dynpageflag(dyn_pageflags_t *bitmap, struct page *page) > > +{ > > + unsigned long *ul =3D dyn_pageflags_ul_ptr(bitmap, page); > > + int zone_offset =3D page_to_zone_offset(page); > > + int bit =3D PAGEBIT(zone_offset); > > + clear_bit(bit, ul); > > +} > > + > > +/* > > + * get_next_bit_on(dyn_pageflags_t bitmap, int counter) > > + * > > + * Given a pfn (possibly -1), find the next pfn in the bitmap that > > + * is set. If there are no more flags set, return -1. > > + * > > + */ > > + > > +int get_next_bit_on(dyn_pageflags_t bitmap, int counter) > > +{ > > + struct page *page; > > + struct zone *zone; > > + unsigned long *ul =3D NULL; > > + int zone_offset, pagebit, zone_num, first; > > + > > + first =3D (counter =3D=3D -1); > > + > > + if (first) > > + counter =3D first_online_pgdat()->node_zones->zone_start_pfn; > > + > > + page =3D pfn_to_page(counter); > > + zone =3D page_zone(page); > > + zone_num =3D page_zone_number(page); > > + zone_offset =3D counter - zone->zone_start_pfn; > > + > > + if (first) { > > + counter--; > > + zone_offset--; > > + } else > > + ul =3D (bitmap[zone_num][PAGENUMBER(zone_offset)]) + > > PAGEINDEX(zone_offset); + > > + do { > > + counter++; > > + zone_offset++; > > + > > + if (zone_offset >=3D zone->spanned_pages) { > > + do { > > + zone =3D next_zone(zone); > > + if (!zone) > > + return -1; > > + zone_num++; > > + } while(!zone->spanned_pages); > > + > > + counter =3D zone->zone_start_pfn; > > + zone_offset =3D 0; > > + } > > + > > + /* > > + * This could be optimised, but there are more > > + * important things and the code is simple at > > + * the moment > > + */ > > Please let me disagree with this comment. I think it would be more > efficient if you searched the bitmap for the first zero bit and then find > the page, pfn, whatever corresponding to this bit. Yes, it would be more efficient. I don't mind modifying it. This is a=20 different moment to the one when I wrote that :) > > + pagebit =3D PAGEBIT(zone_offset); > > + > > + if (!pagebit) > > + ul =3D (bitmap[zone_num][PAGENUMBER(zone_offset)]) + > > PAGEINDEX(zone_offset); + > > + } while(!test_bit(pagebit, ul)); > > + > > + return counter; > > +} > > + Once again, Thanks for your feedback. It's good to get picked up on cleanups I've misse= d=20 and be made to remember again and provide the justification for decisions=20 I've made. Regards, Nigel =2D-=20 See http://www.suspend2.net for Howtos, FAQs, mailing lists, wiki and bugzilla info. --nextPart2357383.hNUFaYYRLM Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQBEoGsQN0y+n1M3mo0RAh+sAKCE2QvrRtiq6evqNR2mtOuSk0c36wCgrNtF TnQ6XvR+/TPJLxm4REgTIro= =JsQ5 -----END PGP SIGNATURE----- --nextPart2357383.hNUFaYYRLM--