public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFT][PATCH 0/3] swsusp: improve freeing of memory
@ 2005-11-12 20:13 Rafael J. Wysocki
  2005-11-12 20:19 ` [RFT][PATCH 1/3] swsusp: remove encryption Rafael J. Wysocki
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Rafael J. Wysocki @ 2005-11-12 20:13 UTC (permalink / raw)
  To: LKML; +Cc: Pavel Machek

Hi,

Currently swsusp frees as much memory as possible during suspend.  This slows
down the suspend and causes the system to be slow after resume due to
the swapping-in activity.  The following series of patches is designed to
change this behavior so that swsusp will free only as much memory as necessary
to complete the suspend.

The essential changes are in the last patch.  However, to make it work
properly on x86-64 it is necessary to get rid of the suspend image size limit
imposed by the size of the swsusp_info structure.  Of course this can be done
in many different ways, but I think it is reasonable to do this in a way which
will allow us to further separate the image-handling and swap-handling
functionalities of swsusp in the future.  Thus I propose to introduce a
separate data structure for handling the swap and to remove the swap-related
fields from the PBEs, which is done in the second patch.

The proposed approach yields some additional benefits, like the following:
1) the amount memory needed to store the list of PBEs (aka pagedir) is
reduced by 1/4,
2) the amount of swap needed to store the image metadata is reduced by 1/2,
3) there's more memory available during suspend (the data structure used
for keepeng track of the data pages written to the swap need not be loaded
into memory during suspend),
4) the size of swsusp_info structure is reduced so it can be merged with the
swsusp_header structure in the future,
5) the swap-handling part need not use any global variables related to the
snapshot data structure,
6) the use of __nosavedata variables is reduced significantly.

The first patch removes the image encryption from swsusp so that
the second patch can be simpler.  Encryption is only used in swsusp instead
of zeroing the image after resume in order to prevent someone from
reading some confidential data from it in the future and it does not protect
the image from being read by an unauthorized person before resume.
In principle the encryption code need not be removed to make the other
changes, but then it would have to be modified and tested.  Moreover,
the functionality it provides should really belong to the user space,
so we (ie. Pavel and me) decided to remove it for now and possibly
reimplement it after the swap-handling functionality of swsusp is
moved to the user space.

The changes made by the patches are substantial, so they have to be
tested thoroughly before they can be merged.  If you can, please test them
and tell me what are your opinions.  I will very much appreciate any
feedback, negative as well as positive.

The patches are against 2.6.14-mm2.  They have been tested on x86-64
and compile tested on i386.  The previous iteration of the patches was
also stress-tested by Pavel on i386.

Greetings,
Rafael


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFT][PATCH 1/3] swsusp: remove encryption
  2005-11-12 20:13 [RFT][PATCH 0/3] swsusp: improve freeing of memory Rafael J. Wysocki
@ 2005-11-12 20:19 ` Rafael J. Wysocki
  2005-11-12 23:42   ` Pavel Machek
  2005-11-12 20:22 ` [RFT][PATCH 2/3] swsusp: introduce the swap map structure Rafael J. Wysocki
  2005-11-12 20:24 ` [RFT][PATCH 3/3] swsusp: improve freeing of memory Rafael J. Wysocki
  2 siblings, 1 reply; 9+ messages in thread
From: Rafael J. Wysocki @ 2005-11-12 20:19 UTC (permalink / raw)
  To: LKML; +Cc: Pavel Machek

This patch removes the image encryption that is only used by swsusp instead
of zeroing the image after resume in order to prevent someone from
reading some confidential data from it in the future and it does not protect
the image from being read by an unauthorized person before resume.
The functionality it provides should really belong to the user space
and will possibly be reimplemented after the swap-handling functionality
of swsusp is moved to the user space.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

 kernel/power/swsusp.c |  159 +-------------------------------------------------
 1 files changed, 4 insertions(+), 155 deletions(-)

Index: linux-2.6.14-mm2/kernel/power/swsusp.c
===================================================================
--- linux-2.6.14-mm2.orig/kernel/power/swsusp.c	2005-11-11 11:25:45.000000000 +0100
+++ linux-2.6.14-mm2/kernel/power/swsusp.c	2005-11-12 12:03:53.000000000 +0100
@@ -30,9 +30,6 @@
  * Alex Badea <vampire@go.ro>:
  * Fixed runaway init
  *
- * Andreas Steinmetz <ast@domdv.de>:
- * Added encrypted suspend option
- *
  * More state savers are welcome. Especially for the scsi layer...
  *
  * For TODOs,FIXMEs also look in Documentation/power/swsusp.txt
@@ -81,10 +78,6 @@
 static int restore_highmem(void) { return 0; }
 #endif
 
-#define CIPHER "aes"
-#define MAXKEY 32
-#define MAXIV  32
-
 extern char resume_file[];
 
 /* Local variables that should not be affected by save */
@@ -102,8 +95,7 @@
 #define SWSUSP_SIG	"S1SUSPEND"
 
 static struct swsusp_header {
-	char reserved[PAGE_SIZE - 20 - MAXKEY - MAXIV - sizeof(swp_entry_t)];
-	u8 key_iv[MAXKEY+MAXIV];
+	char reserved[PAGE_SIZE - 20 - sizeof(swp_entry_t)];
 	swp_entry_t swsusp_info;
 	char	orig_sig[10];
 	char	sig[10];
@@ -123,131 +115,6 @@
 static unsigned short swapfile_used[MAX_SWAPFILES];
 static unsigned short root_swap;
 
-static int write_page(unsigned long addr, swp_entry_t *loc);
-static int bio_read_page(pgoff_t page_off, void *page);
-
-static u8 key_iv[MAXKEY+MAXIV];
-
-#ifdef CONFIG_SWSUSP_ENCRYPT
-
-static int crypto_init(int mode, void **mem)
-{
-	int error = 0;
-	int len;
-	char *modemsg;
-	struct crypto_tfm *tfm;
-
-	modemsg = mode ? "suspend not possible" : "resume not possible";
-
-	tfm = crypto_alloc_tfm(CIPHER, CRYPTO_TFM_MODE_CBC);
-	if(!tfm) {
-		printk(KERN_ERR "swsusp: no tfm, %s\n", modemsg);
-		error = -EINVAL;
-		goto out;
-	}
-
-	if(MAXKEY < crypto_tfm_alg_min_keysize(tfm)) {
-		printk(KERN_ERR "swsusp: key buffer too small, %s\n", modemsg);
-		error = -ENOKEY;
-		goto fail;
-	}
-
-	if (mode)
-		get_random_bytes(key_iv, MAXKEY+MAXIV);
-
-	len = crypto_tfm_alg_max_keysize(tfm);
-	if (len > MAXKEY)
-		len = MAXKEY;
-
-	if (crypto_cipher_setkey(tfm, key_iv, len)) {
-		printk(KERN_ERR "swsusp: key setup failure, %s\n", modemsg);
-		error = -EKEYREJECTED;
-		goto fail;
-	}
-
-	len = crypto_tfm_alg_ivsize(tfm);
-
-	if (MAXIV < len) {
-		printk(KERN_ERR "swsusp: iv buffer too small, %s\n", modemsg);
-		error = -EOVERFLOW;
-		goto fail;
-	}
-
-	crypto_cipher_set_iv(tfm, key_iv+MAXKEY, len);
-
-	*mem=(void *)tfm;
-
-	goto out;
-
-fail:	crypto_free_tfm(tfm);
-out:	return error;
-}
-
-static __inline__ void crypto_exit(void *mem)
-{
-	crypto_free_tfm((struct crypto_tfm *)mem);
-}
-
-static __inline__ int crypto_write(struct pbe *p, void *mem)
-{
-	int error = 0;
-	struct scatterlist src, dst;
-
-	src.page   = virt_to_page(p->address);
-	src.offset = 0;
-	src.length = PAGE_SIZE;
-	dst.page   = virt_to_page((void *)&swsusp_header);
-	dst.offset = 0;
-	dst.length = PAGE_SIZE;
-
-	error = crypto_cipher_encrypt((struct crypto_tfm *)mem, &dst, &src,
-					PAGE_SIZE);
-
-	if (!error)
-		error = write_page((unsigned long)&swsusp_header,
-				&(p->swap_address));
-	return error;
-}
-
-static __inline__ int crypto_read(struct pbe *p, void *mem)
-{
-	int error = 0;
-	struct scatterlist src, dst;
-
-	error = bio_read_page(swp_offset(p->swap_address), (void *)p->address);
-	if (!error) {
-		src.offset = 0;
-		src.length = PAGE_SIZE;
-		dst.offset = 0;
-		dst.length = PAGE_SIZE;
-		src.page = dst.page = virt_to_page((void *)p->address);
-
-		error = crypto_cipher_decrypt((struct crypto_tfm *)mem, &dst,
-						&src, PAGE_SIZE);
-	}
-	return error;
-}
-#else
-static __inline__ int crypto_init(int mode, void *mem)
-{
-	return 0;
-}
-
-static __inline__ void crypto_exit(void *mem)
-{
-}
-
-static __inline__ int crypto_write(struct pbe *p, void *mem)
-{
-	return write_page(p->address, &(p->swap_address));
-}
-
-static __inline__ int crypto_read(struct pbe *p, void *mem)
-{
-	return bio_read_page(swp_offset(p->swap_address), (void *)p->address);
-}
-#endif
-
 static int mark_swapfiles(swp_entry_t prev)
 {
 	int error;
@@ -259,7 +126,6 @@
 	    !memcmp("SWAPSPACE2",swsusp_header.sig, 10)) {
 		memcpy(swsusp_header.orig_sig,swsusp_header.sig, 10);
 		memcpy(swsusp_header.sig,SWSUSP_SIG, 10);
-		memcpy(swsusp_header.key_iv, key_iv, MAXKEY+MAXIV);
 		swsusp_header.swsusp_info = prev;
 		error = rw_swap_page_sync(WRITE,
 					  swp_entry(root_swap, 0),
@@ -405,10 +271,6 @@
 	int error = 0, i = 0;
 	unsigned int mod = nr_copy_pages / 100;
 	struct pbe *p;
-	void *tfm;
-
-	if ((error = crypto_init(1, &tfm)))
-		return error;
 
 	if (!mod)
 		mod = 1;
@@ -417,14 +279,11 @@
 	for_each_pbe (p, pagedir_nosave) {
 		if (!(i%mod))
 			printk( "\b\b\b\b%3d%%", i / mod );
-		if ((error = crypto_write(p, tfm))) {
-			crypto_exit(tfm);
+		if ((error = write_page(p->address, &p->swap_address)))
 			return error;
-		}
 		i++;
 	}
 	printk("\b\b\b\bdone\n");
-	crypto_exit(tfm);
 	return error;
 }
 
@@ -550,7 +409,6 @@
 	if ((error = close_swap()))
 		goto FreePagedir;
  Done:
-	memset(key_iv, 0, MAXKEY+MAXIV);
 	return error;
  FreePagedir:
 	free_pagedir_entries();
@@ -812,8 +670,6 @@
 		return error;
 	if (!memcmp(SWSUSP_SIG, swsusp_header.sig, 10)) {
 		memcpy(swsusp_header.sig, swsusp_header.orig_sig, 10);
-		memcpy(key_iv, swsusp_header.key_iv, MAXKEY+MAXIV);
-		memset(swsusp_header.key_iv, 0, MAXKEY+MAXIV);
 
 		/*
 		 * Reset swap signature now.
@@ -840,10 +696,6 @@
 	int error = 0;
 	int i = 0;
 	int mod = swsusp_info.image_pages / 100;
-	void *tfm;
-
-	if ((error = crypto_init(0, &tfm)))
-		return error;
 
 	if (!mod)
 		mod = 1;
@@ -855,15 +707,13 @@
 		if (!(i % mod))
 			printk("\b\b\b\b%3d%%", i / mod);
 
-		if ((error = crypto_read(p, tfm))) {
-			crypto_exit(tfm);
+		if ((error = bio_read_page(swp_offset(p->swap_address),
+						(void *)p->address)))
 			return error;
-		}
 
 		i++;
 	}
 	printk("\b\b\b\bdone\n");
-	crypto_exit(tfm);
 	return error;
 }
 
@@ -986,7 +836,6 @@
 
 	error = read_suspend_image();
 	blkdev_put(resume_bdev);
-	memset(key_iv, 0, MAXKEY+MAXIV);
 
 	if (!error)
 		pr_debug("swsusp: Reading resume file was successful\n");


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFT][PATCH 2/3] swsusp: introduce the swap map structure
  2005-11-12 20:13 [RFT][PATCH 0/3] swsusp: improve freeing of memory Rafael J. Wysocki
  2005-11-12 20:19 ` [RFT][PATCH 1/3] swsusp: remove encryption Rafael J. Wysocki
@ 2005-11-12 20:22 ` Rafael J. Wysocki
  2005-11-13 21:16   ` Pavel Machek
  2005-11-12 20:24 ` [RFT][PATCH 3/3] swsusp: improve freeing of memory Rafael J. Wysocki
  2 siblings, 1 reply; 9+ messages in thread
From: Rafael J. Wysocki @ 2005-11-12 20:22 UTC (permalink / raw)
  To: LKML; +Cc: Pavel Machek

This patch introduces the swap map structure that can be used by swsusp for
keeping tracks of data pages written to the swap.  The structure itself is
described in a comment within the patch.

The overall idea is to reduce the amount of metadata written to the swap
and to write and read the image pages sequentially, in a file-alike way.
This makes the swap-handling part of swsusp fairly independent of its
snapshot-handling part and will hopefully allow us to completely
separate these two parts in the future.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

 include/linux/suspend.h |    6 
 kernel/power/disk.c     |    8 
 kernel/power/power.h    |   13 -
 kernel/power/snapshot.c |   14 -
 kernel/power/swsusp.c   |  559 ++++++++++++++++++++++++++++++++++--------------
 5 files changed, 419 insertions(+), 181 deletions(-)

Index: linux-2.6.14-mm2/kernel/power/swsusp.c
===================================================================
--- linux-2.6.14-mm2.orig/kernel/power/swsusp.c	2005-11-12 12:03:53.000000000 +0100
+++ linux-2.6.14-mm2/kernel/power/swsusp.c	2005-11-12 12:07:47.000000000 +0100
@@ -30,6 +30,9 @@
  * Alex Badea <vampire@go.ro>:
  * Fixed runaway init
  *
+ * Rafael J. Wysocki <rjw@sisk.pl>
+ * Added the swap map data structure and reworked the handling of swap
+ *
  * More state savers are welcome. Especially for the scsi layer...
  *
  * For TODOs,FIXMEs also look in Documentation/power/swsusp.txt
@@ -80,18 +83,6 @@
 
 extern char resume_file[];
 
-/* Local variables that should not be affected by save */
-unsigned int nr_copy_pages __nosavedata = 0;
-
-/* Suspend pagedir is allocated before final copy, therefore it
-   must be freed after resume
-
-   Warning: this is even more evil than it seems. Pagedirs this file
-   talks about are completely different from page directories used by
-   MMU hardware.
- */
-suspend_pagedir_t *pagedir_nosave __nosavedata = NULL;
-
 #define SWSUSP_SIG	"S1SUSPEND"
 
 static struct swsusp_header {
@@ -242,48 +233,206 @@
 }
 
 /**
- *	data_free - Free the swap entries used by the saved image.
+ *	Swap map-handling functions
+ *
+ *	The swap map is a data structure used for keeping track of each page
+ *	written to the swap.  It consists of many swap_map_page structures
+ *	that contain each an array of MAP_PAGE_SIZE swap entries.
+ *	These structures are linked together with the help of either the
+ *	.next (in memory) or the .next_swap (in swap) member.
  *
- *	Walk the list of used swap entries and free each one.
- *	This is only used for cleanup when suspend fails.
+ *	The swap map is created during suspend.  At that time we need to keep
+ *	it in memory, because we have to free all of the allocated swap
+ *	entries if an error occurs.  The memory needed is preallocated
+ *	so that we know in advance if there's enough of it.
+ *
+ *	The first swap_map_page structure is filled with the swap entries that
+ *	correspond to the first MAP_PAGE_SIZE data pages written to swap and
+ *	so on.  After the all of the data pages have been written, the order
+ *	of the swap_map_page structures in the map is reversed so that they
+ *	can be read from swap in the original order.  This causes the data
+ *	pages to be loaded in exactly the same order in which they have been
+ *	saved.
+ *
+ *	During resume we only need to use one swap_map_page structure
+ *	at a time, which means that we only need to use two memory pages for
+ *	reading the image - one for reading the swap_map_page structures
+ *	and the second for reading the data pages from swap.
  */
-static void data_free(void)
+
+#define MAP_PAGE_SIZE	((PAGE_SIZE - sizeof(swp_entry_t) - sizeof(void *)) \
+			/ sizeof(swp_entry_t))
+
+struct swap_map_page {
+	swp_entry_t		entries[MAP_PAGE_SIZE];
+	swp_entry_t		next_swap;
+	struct swap_map_page	*next;
+};
+
+static inline void free_swap_map(struct swap_map_page *swap_map)
 {
-	swp_entry_t entry;
-	struct pbe *p;
+	struct swap_map_page *swp;
 
-	for_each_pbe (p, pagedir_nosave) {
-		entry = p->swap_address;
-		if (entry.val)
-			swap_free(entry);
-		else
-			break;
+	while (swap_map) {
+		swp = swap_map->next;
+		free_page((unsigned long)swap_map);
+		swap_map = swp;
+	}
+}
+
+static struct swap_map_page *alloc_swap_map(unsigned int nr_pages)
+{
+	struct swap_map_page *swap_map, *swp;
+	unsigned n = 0;
+
+	if (!nr_pages)
+		return NULL;
+
+	pr_debug("alloc_swap_map(): nr_pages = %d\n", nr_pages);
+	swap_map = (struct swap_map_page *)get_zeroed_page(GFP_ATOMIC);
+	swp = swap_map;
+	for (n = MAP_PAGE_SIZE; n < nr_pages; n += MAP_PAGE_SIZE) {
+		swp->next = (struct swap_map_page *)get_zeroed_page(GFP_ATOMIC);
+		swp = swp->next;
+		if (!swp) {
+			free_swap_map(swap_map);
+			return NULL;
+		}
 	}
+	return swap_map;
 }
 
 /**
- *	data_write - Write saved image to swap.
- *
- *	Walk the list of pages in the image and sync each one to swap.
+ *	reverse_swap_map - reverse the order of pages in the swap map
+ *	@swap_map
  */
-static int data_write(void)
+
+static inline struct swap_map_page *reverse_swap_map(struct swap_map_page *swap_map)
 {
-	int error = 0, i = 0;
-	unsigned int mod = nr_copy_pages / 100;
-	struct pbe *p;
+	struct swap_map_page *prev, *next;
 
-	if (!mod)
-		mod = 1;
+	prev = NULL;
+	while (swap_map) {
+		next = swap_map->next;
+		swap_map->next = prev;
+		prev = swap_map;
+		swap_map = next;
+	}
+	return prev;
+}
 
-	printk( "Writing data to swap (%d pages)...     ", nr_copy_pages );
-	for_each_pbe (p, pagedir_nosave) {
-		if (!(i%mod))
-			printk( "\b\b\b\b%3d%%", i / mod );
-		if ((error = write_page(p->address, &p->swap_address)))
+/**
+ *	free_swap_map_entries - free the swap entries allocated to store
+ *	the swap map @swap_map (this is only called in case of an error)
+ */
+static inline void free_swap_map_entries(struct swap_map_page *swap_map)
+{
+	while (swap_map) {
+		if (swap_map->next_swap.val)
+			swap_free(swap_map->next_swap);
+		swap_map = swap_map->next;
+	}
+}
+
+/**
+ *	save_swap_map - save the swap map used for tracing the data pages
+ *	stored in the swap
+ */
+
+static int save_swap_map(struct swap_map_page *swap_map, swp_entry_t *start)
+{
+	swp_entry_t entry = (swp_entry_t){0};
+	int error;
+
+	while (swap_map) {
+		swap_map->next_swap = entry;
+		if ((error = write_page((unsigned long)swap_map, &entry)))
 			return error;
-		i++;
+		swap_map = swap_map->next;
+	}
+	*start = entry;
+	return 0;
+}
+
+/**
+ *	free_image_entries - free the swap entries allocated to store
+ *	the image data pages (this is only called in case of an error)
+ */
+
+static inline void free_image_entries(struct swap_map_page *swp)
+{
+	unsigned k;
+
+	while (swp) {
+		for (k = 0; k < MAP_PAGE_SIZE; k++)
+			if (swp->entries[k].val)
+				swap_free(swp->entries[k]);
+		swp = swp->next;
+	}
+}
+
+/**
+ *	The swap_map_handle structure is used for handling the swap map in
+ *	a file-alike way
+ */
+
+struct swap_map_handle {
+	void			*tfm; /* Needed for the encryption */
+	struct swap_map_page	*cur;
+	unsigned int		k;
+};
+
+static inline void init_swap_map_handle(struct swap_map_handle *handle,
+                                        struct swap_map_page *map)
+{
+	handle->cur = map;
+	handle->k = 0;
+}
+
+static inline int swap_map_write_page(struct swap_map_handle *handle,
+                                      unsigned long addr)
+{
+	int error;
+
+	error = write_page(addr, handle->cur->entries + handle->k);
+	if (error)
+		return error;
+	if (++handle->k >= MAP_PAGE_SIZE) {
+		handle->cur = handle->cur->next;
+		handle->k = 0;
+	}
+	return 0;
+}
+
+/**
+ *	save_image_data - save the data pages pointed to by the PBEs
+ *	from the list @pblist using the swap map handle @handle
+ *	(assume there are @nr_pages data pages to save)
+ */
+
+static int save_image_data(struct pbe *pblist,
+                           struct swap_map_handle *handle,
+                           unsigned int nr_pages)
+{
+	unsigned int m;
+	struct pbe *p;
+	int error = 0;
+
+	printk("Saving image data pages (%u pages) ...     ", nr_pages);
+	m = nr_pages / 100;
+	if (!m)
+		m = 1;
+	nr_pages = 0;
+	for_each_pbe (p, pblist) {
+		error = swap_map_write_page(handle, p->address);
+		if (error)
+			break;
+		if (!(nr_pages % m))
+			printk("\b\b\b\b%3d%%", nr_pages / m);
+		nr_pages++;
 	}
-	printk("\b\b\b\bdone\n");
+	if (!error)
+		printk("\b\b\b\bdone\n");
 	return error;
 }
 
@@ -299,19 +448,20 @@
 	pr_debug(" swsusp: UTS Domain: %s\n",swsusp_info.uts.domainname);
 	pr_debug(" swsusp: CPUs: %d\n",swsusp_info.cpus);
 	pr_debug(" swsusp: Image: %ld Pages\n",swsusp_info.image_pages);
-	pr_debug(" swsusp: Pagedir: %ld Pages\n",swsusp_info.pagedir_pages);
+	pr_debug(" swsusp: Total: %ld Pages\n", swsusp_info.pages);
 }
 
-static void init_header(void)
+static void init_header(unsigned int nr_pages)
 {
 	memset(&swsusp_info, 0, sizeof(swsusp_info));
 	swsusp_info.version_code = LINUX_VERSION_CODE;
 	swsusp_info.num_physpages = num_physpages;
 	memcpy(&swsusp_info.uts, &system_utsname, sizeof(system_utsname));
 
-	swsusp_info.suspend_pagedir = pagedir_nosave;
 	swsusp_info.cpus = num_online_cpus();
-	swsusp_info.image_pages = nr_copy_pages;
+	swsusp_info.image_pages = nr_pages;
+	swsusp_info.pages = nr_pages +
+		((nr_pages * sizeof(long) + PAGE_SIZE - 1) >> PAGE_SHIFT);
 }
 
 static int close_swap(void)
@@ -330,39 +480,53 @@
 }
 
 /**
- *	free_pagedir_entries - Free pages used by the page directory.
- *
- *	This is used during suspend for error recovery.
+ *	pack_orig_addresses - the .orig_address fields of the PBEs from the
+ *	list starting at @pbe are stored in the array @buf[] (1 page)
  */
 
-static void free_pagedir_entries(void)
+static inline struct pbe *pack_orig_addresses(unsigned long *buf,
+                                              struct pbe *pbe)
 {
-	int i;
+	int j;
 
-	for (i = 0; i < swsusp_info.pagedir_pages; i++)
-		swap_free(swsusp_info.pagedir[i]);
+	for (j = 0; j < PAGE_SIZE / sizeof(long) && pbe; j++) {
+		buf[j] = pbe->orig_address;
+		pbe = pbe->next;
+	}
+	if (!pbe)
+		for (; j < PAGE_SIZE / sizeof(long); j++)
+			buf[j] = 0;
+	return pbe;
 }
 
-
 /**
- *	write_pagedir - Write the array of pages holding the page directory.
- *	@last:	Last swap entry we write (needed for header).
+ *	save_image_metadata - save the .orig_address fields of the PBEs
+ *	from the list @pblist using the swap map handle @handle
  */
 
-static int write_pagedir(void)
+static int save_image_metadata(struct pbe *pblist,
+                               struct swap_map_handle *handle)
 {
-	int error = 0;
+	unsigned long *buf;
 	unsigned int n = 0;
-	struct pbe *pbe;
+	struct pbe *p;
+	int error = 0;
 
-	printk( "Writing pagedir...");
-	for_each_pb_page (pbe, pagedir_nosave) {
-		if ((error = write_page((unsigned long)pbe, &swsusp_info.pagedir[n++])))
-			return error;
+	printk("Saving image metadata ... ");
+	buf = (unsigned long *)get_zeroed_page(GFP_ATOMIC);
+	if (!buf)
+		return -ENOMEM;
+	p = pblist;
+	while (p) {
+		p = pack_orig_addresses(buf, p);
+		error = swap_map_write_page(handle, (unsigned long)buf);
+		if (error)
+			break;
+		n++;
 	}
-
-	swsusp_info.pagedir_pages = n;
-	printk("done (%u pages)\n", n);
+	free_page((unsigned long)buf);
+	if (!error)
+		printk("done (%u pages saved)\n", n);
 	return error;
 }
 
@@ -388,33 +552,48 @@
 
 /**
  *	write_suspend_image - Write entire image and metadata.
- *
  */
-static int write_suspend_image(void)
+static int write_suspend_image(struct pbe *pblist, unsigned int nr_pages)
 {
+	struct swap_map_page *swap_map;
+	struct swap_map_handle handle;
 	int error;
 
-	if (!enough_swap(nr_copy_pages)) {
+	if (!enough_swap(nr_pages)) {
 		printk(KERN_ERR "swsusp: Not enough free swap\n");
 		return -ENOSPC;
 	}
 
-	init_header();
-	if ((error = data_write()))
-		goto FreeData;
+	init_header(nr_pages);
+	swap_map = alloc_swap_map(swsusp_info.pages);
+	if (!swap_map)
+		return -ENOMEM;
+	init_swap_map_handle(&handle, swap_map);
 
-	if ((error = write_pagedir()))
-		goto FreePagedir;
+	error = save_image_metadata(pblist, &handle);
+	if (!error)
+		error = save_image_data(pblist, &handle, nr_pages);
+	if (error)
+		goto Free_image_entries;
+
+	swap_map = reverse_swap_map(swap_map);
+	error = save_swap_map(swap_map, &swsusp_info.start);
+	if (error)
+		goto Free_map_entries;
+
+	error = close_swap();
+	if (error)
+		goto Free_map_entries;
 
-	if ((error = close_swap()))
-		goto FreePagedir;
- Done:
+Free_swap_map:
+	free_swap_map(swap_map);
 	return error;
- FreePagedir:
-	free_pagedir_entries();
- FreeData:
-	data_free();
-	goto Done;
+
+Free_map_entries:
+	free_swap_map_entries(swap_map);
+Free_image_entries:
+	free_image_entries(swap_map);
+	goto Free_swap_map;
 }
 
 /* It is important _NOT_ to umount filesystems at this point. We want
@@ -422,7 +601,7 @@
  * filesystem clean: it is not. (And it does not matter, if we resume
  * correctly, we'll mark system clean, anyway.)
  */
-int swsusp_write(void)
+int swsusp_write(struct pbe *pblist, unsigned int nr_pages)
 {
 	int error;
 
@@ -431,14 +610,12 @@
 		return error;
 	}
 	lock_swapdevices();
-	error = write_suspend_image();
+	error = write_suspend_image(pblist, nr_pages);
 	/* This will unlock ignored swap devices since writing is finished */
 	lock_swapdevices();
 	return error;
 }
 
-
-
 int swsusp_suspend(void)
 {
 	int error;
@@ -535,7 +712,6 @@
 	/* We assume both lists contain the same number of elements */
 	while (src) {
 		dst->orig_address = src->orig_address;
-		dst->swap_address = src->swap_address;
 		dst = dst->next;
 		src = src->next;
 	}
@@ -615,6 +791,61 @@
 	return submit(WRITE, page_off, page);
 }
 
+/**
+ *	The following functions allow us to read data using a swap map
+ *	in a file-alike way
+ */
+
+static inline void release_swap_map_reader(struct swap_map_handle *handle)
+{
+	if (handle->cur)
+		free_page((unsigned long)handle->cur);
+	handle->cur = NULL;
+}
+
+static inline int get_swap_map_reader(struct swap_map_handle *handle,
+                                      swp_entry_t start)
+{
+	int error;
+
+	if (!swp_offset(start))
+		return -EINVAL;
+	handle->cur = (struct swap_map_page *)get_zeroed_page(GFP_ATOMIC);
+	if (!handle->cur)
+		return -ENOMEM;
+	error = bio_read_page(swp_offset(start), handle->cur);
+	if (error) {
+		release_swap_map_reader(handle);
+		return error;
+	}
+	handle->k = 0;
+	return 0;
+}
+
+static inline int swap_map_read_page(struct swap_map_handle *handle, void *buf)
+{
+	unsigned long offset;
+	int error;
+
+	if (!handle->cur)
+		return -EINVAL;
+	offset = swp_offset(handle->cur->entries[handle->k]);
+	if (!offset)
+		return -EINVAL;
+	error = bio_read_page(offset, buf);
+	if (error)
+		return error;
+	if (++handle->k >= MAP_PAGE_SIZE) {
+		handle->k = 0;
+		offset = swp_offset(handle->cur->next_swap);
+		if (!offset)
+			release_swap_map_reader(handle);
+		else
+			error = bio_read_page(offset, handle->cur);
+	}
+	return error;
+}
+
 /*
  * Sanity check if this image makes sense with this kernel/swap context
  * I really don't think that it's foolproof but more than nothing..
@@ -643,7 +874,6 @@
 	return NULL;
 }
 
-
 static int check_header(void)
 {
 	const char *reason = NULL;
@@ -657,7 +887,6 @@
 		printk(KERN_ERR "swsusp: Resume mismatch: %s\n",reason);
 		return -EPERM;
 	}
-	nr_copy_pages = swsusp_info.image_pages;
 	return error;
 }
 
@@ -684,75 +913,88 @@
 }
 
 /**
- *	data_read - Read image pages from swap.
- *
- *	You do not need to check for overlaps, check_pagedir()
- *	already did that.
+ *	load_image_data - load the image data using the swap map handle
+ *	@handle and store them using the page backup list @pblist
+ *	(assume there are @nr_pages pages to load)
  */
 
-static int data_read(struct pbe *pblist)
+static int load_image_data(struct pbe *pblist,
+                           struct swap_map_handle *handle,
+                           unsigned int nr_pages)
 {
+	int error;
+	unsigned int m;
 	struct pbe *p;
-	int error = 0;
-	int i = 0;
-	int mod = swsusp_info.image_pages / 100;
-
-	if (!mod)
-		mod = 1;
-
-	printk("swsusp: Reading image data (%lu pages):     ",
-			swsusp_info.image_pages);
-
-	for_each_pbe (p, pblist) {
-		if (!(i % mod))
-			printk("\b\b\b\b%3d%%", i / mod);
 
-		if ((error = bio_read_page(swp_offset(p->swap_address),
-						(void *)p->address)))
-			return error;
-
-		i++;
+	if (!pblist)
+		return -EINVAL;
+	printk("Loading image data pages (%u pages) ...     ", nr_pages);
+	m = nr_pages / 100;
+	if (!m)
+		m = 1;
+	nr_pages = 0;
+	p = pblist;
+	while (p) {
+		error = swap_map_read_page(handle, (void *)p->address);
+		if (error)
+			break;
+		p = p->next;
+		if (!(nr_pages % m))
+			printk("\b\b\b\b%3d%%", nr_pages / m);
+		nr_pages++;
 	}
-	printk("\b\b\b\bdone\n");
+	if (!error)
+		printk("\b\b\b\bdone\n");
 	return error;
 }
 
 /**
- *	read_pagedir - Read page backup list pages from swap
+ *	unpack_orig_addresses - copy the elements of @buf[] (1 page) to
+ *	the PBEs in the list starting at @pbe
  */
 
-static int read_pagedir(struct pbe *pblist)
+static inline struct pbe *unpack_orig_addresses(unsigned long *buf,
+                                                struct pbe *pbe)
 {
-	struct pbe *pbpage, *p;
-	unsigned int i = 0;
-	int error;
+	int j;
 
-	if (!pblist)
-		return -EFAULT;
+	for (j = 0; j < PAGE_SIZE / sizeof(long) && pbe; j++) {
+		pbe->orig_address = buf[j];
+		pbe = pbe->next;
+	}
+	return pbe;
+}
 
-	printk("swsusp: Reading pagedir (%lu pages)\n",
-			swsusp_info.pagedir_pages);
+/**
+ *	load_image_metadata - load the image metadata using the swap map
+ *	handle @handle and put them into the PBEs in the list @pblist
+ */
 
-	for_each_pb_page (pbpage, pblist) {
-		unsigned long offset = swp_offset(swsusp_info.pagedir[i++]);
+static int load_image_metadata(struct pbe *pblist, struct swap_map_handle *handle)
+{
+	struct pbe *p;
+	unsigned long *buf;
+	unsigned int n = 0;
+	int error = 0;
 
-		error = -EFAULT;
-		if (offset) {
-			p = (pbpage + PB_PAGE_SKIP)->next;
-			error = bio_read_page(offset, (void *)pbpage);
-			(pbpage + PB_PAGE_SKIP)->next = p;
-		}
+	printk("Loading image metadata ... ");
+	buf = (unsigned long *)get_zeroed_page(GFP_ATOMIC);
+	if (!buf)
+		return -ENOMEM;
+	p = pblist;
+	while (p) {
+		error = swap_map_read_page(handle, buf);
 		if (error)
 			break;
+		p = unpack_orig_addresses(buf, p);
+		n++;
 	}
-
+	free_page((unsigned long)buf);
 	if (!error)
-		BUG_ON(i != swsusp_info.pagedir_pages);
-
+		printk("done (%u pages loaded)\n", n);
 	return error;
 }
 
-
 static int check_suspend_image(void)
 {
 	int error = 0;
@@ -766,34 +1008,39 @@
 	return 0;
 }
 
-static int read_suspend_image(void)
+static int read_suspend_image(struct pbe **pblist_ptr)
 {
 	int error = 0;
-	struct pbe *p;
+	struct pbe *p, *pblist;
+	struct swap_map_handle handle;
+	unsigned int nr_pages = swsusp_info.image_pages;
 
-	if (!(p = alloc_pagedir(nr_copy_pages, GFP_ATOMIC, 0)))
+	p = alloc_pagedir(nr_pages, GFP_ATOMIC, 0);
+	if (!p)
 		return -ENOMEM;
-
-	if ((error = read_pagedir(p)))
+	error = get_swap_map_reader(&handle, swsusp_info.start);
+	if (error)
+		/* The PBE list at p will be released by swsusp_free() */
 		return error;
-	create_pbe_list(p, nr_copy_pages);
-	mark_unsafe_pages(p);
-	pagedir_nosave = alloc_pagedir(nr_copy_pages, GFP_ATOMIC, 1);
-	if (pagedir_nosave) {
-		create_pbe_list(pagedir_nosave, nr_copy_pages);
-		copy_page_backup_list(pagedir_nosave, p);
-	}
-	free_pagedir(p);
-	if (!pagedir_nosave)
-		return -ENOMEM;
-
-	/* Allocate memory for the image and read the data from swap */
-
-	error = alloc_data_pages(pagedir_nosave, GFP_ATOMIC, 1);
-
-	if (!error)
-		error = data_read(pagedir_nosave);
+	error = load_image_metadata(p, &handle);
+	if (!error) {
+		mark_unsafe_pages(p);
+		pblist = alloc_pagedir(nr_pages, GFP_ATOMIC, 1);
+		if (pblist)
+			copy_page_backup_list(pblist, p);
+		free_pagedir(p);
+		if (!pblist)
+			error = -ENOMEM;
 
+		/* Allocate memory for the image and read the data from swap */
+		if (!error)
+			error = alloc_data_pages(pblist, GFP_ATOMIC, 1);
+		if (!error)
+			error = load_image_data(pblist, &handle, nr_pages);
+		if (!error)
+			*pblist_ptr = pblist;
+	}
+	release_swap_map_reader(&handle);
 	return error;
 }
 
@@ -825,7 +1072,7 @@
  *	swsusp_read - Read saved image from swap.
  */
 
-int swsusp_read(void)
+int swsusp_read(struct pbe **pblist_ptr)
 {
 	int error;
 
@@ -834,7 +1081,7 @@
 		return PTR_ERR(resume_bdev);
 	}
 
-	error = read_suspend_image();
+	error = read_suspend_image(pblist_ptr);
 	blkdev_put(resume_bdev);
 
 	if (!error)
Index: linux-2.6.14-mm2/kernel/power/power.h
===================================================================
--- linux-2.6.14-mm2.orig/kernel/power/power.h	2005-11-12 11:18:38.000000000 +0100
+++ linux-2.6.14-mm2/kernel/power/power.h	2005-11-12 12:07:47.000000000 +0100
@@ -9,19 +9,14 @@
 #define SUSPEND_CONSOLE	(MAX_NR_CONSOLES-1)
 #endif
 
-#define MAX_PBES	((PAGE_SIZE - sizeof(struct new_utsname) \
-			- 4 - 3*sizeof(unsigned long) - sizeof(int) \
-			- sizeof(void *)) / sizeof(swp_entry_t))
-
 struct swsusp_info {
 	struct new_utsname	uts;
 	u32			version_code;
 	unsigned long		num_physpages;
 	int			cpus;
 	unsigned long		image_pages;
-	unsigned long		pagedir_pages;
-	suspend_pagedir_t	* suspend_pagedir;
-	swp_entry_t		pagedir[MAX_PBES];
+	unsigned long		pages;
+	swp_entry_t		start;
 } __attribute__((aligned(PAGE_SIZE)));
 
 
@@ -67,6 +62,8 @@
 
 extern void free_pagedir(struct pbe *pblist);
 extern struct pbe *alloc_pagedir(unsigned nr_pages, gfp_t gfp_mask, int safe_needed);
-extern void create_pbe_list(struct pbe *pblist, unsigned nr_pages);
 extern void swsusp_free(void);
 extern int alloc_data_pages(struct pbe *pblist, gfp_t gfp_mask, int safe_needed);
+extern unsigned int snapshot_nr_pages(void);
+extern struct pbe *snapshot_pblist(void);
+extern void snapshot_pblist_set(struct pbe *pblist);
Index: linux-2.6.14-mm2/kernel/power/snapshot.c
===================================================================
--- linux-2.6.14-mm2.orig/kernel/power/snapshot.c	2005-11-12 11:18:38.000000000 +0100
+++ linux-2.6.14-mm2/kernel/power/snapshot.c	2005-11-12 12:07:47.000000000 +0100
@@ -33,6 +33,9 @@
 
 #include "power.h"
 
+struct pbe *pagedir_nosave = NULL;
+unsigned int nr_copy_pages = 0;
+
 #ifdef CONFIG_HIGHMEM
 struct highmem_page {
 	char *data;
@@ -244,7 +247,7 @@
  *	of memory pages allocated with alloc_pagedir()
  */
 
-void create_pbe_list(struct pbe *pblist, unsigned int nr_pages)
+static inline void create_pbe_list(struct pbe *pblist, unsigned int nr_pages)
 {
 	struct pbe *pbpage, *p;
 	unsigned int num = PBES_PER_PAGE;
@@ -261,7 +264,6 @@
 			p->next = p + 1;
 		p->next = NULL;
 	}
-	pr_debug("create_pbe_list(): initialized %d PBEs\n", num);
 }
 
 /**
@@ -332,7 +334,8 @@
 	if (!pbe) { /* get_zeroed_page() failed */
 		free_pagedir(pblist);
 		pblist = NULL;
-        }
+        } else
+        	create_pbe_list(pblist, nr_pages);
 	return pblist;
 }
 
@@ -395,7 +398,6 @@
 		printk(KERN_ERR "suspend: Allocating pagedir failed.\n");
 		return NULL;
 	}
-	create_pbe_list(pblist, nr_pages);
 
 	if (alloc_data_pages(pblist, GFP_ATOMIC | __GFP_COLD, 0)) {
 		printk(KERN_ERR "suspend: Allocating image pages failed.\n");
@@ -421,10 +423,6 @@
 		 (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE,
 		 PAGES_FOR_IO, nr_free_pages());
 
-	/* This is needed because of the fixed size of swsusp_info */
-	if (MAX_PBES < (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE)
-		return -ENOSPC;
-
 	if (!enough_free_mem(nr_pages)) {
 		printk(KERN_ERR "swsusp: Not enough free memory\n");
 		return -ENOMEM;
Index: linux-2.6.14-mm2/include/linux/suspend.h
===================================================================
--- linux-2.6.14-mm2.orig/include/linux/suspend.h	2005-11-12 11:18:38.000000000 +0100
+++ linux-2.6.14-mm2/include/linux/suspend.h	2005-11-12 12:07:47.000000000 +0100
@@ -14,11 +14,7 @@
 typedef struct pbe {
 	unsigned long address;		/* address of the copy */
 	unsigned long orig_address;	/* original address of page */
-	swp_entry_t swap_address;	
-
-	struct pbe *next;	/* also used as scratch space at
-				 * end of page (see link, diskpage)
-				 */
+	struct pbe *next;
 } suspend_pagedir_t;
 
 #define for_each_pbe(pbe, pblist) \
Index: linux-2.6.14-mm2/kernel/power/disk.c
===================================================================
--- linux-2.6.14-mm2.orig/kernel/power/disk.c	2005-11-12 11:18:38.000000000 +0100
+++ linux-2.6.14-mm2/kernel/power/disk.c	2005-11-12 12:07:47.000000000 +0100
@@ -25,9 +25,9 @@
 extern suspend_disk_method_t pm_disk_mode;
 
 extern int swsusp_suspend(void);
-extern int swsusp_write(void);
+extern int swsusp_write(struct pbe *pblist, unsigned int nr_pages);
 extern int swsusp_check(void);
-extern int swsusp_read(void);
+extern int swsusp_read(struct pbe **pblist_ptr);
 extern void swsusp_close(void);
 extern int swsusp_resume(void);
 
@@ -176,7 +176,7 @@
 	if (in_suspend) {
 		device_resume();
 		pr_debug("PM: writing image.\n");
-		error = swsusp_write();
+		error = swsusp_write(pagedir_nosave, nr_copy_pages);
 		if (!error)
 			power_down(pm_disk_mode);
 		else {
@@ -247,7 +247,7 @@
 
 	pr_debug("PM: Reading swsusp image.\n");
 
-	if ((error = swsusp_read())) {
+	if ((error = swsusp_read(&pagedir_nosave))) {
 		swsusp_free();
 		goto Thaw;
 	}


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFT][PATCH 3/3] swsusp: improve freeing of memory
  2005-11-12 20:13 [RFT][PATCH 0/3] swsusp: improve freeing of memory Rafael J. Wysocki
  2005-11-12 20:19 ` [RFT][PATCH 1/3] swsusp: remove encryption Rafael J. Wysocki
  2005-11-12 20:22 ` [RFT][PATCH 2/3] swsusp: introduce the swap map structure Rafael J. Wysocki
@ 2005-11-12 20:24 ` Rafael J. Wysocki
  2005-11-13 21:14   ` Pavel Machek
  2 siblings, 1 reply; 9+ messages in thread
From: Rafael J. Wysocki @ 2005-11-12 20:24 UTC (permalink / raw)
  To: LKML; +Cc: Pavel Machek

This patch makes swsusp free only as much memory as needed and not as much
as possible.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

 include/linux/suspend.h |    2 -
 kernel/power/disk.c     |   30 ++--------------------
 kernel/power/power.h    |   20 ++++++++++++--
 kernel/power/snapshot.c |   65 ++++++++++++++++++++++++++++++++++++++++++++----
 kernel/power/swsusp.c   |   45 ++++++++++++++++++++++++++++++++-
 5 files changed, 125 insertions(+), 37 deletions(-)

Index: linux-2.6.14-mm2/kernel/power/disk.c
===================================================================
--- linux-2.6.14-mm2.orig/kernel/power/disk.c	2005-11-12 12:07:47.000000000 +0100
+++ linux-2.6.14-mm2/kernel/power/disk.c	2005-11-12 12:10:54.000000000 +0100
@@ -24,6 +24,7 @@
 
 extern suspend_disk_method_t pm_disk_mode;
 
+extern int swsusp_shrink_memory(void);
 extern int swsusp_suspend(void);
 extern int swsusp_write(struct pbe *pblist, unsigned int nr_pages);
 extern int swsusp_check(void);
@@ -73,31 +74,6 @@
 static int in_suspend __nosavedata = 0;
 
 
-/**
- *	free_some_memory -  Try to free as much memory as possible
- *
- *	... but do not OOM-kill anyone
- *
- *	Notice: all userland should be stopped at this point, or
- *	livelock is possible.
- */
-
-static void free_some_memory(void)
-{
-	unsigned int i = 0;
-	unsigned int tmp;
-	unsigned long pages = 0;
-	char *p = "-\\|/";
-
-	printk("Freeing memory...  ");
-	while ((tmp = shrink_all_memory(10000))) {
-		pages += tmp;
-		printk("\b%c", p[i++ % 4]);
-	}
-	printk("\bdone (%li pages freed)\n", pages);
-}
-
-
 static inline void platform_finish(void)
 {
 	if (pm_disk_mode == PM_DISK_PLATFORM) {
@@ -127,8 +103,8 @@
 	}
 
 	/* Free memory before shutting down devices. */
-	free_some_memory();
-	return 0;
+	if (!(error = swsusp_shrink_memory()))
+		return 0;
 thaw:
 	thaw_processes();
 	enable_nonboot_cpus();
Index: linux-2.6.14-mm2/kernel/power/power.h
===================================================================
--- linux-2.6.14-mm2.orig/kernel/power/power.h	2005-11-12 12:07:47.000000000 +0100
+++ linux-2.6.14-mm2/kernel/power/power.h	2005-11-12 12:10:54.000000000 +0100
@@ -49,18 +49,32 @@
 extern int pm_prepare_console(void);
 extern void pm_restore_console(void);
 
-
 /* References to section boundaries */
 extern const void __nosave_begin, __nosave_end;
 
 extern unsigned int nr_copy_pages;
-extern suspend_pagedir_t *pagedir_nosave;
-extern suspend_pagedir_t *pagedir_save;
+extern struct pbe *pagedir_nosave;
+
+/*
+ * This compilation switch determines the way in which memory will be freed
+ * during suspend.  If defined, only as much memory will be freed as needed
+ * to complete the suspend.  Otherwise, the largest possible amount of memory
+ * will be freed.
+ */
+#define OPPORTUNISTIC_SHRINKING		1
+
+/*
+ * During suspend, on each attempt to free some more memory SHRINK_BITE
+ * is used as the number of pages to free
+ */
+#define SHRINK_BITE	10000
 
 extern asmlinkage int swsusp_arch_suspend(void);
 extern asmlinkage int swsusp_arch_resume(void);
 
+extern unsigned int count_data_pages(void);
 extern void free_pagedir(struct pbe *pblist);
+extern void release_eaten_pages(void);
 extern struct pbe *alloc_pagedir(unsigned nr_pages, gfp_t gfp_mask, int safe_needed);
 extern void swsusp_free(void);
 extern int alloc_data_pages(struct pbe *pblist, gfp_t gfp_mask, int safe_needed);
Index: linux-2.6.14-mm2/kernel/power/snapshot.c
===================================================================
--- linux-2.6.14-mm2.orig/kernel/power/snapshot.c	2005-11-12 12:07:47.000000000 +0100
+++ linux-2.6.14-mm2/kernel/power/snapshot.c	2005-11-12 12:10:54.000000000 +0100
@@ -37,6 +37,31 @@
 unsigned int nr_copy_pages = 0;
 
 #ifdef CONFIG_HIGHMEM
+unsigned int count_highmem_pages(void)
+{
+	struct zone *zone;
+	unsigned long zone_pfn;
+	unsigned int n = 0;
+
+	for_each_zone (zone)
+		if (is_highmem(zone)) {
+			mark_free_pages(zone);
+			for (zone_pfn = 0; zone_pfn < zone->spanned_pages; zone_pfn++) {
+				struct page *page;
+				unsigned long pfn = zone_pfn + zone->zone_start_pfn;
+				if (!pfn_valid(pfn))
+					continue;
+				page = pfn_to_page(pfn);
+				if (PageReserved(page))
+					continue;
+				if (PageNosaveFree(page))
+					continue;
+				n++;
+			}
+		}
+	return n;
+}
+
 struct highmem_page {
 	char *data;
 	struct page *page;
@@ -152,17 +177,15 @@
 	BUG_ON(PageReserved(page) && PageNosave(page));
 	if (PageNosave(page))
 		return 0;
-	if (PageReserved(page) && pfn_is_nosave(pfn)) {
-		pr_debug("[nosave pfn 0x%lx]", pfn);
+	if (PageReserved(page) && pfn_is_nosave(pfn))
 		return 0;
-	}
 	if (PageNosaveFree(page))
 		return 0;
 
 	return 1;
 }
 
-static unsigned count_data_pages(void)
+unsigned int count_data_pages(void)
 {
 	struct zone *zone;
 	unsigned long zone_pfn;
@@ -267,6 +290,35 @@
 }
 
 /**
+ *	On resume it is necessary to trace and eventually free the unsafe
+ *	pages that have been allocated, because they are needed for I/O
+ *	(on x86-64 we likely will "eat" these pages once again while
+ *	creating the temporary page translation tables)
+ */
+
+struct eaten_page {
+	struct eaten_page	*next;
+	char			padding[PAGE_SIZE - sizeof(void *)];
+};
+
+static struct eaten_page *eaten_pages = NULL;
+
+void release_eaten_pages(void)
+{
+	struct eaten_page *p, *q;
+
+	p = eaten_pages;
+	while (p) {
+		q = p->next;
+		/* We don't want swsusp_free() to free this page again */
+		ClearPageNosave(virt_to_page(p));
+		free_page((unsigned long)p);
+		p = q;
+	}
+	eaten_pages = NULL;
+}
+
+/**
  *	@safe_needed - on resume, for storing the PBE list and the image,
  *	we can only use memory pages that do not conflict with the pages
  *	which had been used before suspend.
@@ -284,9 +336,12 @@
 	if (safe_needed)
 		do {
 			res = (void *)get_zeroed_page(gfp_mask);
-			if (res && PageNosaveFree(virt_to_page(res)))
+			if (res && PageNosaveFree(virt_to_page(res))) {
 				/* This is for swsusp_free() */
 				SetPageNosave(virt_to_page(res));
+				((struct eaten_page *)res)->next = eaten_pages;
+				eaten_pages = res;
+			}
 		} while (res && PageNosaveFree(virt_to_page(res)));
 	else
 		res = (void *)get_zeroed_page(gfp_mask);
Index: linux-2.6.14-mm2/kernel/power/swsusp.c
===================================================================
--- linux-2.6.14-mm2.orig/kernel/power/swsusp.c	2005-11-12 12:07:47.000000000 +0100
+++ linux-2.6.14-mm2/kernel/power/swsusp.c	2005-11-12 12:10:54.000000000 +0100
@@ -74,11 +74,13 @@
 #include "power.h"
 
 #ifdef CONFIG_HIGHMEM
+unsigned int count_highmem_pages(void);
 int save_highmem(void);
 int restore_highmem(void);
 #else
 static int save_highmem(void) { return 0; }
 static int restore_highmem(void) { return 0; }
+static unsigned int count_highmem_pages(void) { return 0; }
 #endif
 
 extern char resume_file[];
@@ -616,6 +618,45 @@
 	return error;
 }
 
+/**
+ *	swsusp_shrink_memory -  Try to free as much memory as needed
+ *
+ *	... but do not OOM-kill anyone
+ *
+ *	Notice: all userland should be stopped before it is called, or
+ *	livelock is possible.
+ */
+
+int swsusp_shrink_memory(void)
+{
+	unsigned long cnt = SHRINK_BITE;
+	long tmp;
+	unsigned long pages = 0;
+	unsigned int i = 0;
+	char *p = "-\\|/";
+
+	printk("Shrinking memory...  ");
+	do {
+#ifdef OPPORTUNISTIC_SHRINKING
+		cnt = count_data_pages() + count_highmem_pages();
+		cnt += (cnt + PBES_PER_PAGE - 1) / PBES_PER_PAGE +
+			PAGES_FOR_IO;
+		tmp = cnt - nr_free_pages();
+		if (tmp > 0) {
+			cnt = shrink_all_memory(SHRINK_BITE);
+			pages += cnt;
+		}
+#else
+		tmp = shrink_all_memory(SHRINK_BITE);
+		pages += tmp;
+#endif
+		printk("\b%c", p[i++%4]);
+	} while (tmp > 0 && cnt > 0);
+	printk("\bdone (%lu pages freed)\n", pages);
+
+	return tmp > 0 ? -ENOMEM : 0;
+}
+
 int swsusp_suspend(void)
 {
 	int error;
@@ -1035,8 +1076,10 @@
 		/* Allocate memory for the image and read the data from swap */
 		if (!error)
 			error = alloc_data_pages(pblist, GFP_ATOMIC, 1);
-		if (!error)
+		if (!error) {
+			release_eaten_pages();
 			error = load_image_data(pblist, &handle, nr_pages);
+		}
 		if (!error)
 			*pblist_ptr = pblist;
 	}
Index: linux-2.6.14-mm2/include/linux/suspend.h
===================================================================
--- linux-2.6.14-mm2.orig/include/linux/suspend.h	2005-11-12 12:07:47.000000000 +0100
+++ linux-2.6.14-mm2/include/linux/suspend.h	2005-11-12 12:10:54.000000000 +0100
@@ -73,6 +73,6 @@
  * XXX: We try to keep some more pages free so that I/O operations succeed
  * without paging. Might this be more?
  */
-#define PAGES_FOR_IO	512
+#define PAGES_FOR_IO	1024
 
 #endif /* _LINUX_SWSUSP_H */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFT][PATCH 1/3] swsusp: remove encryption
  2005-11-12 20:19 ` [RFT][PATCH 1/3] swsusp: remove encryption Rafael J. Wysocki
@ 2005-11-12 23:42   ` Pavel Machek
  0 siblings, 0 replies; 9+ messages in thread
From: Pavel Machek @ 2005-11-12 23:42 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML

On So 12-11-05 21:19:46, Rafael J. Wysocki wrote:
> This patch removes the image encryption that is only used by swsusp instead
> of zeroing the image after resume in order to prevent someone from
> reading some confidential data from it in the future and it does not protect
> the image from being read by an unauthorized person before resume.
> The functionality it provides should really belong to the user space
> and will possibly be reimplemented after the swap-handling functionality
> of swsusp is moved to the user space.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

ACK.
								Pavel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFT][PATCH 3/3] swsusp: improve freeing of memory
  2005-11-12 20:24 ` [RFT][PATCH 3/3] swsusp: improve freeing of memory Rafael J. Wysocki
@ 2005-11-13 21:14   ` Pavel Machek
  2005-11-13 22:27     ` Rafael J. Wysocki
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Machek @ 2005-11-13 21:14 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: kernel list

Hi!

> This patch makes swsusp free only as much memory as needed and not as much
> as possible.

Looks okay to me. ACK, modulo few small things.

> -
>  /* References to section boundaries */
>  extern const void __nosave_begin, __nosave_end;
>  
>  extern unsigned int nr_copy_pages;
> -extern suspend_pagedir_t *pagedir_nosave;
> -extern suspend_pagedir_t *pagedir_save;
> +extern struct pbe *pagedir_nosave;
> +
> +/*
> + * This compilation switch determines the way in which memory will be freed
> + * during suspend.  If defined, only as much memory will be freed as needed
> + * to complete the suspend.  Otherwise, the largest possible amount of memory
> + * will be freed.
> + */
> +#define OPPORTUNISTIC_SHRINKING		1

Can you use little less tabelators? Also shorter name for this one
might be "FREE_ALL". 

> +/*
> + * During suspend, on each attempt to free some more memory SHRINK_BITE
> + * is used as the number of pages to free
> + */
> +#define SHRINK_BITE	10000

Does this really need this kind of visibility? There's nothing user
should tweak here.

>  /**
> + *	On resume it is necessary to trace and eventually free the unsafe
> + *	pages that have been allocated, because they are needed for I/O
> + *	(on x86-64 we likely will "eat" these pages once again while
> + *	creating the temporary page translation tables)
> + */
> +
> +struct eaten_page {
> +	struct eaten_page	*next;
> +	char			padding[PAGE_SIZE - sizeof(void *)];
> +};

Less tabelators here, please...

-- 
Thanks, Sharp!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFT][PATCH 2/3] swsusp: introduce the swap map structure
  2005-11-12 20:22 ` [RFT][PATCH 2/3] swsusp: introduce the swap map structure Rafael J. Wysocki
@ 2005-11-13 21:16   ` Pavel Machek
  2005-11-13 22:33     ` Rafael J. Wysocki
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Machek @ 2005-11-13 21:16 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML

Hi!

> This patch introduces the swap map structure that can be used by swsusp for
> keeping tracks of data pages written to the swap.  The structure itself is
> described in a comment within the patch.
> 
> The overall idea is to reduce the amount of metadata written to the swap
> and to write and read the image pages sequentially, in a file-alike way.
> This makes the swap-handling part of swsusp fairly independent of its
> snapshot-handling part and will hopefully allow us to completely
> separate these two parts in the future.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

ACK.

> +struct swap_map_handle {
> +	void			*tfm; /* Needed for the encryption */
> +	struct swap_map_page	*cur;
> +	unsigned int		k;
> +};

I thought you killed encryption in 1/3?

> @@ -33,6 +33,9 @@
>  
>  #include "power.h"
>  
> +struct pbe *pagedir_nosave = NULL;
> +unsigned int nr_copy_pages = 0;
> +
>  #ifdef CONFIG_HIGHMEM
>  struct highmem_page {
>  	char *data;

You don't need to initialize to zero/NULL.

								Pavel
-- 
Thanks, Sharp!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFT][PATCH 3/3] swsusp: improve freeing of memory
  2005-11-13 21:14   ` Pavel Machek
@ 2005-11-13 22:27     ` Rafael J. Wysocki
  0 siblings, 0 replies; 9+ messages in thread
From: Rafael J. Wysocki @ 2005-11-13 22:27 UTC (permalink / raw)
  To: Pavel Machek; +Cc: kernel list

Hi,

On Sunday, 13 of November 2005 22:14, Pavel Machek wrote:
> Hi!
> 
> > This patch makes swsusp free only as much memory as needed and not as much
> > as possible.
> 
> Looks okay to me. ACK, modulo few small things.
> 
> > -
> >  /* References to section boundaries */
> >  extern const void __nosave_begin, __nosave_end;
> >  
> >  extern unsigned int nr_copy_pages;
> > -extern suspend_pagedir_t *pagedir_nosave;
> > -extern suspend_pagedir_t *pagedir_save;
> > +extern struct pbe *pagedir_nosave;
> > +
> > +/*
> > + * This compilation switch determines the way in which memory will be freed
> > + * during suspend.  If defined, only as much memory will be freed as needed
> > + * to complete the suspend.  Otherwise, the largest possible amount of memory
> > + * will be freed.
> > + */
> > +#define OPPORTUNISTIC_SHRINKING		1
> 
> Can you use little less tabelators? Also shorter name for this one
> might be "FREE_ALL". 

OK

> > +/*
> > + * During suspend, on each attempt to free some more memory SHRINK_BITE
> > + * is used as the number of pages to free
> > + */
> > +#define SHRINK_BITE	10000
> 
> Does this really need this kind of visibility? There's nothing user
> should tweak here.

By setting this to a smaller value you can make swsusp free more memory
sometimes, but of course it need not be visible.  I'll move it to swsusp.c

> >  /**
> > + *	On resume it is necessary to trace and eventually free the unsafe
> > + *	pages that have been allocated, because they are needed for I/O
> > + *	(on x86-64 we likely will "eat" these pages once again while
> > + *	creating the temporary page translation tables)
> > + */
> > +
> > +struct eaten_page {
> > +	struct eaten_page	*next;
> > +	char			padding[PAGE_SIZE - sizeof(void *)];
> > +};
> 
> Less tabelators here, please...

OK

Greetings,
Rafael


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFT][PATCH 2/3] swsusp: introduce the swap map structure
  2005-11-13 21:16   ` Pavel Machek
@ 2005-11-13 22:33     ` Rafael J. Wysocki
  0 siblings, 0 replies; 9+ messages in thread
From: Rafael J. Wysocki @ 2005-11-13 22:33 UTC (permalink / raw)
  To: Pavel Machek; +Cc: LKML

Hi,

On Sunday, 13 of November 2005 22:16, Pavel Machek wrote:
> Hi!
> 
> > This patch introduces the swap map structure that can be used by swsusp for
> > keeping tracks of data pages written to the swap.  The structure itself is
> > described in a comment within the patch.
> > 
> > The overall idea is to reduce the amount of metadata written to the swap
> > and to write and read the image pages sequentially, in a file-alike way.
> > This makes the swap-handling part of swsusp fairly independent of its
> > snapshot-handling part and will hopefully allow us to completely
> > separate these two parts in the future.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> ACK.
> 
> > +struct swap_map_handle {
> > +	void			*tfm; /* Needed for the encryption */
> > +	struct swap_map_page	*cur;
> > +	unsigned int		k;
> > +};
> 
> I thought you killed encryption in 1/3?

And I thought so, but this one apparently survived ...

> > @@ -33,6 +33,9 @@
> >  
> >  #include "power.h"
> >  
> > +struct pbe *pagedir_nosave = NULL;
> > +unsigned int nr_copy_pages = 0;
> > +
> >  #ifdef CONFIG_HIGHMEM
> >  struct highmem_page {
> >  	char *data;
> 
> You don't need to initialize to zero/NULL.

OK

I'll make the changes and post for inclusion into -mm in a couple of days.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2005-11-13 22:32 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-11-12 20:13 [RFT][PATCH 0/3] swsusp: improve freeing of memory Rafael J. Wysocki
2005-11-12 20:19 ` [RFT][PATCH 1/3] swsusp: remove encryption Rafael J. Wysocki
2005-11-12 23:42   ` Pavel Machek
2005-11-12 20:22 ` [RFT][PATCH 2/3] swsusp: introduce the swap map structure Rafael J. Wysocki
2005-11-13 21:16   ` Pavel Machek
2005-11-13 22:33     ` Rafael J. Wysocki
2005-11-12 20:24 ` [RFT][PATCH 3/3] swsusp: improve freeing of memory Rafael J. Wysocki
2005-11-13 21:14   ` Pavel Machek
2005-11-13 22:27     ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox