Re: [RFC][PATCH] swsusp: support creating bigger images

public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed

From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Pavel Machek <pavel@suse.cz>, Linux PM <linux-pm@osdl.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Nigel Cunningham <nigel@suspend2.net>
Subject: Re: [RFC][PATCH] swsusp: support creating bigger images
Date: Tue, 25 Apr 2006 17:39:12 +0200	[thread overview]
Message-ID: <200604251739.13377.rjw@sisk.pl> (raw)
In-Reply-To: <444DF9B3.7080600@yahoo.com.au>

Hi,

First, thanks for reviewing.

On Tuesday 25 April 2006 12:28, Nick Piggin wrote:
> pageRafael J. Wysocki wrote:
> 
> > Index: linux-2.6.17-rc1-mm3/mm/rmap.c
> > ===================================================================
> > --- linux-2.6.17-rc1-mm3.orig/mm/rmap.c	2006-04-22 10:34:33.000000000 +0200
> > +++ linux-2.6.17-rc1-mm3/mm/rmap.c	2006-04-23 00:41:19.000000000 +0200
> > @@ -851,3 +851,22 @@ int try_to_unmap(struct page *page, int 
> >  	return ret;
> >  }
> >  
> > +#ifdef CONFIG_SOFTWARE_SUSPEND
> > +int page_mapped_by_current(struct page *page)
> > +{
> > +	struct vm_area_struct *vma;
> > +	int ret = 0;
> > +
> > +	spin_lock(&current->mm->page_table_lock);
> > +
> > +	for (vma = current->mm->mmap; vma; vma = vma->vm_next)
> > +		if (page_address_in_vma(page, vma) != -EFAULT) {
> > +			ret = 1;
> > +			break;
> > +		}
> > +
> > +	spin_unlock(&current->mm->page_table_lock);
> > +
> > +	return ret;
> > +}
> > +#endif /* CONFIG_SOFTWARE_SUSPEND */
> 
> Why not just pass in the task struct? It might make the interface more
> useful elsewhere.

Good idea, I'll do that.

> > Index: linux-2.6.17-rc1-mm3/include/linux/rmap.h
> > ===================================================================
> > --- linux-2.6.17-rc1-mm3.orig/include/linux/rmap.h	2006-04-22 10:34:33.000000000 +0200
> > +++ linux-2.6.17-rc1-mm3/include/linux/rmap.h	2006-04-22 10:34:45.000000000 +0200
> > @@ -104,6 +104,12 @@ pte_t *page_check_address(struct page *,
> >   */
> >  unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
> >  
> > +#ifdef CONFIG_SOFTWARE_SUSPEND
> > +int page_mapped_by_current(struct page *);
> > +#else
> > +static inline int page_mapped_by_current(struct page *page) { return 0; }
> > +#endif /* CONFIG_SOFTWARE_SUSPEND */
> 
> This just wrong. What if some random unrelated code calls
> page_mapped_by_current? You should only fold inlines if their result
> folds accordingly.
> 
> For now just leave it out completely if !CONFIG_SOFTWARE_SUSPEND.

OK

>   > -unsigned int count_data_pages(void)
> > +/**
> > + *	need_to_copy - determine if a page needs to be copied before saving.
> > + *	Returns false if the page can be saved without copying.
> > + */
> > +
> > +static int need_to_copy(struct page *page)
> > +{
> > +	if (!PageLRU(page) || PageCompound(page))
> > +		return 1;
> > +	if (page_mapped(page))
> > +		return page_mapped_by_current(page);
> > +
> > +	return 1;
> > +}
> 
> I'd much rather VM internal type stuff get moved *out* of kernel/power :(

Well, I kind of agree, but I don't know where to place it under mm/.

> It needs more comments too. Also, how important is it for the page to be
> off the LRU?

Hm, I'm not sure if that's what you're asking about, but the pages off the LRU
are handled in a usual way, ie. copied when snapshotting the system.  The
pages _on_ the LRU may be included in the snapshot image without
copying, but I require them additionally to be (a) mapped by someone and
(b) not mapped by the current task.

> What if kswapd has taken it off the LRU temporarily (or is it 
> guanteed not to at this point)? What if it is in an lru_add pagevec?

At this point kswapd is frozen as well as everything else except for the
current task (ie. us).  IOW no one is expected to process the LRU lists.

> > +
> > +unsigned int count_data_pages(unsigned int *total)
> >  {
> >  	struct zone *zone;
> >  	unsigned long zone_pfn;
> > -	unsigned int n = 0;
> > +	unsigned int n, m;
> >  
> > +	n = m = 0;
> >  	for_each_zone (zone) {
> >  		if (is_highmem(zone))
> >  			continue;
> >  		mark_free_pages(zone);
> > -		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn)
> > -			n += saveable(zone, &zone_pfn);
> > +		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
> > +			unsigned long pfn = zone_pfn + zone->zone_start_pfn;
> > +
> > +			if (saveable(pfn)) {
> > +				n++;
> > +				if (!need_to_copy(pfn_to_page(pfn)))
> > +					m++;
> > +			}
> > +		}
> >  	}
> > -	return n;
> > +	if (total)
> > +		*total = n;
> > +
> > +	return n - m;
> >  }
> 
> Maybe we could add some commenting to these functions while changing them
> around.

OK, I will.

> > +static LIST_HEAD(active_pages);
> > +static LIST_HEAD(inactive_pages);
> > +
> > +/**
> > + *	protect_data_pages - move data pages that need to be protected from
> > + *	being reclaimed out of their respective LRU lists
> > + */
> > +
> > +static void protect_data_pages(struct pbe *pblist)
> > +{
> > +	struct pbe *p;
> > +
> > +	for_each_pbe (p, pblist)
> > +		if (p->address == p->orig_address) {
> > +			struct page *page = virt_to_page(p->address);
> > +			struct zone *zone = page_zone(page);
> > +
> > +			spin_lock(&zone->lru_lock);
> > +			if (PageActive(page)) {
> > +				del_page_from_active_list(zone, page);
> > +				list_add(&page->lru, &active_pages);
> > +			} else {
> > +				del_page_from_inactive_list(zone, page);
> > +				list_add(&page->lru, &inactive_pages);
> > +			}
> > +			spin_unlock(&zone->lru_lock);
> > +			ClearPageLRU(page);
> > +		}
> > +}
> 
> a) How do you know the page is on the LRU?

Because in copy_data_pages() p->address is only set to p->orig_address for
pages that are on the LRU.  No one can change the state of the pages in
between because everyone else is frozen and we have local IRQs disabled.

> b) Why are you clearing PageLRU outside the spinlock?

Well, good question. ;-)  Moreover it seems I don't need to acquire the
spinlock at all, because this is done on one CPU with IRQs disabled and the
other tasks frozen.

> c) This code isn't going outside mm/ even if it is correct.

OK, so could you please suggest me where to put it (or a corrected version)
under mm/?

> d) swsusp seems to be quite under documented as far as race conditions
>     vs the rest of the kernel go. In some places only a single CPU is
>     running with interrupts off, in other places all processes have
>     frozen, etc. etc. So it is hard to even know if synchronisation is
>     correct or not.
> 
>     Suggest more documentation goes into this.

OK

> > +
> > +/**
> > + *	restore_active_inactive_lists - if suspend fails, the pages protected
> > + *	with protect_data_pages() have to be moved back to their respective
> > + *	lists
> > + */
> > +
> > +static void restore_active_inactive_lists(void)
> 
> Ditto, mostly.
> 
> > +static int enough_free_mem(unsigned int nr_pages, unsigned int copy_pages)
> >  {
> >  	struct zone *zone;
> > -	unsigned int n = 0;
> > +	long n = 0;
> > +
> > +	pr_debug("swsusp: pages needed: %u + %lu + %lu, free: %u\n",
> > +		 copy_pages,
> > +		 (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE,
> > +		 EXTRA_PAGES, nr_free_pages());
> >  
> >  	for_each_zone (zone)
> > -		if (!is_highmem(zone))
> > +		if (!is_highmem(zone) && populated_zone(zone)) {
> >  			n += zone->free_pages;
> > -	pr_debug("swsusp: available memory: %u pages\n", n);
> > -	return n > (nr_pages + PAGES_FOR_IO +
> > -		(nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
> > -}
> > -
> > -static int alloc_data_pages(struct pbe *pblist, gfp_t gfp_mask, int safe_needed)
> > -{
> > -	struct pbe *p;
> > +			n -= zone->lowmem_reserve[ZONE_NORMAL];
> 
> Comment for this? Eg. why are you taking the ZONE_NORMAL watermark
> in particular?

This is to sync enough_free_mem() with swsusp_shrink_memory() (in swsusp.c).
[We allocate with GFP_ATOMIC ie. from the normal zone and the lowmem reserves
in the lower zones are effectively unavailable to us because of the check in
zone_watermark_ok().]

Thanks a lot for the comments.  I'll do my best to follow your suggestions
in the next version of the patch.

Greetings,
Rafael

next prev parent reply	other threads:[~2006-04-25 15:39 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-04-24 21:55 [RFC][PATCH] swsusp: support creating bigger images Rafael J. Wysocki
2006-04-24 22:16 ` Pavel Machek
2006-04-25  8:26   ` Rafael J. Wysocki
2006-04-25 10:04     ` Pavel Machek
2006-04-25 10:31       ` Rafael J. Wysocki
2006-04-27 15:27         ` Rafael J. Wysocki
2006-04-27 20:55           ` Pavel Machek
2006-04-28  9:19             ` Rafael J. Wysocki
2006-04-28  9:23               ` Pavel Machek
2006-04-25 10:28 ` Nick Piggin
2006-04-25 15:39   ` Rafael J. Wysocki [this message]
2006-04-25 20:32     ` Pavel Machek
2006-04-25 21:12       ` Rafael J. Wysocki
2006-04-25 21:18         ` Nigel Cunningham
2006-04-25 22:21           ` Rafael J. Wysocki
2006-04-25 22:24             ` Nigel Cunningham
2006-04-25 22:38               ` Rafael J. Wysocki
2006-04-25 22:25             ` Pavel Machek
2006-04-25 22:30               ` Nigel Cunningham
2006-04-25 22:36                 ` Pavel Machek
2006-04-25 22:43               ` Rafael J. Wysocki
2006-04-26  0:49                 ` Nigel Cunningham
2006-04-30 12:27                   ` Rafael J. Wysocki
2006-05-01  1:49                     ` Nigel Cunningham
2006-05-01 11:20                       ` Rafael J. Wysocki
2006-05-01 22:56                         ` Nigel Cunningham
2006-04-26  2:24             ` Nick Piggin
2006-04-26  3:41               ` Nigel Cunningham
2006-04-26 16:22                 ` Nick Piggin
2006-04-26 21:16                   ` Rafael J. Wysocki
2006-04-26  8:10               ` Pavel Machek
2006-04-27 19:53     ` [RFC][PATCH] swsusp: support creating bigger images (rev. 2) Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200604251739.13377.rjw@sisk.pl \
    --to=rjw@sisk.pl \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@osdl.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=nigel@suspend2.net \
    --cc=pavel@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox