From: Rik van Riel <riel@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Shaohua Li <shaohua.li@intel.com>, linux-mm <linux-mm@kvack.org>,
	Andi Kleen <andi@firstfloor.org>,
	hughd@google.com
Subject: Re: [RFC]vmscan: doing page_referenced() in batch way
Date: Sun, 10 Oct 2010 13:10:01 -0400
Message-ID: <4CB1F369.1080108@redhat.com>
In-Reply-To: <20101006131052.e3ae026f.akpm@linux-foundation.org>

On 10/06/2010 04:10 PM, Andrew Morton wrote:
> On Wed, 29 Sep 2010 10:57:33 +0800
> Shaohua Li <shaohua.li@intel.com> wrote:
>
>> When memory pressure is high, page_referenced() causes a lot of lock contention
>> on anon_vma->lock or mapping->i_mmap_lock. Since pages from one file
>> usually live side by side in the LRU list, we can lock several pages in
>> shrink_page_list() and do a batched page_referenced() to avoid some lock/unlock
>> cycles, which should reduce lock contention a lot. The locking rule documented
>> in rmap.c is:
>> page_lock
>> 	mapping->i_mmap_lock
>> 		anon_vma->lock
>> For a batch of pages, we take the page lock on all of them first and check
>> their references, and then release their i_mmap_lock or anon_vma->lock. This
>> does not seem to break the rule to me.
>> Before I polish the patch further, I'd like to know if there is anything
>> preventing us from doing such batching here.
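The core lock-amortization pattern being described looks roughly like
this (a minimal sketch with hypothetical names, not code from the
patch):

	struct address_space *held = NULL;
	int i;

	for (i = 0; i < nr; i++) {
		struct address_space *mapping = pages[i]->mapping;

		/* keep the previous i_mmap_lock across pages that share it */
		if (mapping != held) {
			if (held)
				spin_unlock(&held->i_mmap_lock);
			spin_lock(&mapping->i_mmap_lock);
			held = mapping;
		}
		/* ... walk this mapping's vmas with the lock held ... */
	}
	if (held)
		spin_unlock(&held->i_mmap_lock);

With LRU-adjacent pages from the same file this does one lock/unlock
per run of pages instead of one per page.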
>
> The patch adds quite a bit of complexity, so we'd need to see benchmark
> testing results which justify it, please.
>
> Also, the entire patch is irrelevant for uniprocessor machines, so the
> runtime overhead and code-size increases for CONFIG_SMP=n builds should
> be as low as possible - ideally zero.  Please quantify this as well
> within the changelog if you pursue this work.
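One way to get the CONFIG_SMP=n cost to zero would be to compile the
batching out entirely on UP builds, along these lines (a hypothetical
sketch, not something the patch does):

	#ifdef CONFIG_SMP
	#define PRC_PAGE_NUM 8
	struct page_reference_control {
		int num;
		struct page *pages[PRC_PAGE_NUM];
		int references[PRC_PAGE_NUM];
		struct anon_vma *anon_vma;
		struct address_space *mapping;
	};
	#else
	struct page_reference_control {
	};	/* empty on UP */

	static inline void do_prc_batch(struct scan_control *sc,
					struct page_reference_control *prc)
	{
		/* UP: no lock contention to amortize, so do nothing */
	}
	#endif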
>
>>
>> ...
>>
>> +#define PRC_PAGE_NUM 8
>> +struct page_reference_control {
>> +	int num;
>> +	struct page *pages[PRC_PAGE_NUM];
>> +	int references[PRC_PAGE_NUM];
>> +	struct anon_vma *anon_vma;
>> +	struct address_space *mapping;
>> +	/* no ksm */
>> +};
>
> hm, 120 bytes of stack consumed, deep in page reclaim.
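That figure checks out for a 64-bit build: 4 bytes for num plus 4 of
padding, eight 8-byte page pointers, eight 4-byte ints, and two more
pointers, i.e. 4 + 4 + 64 + 32 + 16 = 120.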
>
>>   #endif
>>
>>   extern int hwpoison_filter(struct page *p);
>>
>> ...
>>
>>   static int page_referenced_file(struct page *page,
>>   				struct mem_cgroup *mem_cont,
>> -				unsigned long *vm_flags)
>> +				unsigned long *vm_flags,
>> +				struct page_reference_control *prc)
>>   {
>>   	unsigned int mapcount;
>>   	struct address_space *mapping = page->mapping;
>> @@ -603,8 +623,25 @@ static int page_referenced_file(struct p
>>   	 */
>>   	BUG_ON(!PageLocked(page));
>>
>> -	spin_lock(&mapping->i_mmap_lock);
>> +	if (prc) {
>> +		if (mapping == prc->mapping) {
>> +			goto skip_lock;
>> +		}
>> +		if (prc->anon_vma) {
>> +			page_unlock_anon_vma(prc->anon_vma);
>> +			prc->anon_vma = NULL;
>> +		}
>> +		if (prc->mapping) {
>> +			spin_unlock(&prc->mapping->i_mmap_lock);
>> +			prc->mapping = NULL;
>> +		}
>> +		prc->mapping = mapping;
>> +
>> +		spin_lock(&mapping->i_mmap_lock);
>> +	} else
>> +		spin_lock(&mapping->i_mmap_lock);
>
> Move the spin_lock() outside, remove the `else' part.
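Presumably that simplification would look something like this (my
sketch of the suggestion, not Andrew's words):

	if (prc) {
		if (mapping == prc->mapping)
			goto skip_lock;
		if (prc->anon_vma) {
			page_unlock_anon_vma(prc->anon_vma);
			prc->anon_vma = NULL;
		}
		if (prc->mapping)
			spin_unlock(&prc->mapping->i_mmap_lock);
		prc->mapping = mapping;
	}
	spin_lock(&mapping->i_mmap_lock);
skip_lock:

Both branches ended in the same spin_lock(), so hoisting it out drops
the else entirely.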
>
>> +skip_lock:
>>   	/*
>>   	 * i_mmap_lock does not stabilize mapcount at all, but mapcount
>>   	 * is more likely to be accurate if we note it after spinning.
>> @@ -628,7 +665,8 @@ static int page_referenced_file(struct p
>>   			break;
>>   	}
>>
>> -	spin_unlock(&mapping->i_mmap_lock);
>> +	if (!prc)
>> +		spin_unlock(&mapping->i_mmap_lock);
>>   	return referenced;
>>   }
>>
>>
>> ...
>>
>> +static void do_prc_batch(struct scan_control *sc,
>> +	struct page_reference_control *prc)
>> +{
>> +	int i;
>> +	for (i = 0; i < prc->num; i++)
>> +		prc->references[i] = page_check_references(prc->pages[i], sc,
>> +			prc);
>> +	/*
>> +	 * we must release all locks here; the lock ordering requires
>> +	 * page_lock ->
>> +	 *   mapping->i_mmap_lock ->
>> +	 *     anon_vma->lock
>> +	 * releasing the locks guarantees we don't break the rule on the next run
>> +	 */
>> +	if (prc->anon_vma) {
>> +		page_unlock_anon_vma(prc->anon_vma);
>> +		prc->anon_vma = NULL;
>> +	}
>> +	if (prc->mapping) {
>> +		spin_unlock(&prc->mapping->i_mmap_lock);
>> +		prc->mapping = NULL;
>> +	}
>> +}
>
> I didn't check the locking alterations.

I've tried to wrap my head around them, but haven't
yet been able to convince myself that they are safe.

What if we have multiple pages sharing the same
mapping or root anon_vma, one which is not the same
as prc->anon_vma?
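Concretely (a hypothetical sequence, assuming the anon side caches its
lock the way the file side above does): page 1 maps to anon_vma A, so
prc->anon_vma = A and A's lock is held; page 2 maps to anon_vma B,
where B != A but B->root == A->root, so locking B means taking the
very spinlock already held through A.  The pointer comparison against
prc->anon_vma cannot see that sharing, so whether this is safe depends
on the cached lock always being dropped before the next one is taken,
and that is exactly the part I have not managed to verify.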
