Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous

From: Mel Gorman <mgorman@suse.de>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>,
	akpm@linux-foundation.org, Ury Stankevich <urykhy@gmail.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	stable@kernel.org
Subject: Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous
Date: Thu, 2 Jun 2011 15:50:19 +0100
Message-ID: <20110602145019.GG7306@suse.de>
In-Reply-To: <20110602132954.GC19505@random.random>

On Thu, Jun 02, 2011 at 03:29:54PM +0200, Andrea Arcangeli wrote:
> On Thu, Jun 02, 2011 at 02:03:52AM +0100, Mel Gorman wrote:
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 2d29c9a..65fa251 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -631,12 +631,14 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
> >  		entry = mk_pmd(page, vma->vm_page_prot);
> >  		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> >  		entry = pmd_mkhuge(entry);
> > +
> >  		/*
> > -		 * The spinlocking to take the lru_lock inside
> > -		 * page_add_new_anon_rmap() acts as a full memory
> > -		 * barrier to be sure clear_huge_page writes become
> > -		 * visible after the set_pmd_at() write.
> > +		 * Need a write barrier to ensure the writes from
> > +		 * clear_huge_page become visible before the
> > +		 * set_pmd_at
> >  		 */
> > +		smp_wmb();
> > +
> 
> On x86 at least this is noop because of the
> spin_lock(&page_table_lock) after clear_huge_page. But I'm not against
> adding this in case other archs supports THP later.
> 

I thought spin lock acquisition was one-way, where loads/stores
preceding the lock are allowed to leak into the protected region
but not the other way around?

So we have

clear_huge_page()
__SetPageUptodate(page);
spin_lock(&mm->page_table_lock);
...
set_pmd_at(mm, haddr, pmd, entry);

This spinlock itself does not guarantee that writes from
clear_huge_page are complete before that set_pmd_at().

Whether this is right or wrong, why is the same not true in
collapse_huge_page()? There we have

       __collapse_huge_page_copy(pte, new_page, vma, address, ptl);
	....
        smp_wmb();
        spin_lock(&mm->page_table_lock);
	...
        set_pmd_at(mm, address, pmd, _pmd);

with the comment stressing that this is necessary.
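
To make the ordering question concrete, here is a minimal userspace
sketch, not kernel code. It assumes C11 atomics, with a release fence
standing in (roughly) for smp_wmb(); the names payload, published,
writer_publish() and reader_sees_cleared() are hypothetical and only
play the roles of the huge page, the PMD, the fault path and a
concurrent reader.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: 'payload' plays the huge page, 'published' the PMD. */
static char payload[64] = "stale contents";
static _Atomic(char *) published = NULL;

/* Writer: clear the payload, fence, then publish the pointer. */
static void writer_publish(void)
{
	memset(payload, 0, sizeof(payload));        /* role of clear_huge_page() */
	atomic_thread_fence(memory_order_release);  /* rough analogue of smp_wmb() */
	atomic_store_explicit(&published, payload,
			      memory_order_relaxed); /* role of set_pmd_at() */
}

/*
 * Reader: 0 if nothing is published yet, 1 if the published payload is
 * fully cleared, -1 if stale bytes are visible through the pointer.
 */
static int reader_sees_cleared(void)
{
	char *p = atomic_load_explicit(&published, memory_order_acquire);
	size_t i;

	if (!p)
		return 0;
	for (i = 0; i < sizeof(payload); i++)
		if (p[i] != 0)
			return -1;
	return 1;
}

/* Single-threaded walk-through of the intended order of events. */
static void demo(void)
{
	assert(reader_sees_cleared() == 0);
	writer_publish();
	assert(reader_sees_cleared() == 1);
}
```

The fence sits between the clearing writes and the publishing store,
which is the position smp_wmb() occupies in collapse_huge_page(). A
lock acquired between the two would not obviously give the same
guarantee if acquisition only stops later accesses from moving up.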

> But smp_wmb() is optimized away at build time by cpp so this can't
> possibly help if you're reproducing !SMP.
> 

On X86 !SMP, this is still a barrier() which on gcc is

#define barrier() __asm__ __volatile__("": : :"memory")

so it's a compiler barrier. I'm not working on this at the
moment but when I get to it, I'll compare the object files and see
if there are relevant differences. Could be tomorrow before I get
the chance again.
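
As a sketch of what "still a barrier()" means, the snippet below
assumes gcc or clang. DEMO_SMP and demo_smp_wmb() are hypothetical
names made up for this example, not the kernel's macros. It only shows
the shape of the fallback: without the SMP define the write barrier
collapses to the compiler barrier quoted above, which emits no
instruction but still stops the compiler from moving memory accesses
across it.

```c
#include <assert.h>

/* gcc-style compiler barrier, as quoted above. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* DEMO_SMP and demo_smp_wmb() are hypothetical names for this sketch. */
#ifdef DEMO_SMP
#define demo_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#else
#define demo_smp_wmb() barrier()	/* no CPU fence, compiler barrier only */
#endif

static int data;
static int flag;

/* Store the data, then the flag, with the barrier in between. */
static int store_then_flag(int value)
{
	data = value;
	demo_smp_wmb();	/* compiler may not sink the data store below this */
	flag = 1;
	return data + flag;
}
```

Calling store_then_flag(41) returns 42 either way; the interesting
difference is in the generated object code, which is why comparing the
object files is the next step.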

> >  		page_add_new_anon_rmap(page, vma, haddr);
> >  		set_pmd_at(mm, haddr, pmd, entry);
> >  		prepare_pmd_huge_pte(pgtable, mm);
> > @@ -753,6 +755,13 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  
> >  	pmdp_set_wrprotect(src_mm, addr, src_pmd);
> >  	pmd = pmd_mkold(pmd_wrprotect(pmd));
> > +
> > +	/*
> > +	 * Write barrier to make sure the setup for the PMD is fully visible
> > +	 * before the set_pmd_at
> > +	 */
> > +	smp_wmb();
> > +
> >  	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
> >  	prepare_pmd_huge_pte(pgtable, dst_mm);
> 
> This part seems superfluous to me, it's also noop for !SMP.

Other than being a compiler barrier.

> Only wmb()
> would stay. the pmd is perfectly fine to stay in a register, not even
> a compiler barrier is needed, even less a smp serialization.

There is an explanation in here somewhere because, as I write this,
the test machine has survived 14 hours under continual stress without
the isolated counters going negative, with over 128 million pages
successfully migrated and a million pages failed to migrate due to
direct compaction being called 80,000 times. It's possible it's a
coincidence but it's some coincidence!

-- 
Mel Gorman
SUSE Labs
