All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Raphaël Beamonte" <raphael.beamonte@gmail.com>,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	"lttng-dev@lists.lttng.org" <lttng-dev@lists.lttng.org>,
	"Mel Gorman" <mgorman@suse.de>,
	"Rob van der Heij" <rvdheij@gmail.com>
Subject: Re: [lttng-dev] [-stable 3.8.1 performance regression] madvise POSIX_FADV_DONTNEED
Date: Mon, 17 Jun 2013 19:44:30 -0700	[thread overview]
Message-ID: <20130617194430.16d8c3ea.akpm@linux-foundation.org> (raw)
In-Reply-To: <20130618021528.GA29506@Krystal>

On Mon, 17 Jun 2013 22:15:28 -0400 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> * Andrew Morton (akpm@linux-foundation.org) wrote:
> > On Mon, 17 Jun 2013 17:39:36 -0400 Rapha__l Beamonte <raphael.beamonte@gmail.com> wrote:
> > 
> > > 2013/6/17 Andrew Morton <akpm@linux-foundation.org>
> > > 
> > > > That change wasn't terribly efficient - if there are any unpopulated
> > > > pages in the range (which is quite likely), fadvise() will now always
> > > > call invalidate_mapping_pages() a second time.
> > > >
> > > > Perhaps this is fixable.  Say, make lru_add_drain_all() return a
> > > > success code, or even teach lru_add_drain_all() to return a code
> > > > indicating that one of the spilled pages was (or might have been) on a
> > > > particular mapping.
> > > >
> > > 
> > > Following our tests results, that was the call to lru_add_drain_all() that
> > > causes the problem. The second call to invalidate_mapping_pages() isn't
> > > really important. We tried to compile a kernel with the commit introducing
> > > this change but with the "lru_add_drain_all()" line removed, and the
> > > problem disappeared, even if we called two times invalidate_mapping_pages()
> > > (as the rest of the commit was still here).
> > 
> > Ah, OK, schedule_on_each_cpu() could certainly do that - it has to wait
> > for every CPU to context switch and schedule the worker function.
> > 
> > There's a lot we could do here.  Such as not doing the schedule_work()
> > at all for a cpu which has an empty lru_add_pvec.
> 
> First approach proposed, submitted as RFC. Compile-tested only.
> 
> ...
>
> Second approach, submitted as RFC with some questions left unanswered
> in the code. Compile-tested only.
> 
> ---
>  include/linux/swap.h |    1 
>  mm/fadvise.c         |    2 -
>  mm/swap.c            |   62 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 64 insertions(+), 1 deletion(-)
> 
> Index: linux/include/linux/swap.h
> ===================================================================
> --- linux.orig/include/linux/swap.h
> +++ linux/include/linux/swap.h
> @@ -242,6 +242,7 @@ extern void mark_page_accessed(struct pa
>  extern void lru_add_drain(void);
>  extern void lru_add_drain_cpu(int cpu);
>  extern int lru_add_drain_all(void);
> +extern int lru_add_drain_mapping(struct address_space *mapping);
>  extern void rotate_reclaimable_page(struct page *page);
>  extern void deactivate_page(struct page *page);
>  extern void swap_setup(void);
> Index: linux/mm/swap.c
> ===================================================================
> --- linux.orig/mm/swap.c
> +++ linux/mm/swap.c
> @@ -689,6 +689,68 @@ int lru_add_drain_all(void)
>  }
>  
>  /*
> + * Returns 0 for success
> + */
> +int lru_add_drain_mapping(struct address_space *mapping)
> +{
> +	int cpu;
> +	struct work_struct __percpu *works;
> +
> +	works = alloc_percpu(struct work_struct);
> +	if (!works)
> +		return -ENOMEM;
> +
> +	get_online_cpus();
> +
> +	for_each_online_cpu(cpu) {
> +		struct pagevec *pvecs = per_cpu(lru_add_pvecs, cpu);
> +		struct work_struct *work = per_cpu_ptr(works, cpu);
> +		struct pagevec *pvec;
> +		int lru;
> +
> +		INIT_WORK(work, lru_add_drain_per_cpu);
> +		for_each_lru(lru) {
> +			int i;
> +
> +			pvec = &pvecs[lru - LRU_BASE];
> +			/*
> +			 * Race failure mode is to flush unnecessarily.
> +			 * We use PAGEVEC_SIZE rather than pvec->nr to
> +			 * stay on the safe-side wrt pvec resize.
> +			 */
> +			for (i = 0; i < PAGEVEC_SIZE; i++) {
> +				struct page *page;
> +
> +				/*
> +				 * Racy access to page. TODO: is it OK
> +				 * to access it from the remote CPU's
> +				 * lru without any kind of ownership or
> +				 * synchronization ?
> +				 */

Almost.  The only problem I can think of is that the other CPU flushes
its pagevec then a page gets freed then a memory hot-unplug occurs and
the memory hotplug handler frees and then unmaps the memory which was
used to hold the page's `struct page' (I don't think the current
mem-hotplug code even does this) and now this cpu's page->mapping
access goes oops.  

IOW, not worth worrying for a prototype "hey lets test this and see if
it fixes things" patch.


> +				page = ACCESS_ONCE(pvec->pages[i]);
> +				if (!pvec->pages[i])
> +					continue;
> +				if (page->mapping == mapping) {
> +					schedule_work_on(cpu, work);
> +					goto next_cpu;
> +				}
> +			}
> +		}
> +	next_cpu: ;	/* TODO: coding ugliness */
> +	}
> +
> +	for_each_online_cpu(cpu) {
> +		struct work_struct *work = per_cpu_ptr(works, cpu);
> +		if (work_pending(work))
> +			flush_work(work);
> +	}
> +
> +	put_online_cpus();
> +	free_percpu(works);
> +	return 0;
> +}
>
> ...
>

  reply	other threads:[~2013-06-18  2:44 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <51BE1828.3060206@gmail.com>
2013-06-17 14:13 ` [-stable 3.8.1 performance regression] madvise POSIX_FADV_DONTNEED Mathieu Desnoyers
2013-06-17 21:24   ` Andrew Morton
2013-06-17 21:39     ` Raphaël Beamonte
     [not found]     ` <CAE_Gge34HCroSgNgiXL1j7Le3CNKRXR=7TZQhJSmY+wfWniKug@mail.gmail.com>
2013-06-17 21:57       ` [lttng-dev] " Andrew Morton
2013-06-18  2:15         ` Mathieu Desnoyers
2013-06-18  2:44           ` Andrew Morton [this message]
2013-06-18  9:29     ` Mel Gorman
2013-06-18 10:11       ` Mel Gorman
2013-06-19 19:25         ` Mathieu Desnoyers
2013-06-20  6:36           ` Rob van der Heij
     [not found]           ` <CAJCc=kijujORhPUmPvzHj-MMdyVbf-iHEK0Jx-VHbTO8q4ESFA@mail.gmail.com>
2013-06-20 12:20             ` Mathieu Desnoyers
2013-06-25  1:56               ` Dave Chinner
2013-07-02 13:58                 ` Mathieu Desnoyers
2013-07-03  0:55                   ` Mathieu Desnoyers
2013-07-03  8:47                     ` Mel Gorman
2013-07-03 14:53                       ` Jeff Moyer
2013-07-03 14:53                         ` Jeff Moyer
2013-07-04  0:03                         ` Dave Chinner
2013-07-04  0:31                           ` Mathieu Desnoyers
2013-07-04 21:11                             ` Rob van der Heij
2013-07-05  1:42                             ` Dave Chinner
2013-07-05  2:34                               ` Mathieu Desnoyers
2013-07-03 18:47                       ` Yannick Brosseau
2013-07-05 14:18                         ` Mel Gorman
2013-07-05 14:18                         ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130617194430.16d8c3ea.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lttng-dev@lists.lttng.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mgorman@suse.de \
    --cc=raphael.beamonte@gmail.com \
    --cc=rvdheij@gmail.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.