linux-fsdevel.vger.kernel.org archive mirror
From: Jerome Glisse <j.glisse@gmail.com>
To: Davidlohr Bueso <davidlohr@hp.com>
Cc: sagi grimberg <sagig@mellanox.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linda Wang <lwang@redhat.com>, Kevin E Martin <kem@redhat.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <jweiner@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	Rik van Riel <riel@redhat.com>, Dave Airlie <airlied@redhat.com>,
	Jeff Law <law@redhat.com>, Brendan Conoboy <blc@redhat.com>,
	Joe Donohue <jdonohue@redhat.com>,
	Duncan Poole <dpoole@nvidia.com>,
	Sherry Cheung <SCheung@nvidia.com>,
	Subhash Gutti <sgutti@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>,
	Lucien Dunning <ldunning@nvidia.com>, Cameron Buschardt <
Subject: Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).
Date: Thu, 8 May 2014 21:45:45 -0400
Message-ID: <20140509014544.GB2906@gmail.com>
In-Reply-To: <1399599734.2497.2.camel@buesod1.americas.hpqcorp.net>

On Thu, May 08, 2014 at 06:42:14PM -0700, Davidlohr Bueso wrote:
> On Thu, 2014-05-08 at 13:56 -0400, Jerome Glisse wrote:
> > On Thu, May 08, 2014 at 07:47:04PM +0300, sagi grimberg wrote:
> > > On 5/7/2014 5:33 AM, Davidlohr Bueso wrote:
> > > >On Tue, 2014-05-06 at 12:29 +0200, Peter Zijlstra wrote:
> > > >>So you forgot to CC Linus, Linus has expressed some dislike for
> > > >>preemptible mmu_notifiers in the recent past:
> > > >>
> > > >>   https://lkml.org/lkml/2013/9/30/385
> > > >I'm glad this came up again.
> > > >
> > > >So I've been running benchmarks (mostly aim7, which nicely exercises our
> > > >locks) comparing my recent v4 for rwsem optimistic spinning against
> > > >previous implementation ideas for the anon-vma lock, mostly:
> > > >
> > > >- rwsem (currently)
> > > >- rwlock_t
> > > >- qrwlock_t
> > > >- rwsem+optspin
> > > >
> > > >Of course, *any* change provides significant improvement in throughput
> > > >for several workloads, by avoiding blocking -- there are more
> > > >performance numbers in the different patches. This is fairly obvious.
> > > >
> > > >What is perhaps not so obvious is that rwsem+optimistic spinning beats
> > > >all others, including the improved qrwlock from Waiman and Peter. This
> > > >is mostly because of the idea of cancelable MCS, which was mimicked from
> > > >mutexes. The delta in most cases is around +10-15%, which is
> > > >non-trivial.
> > > 
> > > This is great news, David!
> > > 
> > > >I mention this because from a performance PoV, we'll stop caring so much
> > > >about the type of lock we require in the notifier related code. So while
> > > >this is not conclusive, I'm not as opposed to keeping the locks blocking
> > > >as I once was. Now this might still imply things like poor design
> > > >choices, but that's neither here nor there.
> > > 
> > > So is the rwsem+opt strategy the way to go, given it keeps everyone happy?
> > > We will be more than satisfied with it, as it will allow us to
> > > guarantee device MMU updates.
> > > 
> > > >/me sees Sagi smiling ;)
> > > 
> > > :)
> > 
> > So I started doing things with TLB flush, but I must say things look ugly.
> > I need a new page flag (goodbye 32-bit platforms), I need my own LRU and
> > page reclamation for any page in use by a device, and I need to hook into
> > try_to_unmap or migrate (I will do the former). I am trying to be smart
> > by scheduling a worker on another CPU before sending the IPI, so that
> > while the IPI is in progress another CPU can hopefully schedule the
> > invalidation on the GPU, and the wait for the GPU after the IPI will be short.
> > 
> > So all in all this is looking ugly, and it does not change the fact that I
> > sleep (well, need to be able to sleep). It just moves the sleeping to another
> > place.
> > 
> > Maybe I should stress that with the mmu_notifier version it only sleeps for
> > processes that are using the GPU. Those processes use userspace APIs like
> > OpenCL, which do not play well with fork, i.e. you should not use fork if
> > you are using such an API.
> > 
> > So in my case, if a process has mm->hmm set, that means a GPU is using
> > that address space, and the process is unlikely to be running the kind of
> > massive workload that people are trying to optimize the anon_vma
> > lock for.
> > 
> > My point is that with rwsem+optspin, it could try spinning if mm->hmm
> > is NULL, making the massive fork workload go fast, or it could sleep
> > directly if mm->hmm is set.
> 
> Sorry? Unless I'm misunderstanding you, we don't do such things. Our
> locks are generic and need to work for any circumstance, no special
> cases here and there... _especially_ with these kinds of things. So no,
> rwsem will spin as long as the owner is set, just like for any other user.
> 
> Thanks,
> Davidlohr
> 

I do not mind spinning all the time; I was just thinking that it could be
optimized away when the current mm has an hmm, as that means there is very
likely going to be a schedule inside the mmu_notifier anyway.

But if you prefer to keep the code generic, I am fine with wasting CPU cycles.
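A rough userspace model of the policy proposed in this thread (spin optimistically only when mm->hmm is NULL, otherwise go straight to sleeping) might look like the sketch below. It is only an illustration under stated assumptions: struct mm_model, acquire_lock() and SPIN_LIMIT are hypothetical names, a pthread mutex stands in for the kernel rwsem, and a bounded trylock loop stands in for the kernel's owner-based optimistic spinning, which works differently. As Davidlohr points out, the real rwsem does not special-case callers this way.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an address space; not the kernel's struct mm_struct. */
struct mm_model {
    void *hmm;              /* non-NULL: a device mirrors this address space */
};

/* Arbitrary, hypothetical bound on optimistic spinning attempts. */
#define SPIN_LIMIT 1000

/*
 * Acquire `lock` for `mm`. If no device mirrors the address space, try
 * to grab the lock by spinning on trylock first. If a mirror exists, the
 * lock holder is likely to sleep inside the notifier, so skip spinning
 * and block right away. *spun_ok reports whether spinning succeeded.
 */
static int acquire_lock(pthread_mutex_t *lock, const struct mm_model *mm,
                        bool *spun_ok)
{
    *spun_ok = false;
    if (mm->hmm == NULL) {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (pthread_mutex_trylock(lock) == 0) {
                *spun_ok = true;
                return 0;
            }
            /* Real kernel code would cpu_relax() and check the owner here. */
        }
    }
    /* Spinning was skipped or gave up: block until the lock is free. */
    return pthread_mutex_lock(lock);
}
```

Build with -pthread. With hmm NULL and an uncontended lock, the first trylock succeeds and *spun_ok is set; with hmm non-NULL the caller blocks in pthread_mutex_lock() without spinning at all.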

Cheers,
Jérôme Glisse

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


Thread overview: 48+ messages
2014-05-02 13:51 [RFC] Heterogeneous memory management (mirror process address space on a device mmu) j.glisse
2014-05-02 13:52 ` [PATCH 01/11] mm: differentiate unmap for vmscan from other unmap j.glisse
2014-05-02 13:52 ` [PATCH 02/11] mmu_notifier: add action information to address invalidation j.glisse
2014-05-02 13:52 ` [PATCH 03/11] mmu_notifier: pass through vma to invalidate_range and invalidate_page j.glisse
2014-05-02 13:52 ` [PATCH 04/11] interval_tree: helper to find previous item of a node in rb interval tree j.glisse
2014-05-02 13:52 ` [PATCH 05/11] mm/memcg: support accounting null page and transfering null charge to new page j.glisse
2014-05-02 13:52 ` [PATCH 06/11] hmm: heterogeneous memory management j.glisse
2014-05-02 13:52 ` [PATCH 07/11] hmm: support moving anonymous page to remote memory j.glisse
2014-05-02 13:52 ` [PATCH 08/11] hmm: support for migrate file backed pages " j.glisse
2014-05-02 13:52 ` [PATCH 09/11] fs/ext4: add support for hmm migration to remote memory of pagecache j.glisse
2014-05-02 13:52 ` [PATCH 10/11] hmm/dummy: dummy driver to showcase the hmm api j.glisse
2014-05-02 13:52 ` [PATCH 11/11] hmm/dummy_driver: add support for fake remote memory using pages j.glisse
2014-05-06 10:29 ` [RFC] Heterogeneous memory management (mirror process address space on a device mmu) Peter Zijlstra
2014-05-06 14:57   ` Linus Torvalds
2014-05-06 15:00     ` Jerome Glisse
2014-05-06 15:18       ` Linus Torvalds
2014-05-06 15:33         ` Jerome Glisse
2014-05-06 15:42           ` Rik van Riel
2014-05-06 15:47           ` Linus Torvalds
2014-05-06 16:18             ` Jerome Glisse
2014-05-06 16:32               ` Linus Torvalds
2014-05-06 16:49                 ` Jerome Glisse
2014-05-06 17:28                 ` Jerome Glisse
2014-05-06 17:43                   ` Linus Torvalds
2014-05-06 18:13                     ` Jerome Glisse
2014-05-06 18:22                       ` Linus Torvalds
2014-05-06 18:38                         ` Jerome Glisse
2014-05-07  7:18                 ` Benjamin Herrenschmidt
2014-05-07  7:14               ` Benjamin Herrenschmidt
2014-05-07 12:39                 ` Jerome Glisse
2014-05-09  1:26                 ` Jerome Glisse
2014-05-10  4:28                   ` Benjamin Herrenschmidt
2014-05-11  0:48                     ` Jerome Glisse
2014-05-06 16:30             ` Rik van Riel
2014-05-06 16:34               ` Linus Torvalds
2014-05-06 16:47                 ` Rik van Riel
2014-05-06 16:54                   ` Jerome Glisse
2014-05-06 18:02                     ` H. Peter Anvin
2014-05-06 18:26                       ` Jerome Glisse
2014-05-06 22:44                 ` David Airlie
2014-05-07  2:33   ` Davidlohr Bueso
2014-05-07 13:00     ` Peter Zijlstra
2014-05-07 17:34       ` Davidlohr Bueso
2014-05-07 16:21     ` Linus Torvalds
2014-05-08 16:47     ` sagi grimberg
2014-05-08 17:56       ` Jerome Glisse
2014-05-09  1:42         ` Davidlohr Bueso
2014-05-09  1:45           ` Jerome Glisse [this message]
