Re: [PATCH 3/5] lib: lockless generic and arch independent page table (gpt) v2.

From: Jerome Glisse <j.glisse@gmail.com>
To: Rik van Riel <riel@redhat.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	joro@8bytes.org, "Mel Gorman" <mgorman@suse.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Johannes Weiner" <jweiner@redhat.com>,
	"Larry Woodman" <lwoodman@redhat.com>,
	"Dave Airlie" <airlied@redhat.com>,
	"Brendan Conoboy" <blc@redhat.com>,
	"Joe Donohue" <jdonohue@redhat.com>,
	"Duncan Poole" <dpoole@nvidia.com>,
	"Sherry Cheung" <SCheung@nvidia.com>,
	"Subhash Gutti" <sgutti@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Mark Hairgrove" <mhairgrove@nvidia.com>,
	"Lucien Dunning" <ldunning@nvidia.com>,
	"Cameron Buschardt" <cabuschardt@nvidia.com>,
	"Arvind Gopalakrishnan" <arvindg@nvidia.com>,
	"Shachar Raindel" <raindel@mellanox.com>,
	"Liran Liss" <liranl@mellanox.com>,
	"Roland Dreier" <roland@purestorage.com>,
	"Ben Sander" <ben.sander@amd.com>,
	"Greg Stoner" <Greg.Stoner@amd.com>,
	"John Bridgman" <John.Bridgman@amd.com>,
	"Michael Mantor" <Michael.Mantor@amd.com>,
	"Paul Blinzer" <Paul.Blinzer@amd.com>,
	"Laurent Morichetti" <Laurent.Morichetti@amd.com>,
	"Alexander Deucher" <Alexander.Deucher@amd.com>,
	"Oded Gabbay" <Oded.Gabbay@amd.com>,
	"Jérôme Glisse" <jglisse@redhat.com>
Subject: Re: [PATCH 3/5] lib: lockless generic and arch independent page table (gpt) v2.
Date: Thu, 6 Nov 2014 17:40:53 -0500
Message-ID: <20141106224051.GA6877@gmail.com>
In-Reply-To: <545BF6E0.8060001@redhat.com>

On Thu, Nov 06, 2014 at 05:32:00PM -0500, Rik van Riel wrote:
> On 11/03/2014 03:42 PM, j.glisse@gmail.com wrote:
> > From: Jerome Glisse <jglisse@redhat.com>
> > 
> > A page table is a common structure, most notably used by the CPU
> > MMU. The arch-dependent page table code is strongly tied to the
> > architecture, which makes it unsuitable for use by other,
> > non-arch-specific code.
> > 
> > This patch implements a generic and arch-independent page table. It
> > is generic in the sense that the entry size can be u64 or unsigned
> > long (or also u32 on 32-bit archs).
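To make the entry-size point concrete, here is a small C sketch. It assumes, purely for illustration, that each directory level occupies exactly one 4 KiB page; the gpt patch may size its levels differently. Under that assumption the entry size decides how many entries fit in a level and therefore how many address bits each level translates. toy_level_bits() is a hypothetical helper, not part of the gpt API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of address bits one directory level
 * translates, assuming (for this sketch only) that a level is exactly
 * one page of entries.
 */
static unsigned toy_level_bits(size_t page_size, size_t entry_size)
{
	size_t n;
	unsigned bits = 0;

	assert(entry_size != 0 && page_size % entry_size == 0);
	n = page_size / entry_size;		/* entries per level */
	assert(n != 0 && (n & (n - 1)) == 0);	/* must be a power of two */
	while (n > 1) {				/* bits = log2(n) */
		n >>= 1;
		bits++;
	}
	return bits;
}
```

With 4 KiB levels this gives 9 bits per level for u64 entries and 10 bits for u32 entries; unsigned long lands on one or the other depending on whether the architecture is 64-bit or 32-bit.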
> > 
> > It is lockless in the sense that at any point in time you can have
> > concurrent threads updating the page table (removing or changing
> > entries) and faulting in the page table (adding new entries). This is
> > achieved by requiring each updater and each faulter to take a range
> > lock. There is no exclusion on the range lock, i.e. several threads
> > can fault or update the same range concurrently. It is the
> > responsibility of the user to synchronize updates to the page table
> > entries (pte), while updates to the page table directories (pdp) are
> > the responsibility of gpt.
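A minimal sketch of what a non-exclusive range lock can look like, assuming C11 atomics. Taking the lock never waits and never excludes other holders, even on overlapping ranges; it only records that someone is active, which lets the table code decide when directory pages can safely be reclaimed. All toy_* names are hypothetical, and this is not how gpt itself tracks ranges.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical range lock: just the [start, end) range being worked on. */
struct toy_range_lock {
	unsigned long start, end;
};

/* Hypothetical table: only counts outstanding range locks. */
struct toy_table {
	atomic_int active;
};

static void toy_table_init(struct toy_table *t)
{
	atomic_init(&t->active, 0);
}

/* Never blocks and never excludes: overlapping holders are allowed. */
static void toy_lock(struct toy_table *t, struct toy_range_lock *l,
		     unsigned long start, unsigned long end)
{
	assert(start < end);
	l->start = start;
	l->end = end;
	atomic_fetch_add_explicit(&t->active, 1, memory_order_acquire);
}

static void toy_unlock(struct toy_table *t, struct toy_range_lock *l)
{
	(void)l;	/* a real tracker would drop this specific range */
	atomic_fetch_sub_explicit(&t->active, 1, memory_order_release);
}

/* Directory pages may only be freed once no holder can still walk them. */
static bool toy_can_reclaim_directories(struct toy_table *t)
{
	return atomic_load_explicit(&t->active, memory_order_acquire) == 0;
}
```

Because nothing blocks, correctness of the ptes themselves still has to come from the user (for example atomic operations on each entry), which matches the division of responsibility described in the paragraph above.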
> > 
> > The API usage pattern is:
> >
> >   gpt_init()
> >
> >   gpt_lock_update(lock_range)
> >   // The user can update ptes, for instance using atomic bit
> >   // operations, which allows completely lockless updates.
> >   gpt_unlock_update(lock_range)
> >
> >   gpt_lock_fault(lock_range)
> >   // The user can fault in ptes but is responsible for preventing
> >   // threads from concurrently faulting the same pte, and for properly
> >   // accounting the number of ptes faulted in the pdp structure.
> >   gpt_unlock_fault(lock_range)
> >   // Newly faulted ptes only become visible to other updaters once
> >   // all concurrent faulters on the address have unlocked.
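The two sides of that pattern can be sketched with C11 atomics, assuming a hypothetical 64-bit pte layout with a valid bit and a write bit. The updater clears a permission bit with a single atomic AND, and the faulter uses compare-and-exchange so that when two threads fault the same pte only one installs it and accounts it in the directory. In real code these calls would sit between gpt_lock_update()/gpt_unlock_update() and gpt_lock_fault()/gpt_unlock_fault(); the sketch leaves those out, and toy_dir, toy_write_protect() and toy_fault_in() are illustrative names only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pte layout for this sketch only. */
#define TOY_PTE_VALID	(1ULL << 0)
#define TOY_PTE_WRITE	(1ULL << 1)
#define TOY_NPTES	16

struct toy_dir {
	_Atomic uint64_t pte[TOY_NPTES];
	atomic_int nvalid;	/* faulted-in ptes, accounted by the user */
};

static void toy_dir_init(struct toy_dir *d)
{
	for (unsigned i = 0; i < TOY_NPTES; i++)
		atomic_init(&d->pte[i], 0);
	atomic_init(&d->nvalid, 0);
}

/* Updater side: one atomic AND, so concurrent updaters need no lock. */
static void toy_write_protect(struct toy_dir *d, unsigned idx)
{
	assert(idx < TOY_NPTES);
	atomic_fetch_and(&d->pte[idx], ~(uint64_t)TOY_PTE_WRITE);
}

/* Faulter side: install only into an empty pte; exactly one racer wins. */
static bool toy_fault_in(struct toy_dir *d, unsigned idx, uint64_t val)
{
	uint64_t expected = 0;

	assert(idx < TOY_NPTES);
	if (!atomic_compare_exchange_strong(&d->pte[idx], &expected,
					    val | TOY_PTE_VALID))
		return false;		/* another thread faulted it first */
	atomic_fetch_add(&d->nvalid, 1);	/* account it in the directory */
	return true;
}
```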
> > 
> > Details on how the lockless concurrent updaters and faulters work
> > are provided in the header file.
> > 
> > Changed since v1:
> >   - Switch to a macro implementation instead of using arithmetic to
> >     accommodate the various table entry sizes (uint64_t, unsigned
> >     long, ...). This is somewhat less flexible, but right now there
> >     is no use for the extra flexibility v1 was offering.
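One way a macro implementation can accommodate several entry types is to stamp out a separate copy of the table code per type at compile time, rather than carrying the entry size around as a runtime value. The sketch below is hypothetical: DEFINE_TOY_TABLE and the generated toy_table_* names are made up for illustration and do not reflect the actual gpt macros.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TABLE_ENTRIES 512

/* Generates a struct and typed accessors for one entry type. */
#define DEFINE_TOY_TABLE(name, type)					\
	struct name {							\
		type entries[TOY_TABLE_ENTRIES];			\
	};								\
	static inline void name##_set(struct name *t, unsigned idx,	\
				      type val)				\
	{								\
		assert(idx < TOY_TABLE_ENTRIES);			\
		t->entries[idx] = val;					\
	}								\
	static inline type name##_get(const struct name *t,		\
				      unsigned idx)			\
	{								\
		assert(idx < TOY_TABLE_ENTRIES);			\
		return t->entries[idx];					\
	}

/* One instantiation per entry type; each is fully typed at compile time. */
DEFINE_TOY_TABLE(toy_table_u64, uint64_t)
DEFINE_TOY_TABLE(toy_table_ul, unsigned long)
DEFINE_TOY_TABLE(toy_table_u32, uint32_t)
```

The cost is the usual one of this style: each instantiation is separate code, and errors inside the macro body are harder to read, which is the "preprocessor magic" trade-off discussed in the reply.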
> > 
> > Signed-off-by: Jerome Glisse <jglisse@redhat.com>
> 
> Never a fan of preprocessor magic, but I see why it's needed.
> 
> Acked-by: Rik van Riel <riel@redhat.com>

v1 does not use the preprocessor, but it has a bigger gpt struct footprint
and more complex calculations for page table walking, because it relies on
runtime computation rather than on shifts defined at compile time through
preprocessor magic.
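The trade-off can be illustrated with a per-level index computation, written once with compile-time constants and once with shifts stored in a structure. The constants (12-bit pages, 9 index bits per level) and all toy_* names are assumptions for illustration, not taken from either version of the patch. The macro version lets the compiler fold the shifts into immediate operands; the runtime version needs the geometry stored in the table structure and loaded on each walk step, which corresponds to the extra footprint and computation mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time flavor: shifts are constants the compiler can fold. */
#define TOY_PAGE_SHIFT		12
#define TOY_LEVEL_BITS		9
#define TOY_LEVEL_SHIFT(level)	(TOY_PAGE_SHIFT + (level) * TOY_LEVEL_BITS)
#define TOY_INDEX(addr, level) \
	(((addr) >> TOY_LEVEL_SHIFT(level)) & ((1UL << TOY_LEVEL_BITS) - 1))

/* Runtime flavor: geometry lives in the table struct and is read per step. */
struct toy_geometry {
	unsigned page_shift;
	unsigned level_bits;
};

static unsigned long toy_index_rt(const struct toy_geometry *g,
				  uint64_t addr, unsigned level)
{
	unsigned shift = g->page_shift + level * g->level_bits;

	return (addr >> shift) & ((1UL << g->level_bits) - 1);
}
```

Both flavors compute the same indices; whether the constant folding is measurable on a hot path is the open question raised in this reply.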

Given that I am not a fan of preprocessor magic either, if it makes you
feel any better I can go back to v1. Both have seen the same kind of testing
and both are functionally equivalent (the APIs they expose are obviously
slightly different).

I am not convinced that the computation I save by using the preprocessor
would ever show up as a bottleneck on a hot path.

Cheers,
Jerome
