linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jerome Glisse <j.glisse@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, "Joerg Roedel" <joro@8bytes.org>,
	"Mel Gorman" <mgorman@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Johannes Weiner" <jweiner@redhat.com>,
	"Larry Woodman" <lwoodman@redhat.com>,
	"Rik van Riel" <riel@redhat.com>,
	"Dave Airlie" <airlied@redhat.com>,
	"Brendan Conoboy" <blc@redhat.com>,
	"Joe Donohue" <jdonohue@redhat.com>,
	"Duncan Poole" <dpoole@nvidia.com>,
	"Sherry Cheung" <SCheung@nvidia.com>,
	"Subhash Gutti" <sgutti@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Mark Hairgrove" <mhairgrove@nvidia.com>,
	"Lucien Dunning" <ldunning@nvidia.com>,
	"Cameron Buschardt" <cabuschardt@nvidia.com>,
	"Arvind Gopalakrishnan" <arvindg@nvidia.com>,
	"Shachar Raindel" <raindel@mellanox.com>,
	"Liran Liss" <liranl@mellanox.com>,
	"Roland Dreier" <roland@purestorage.com>,
	"Ben Sander" <ben.sander@amd.com>,
	"Greg Stoner" <Greg.Stoner@amd.com>,
	"John Bridgman" <John.Bridgman@amd.com>,
	"Michael Mantor" <Michael.Mantor@amd.com>,
	"Paul Blinzer" <Paul.Blinzer@amd.com>,
	"Laurent Morichetti" <Laurent.Morichetti@amd.com>,
	"Alexander Deucher" <Alexander.Deucher@amd.com>,
	"Oded Gabbay" <Oded.Gabbay@amd.com>,
	"Jérôme Glisse" <jglisse@redhat.com>
Subject: Re: [PATCH 3/5] lib: lockless generic and arch independent page table (gpt) v2.
Date: Mon, 10 Nov 2014 15:58:15 -0500	[thread overview]
Message-ID: <20141110205814.GA4186@gmail.com> (raw)
In-Reply-To: <CA+55aFwHd4QYopHvd=H6hxoQeqDV3HT6=436LGU-FRb5A0p7Vg@mail.gmail.com>

On Mon, Nov 10, 2014 at 12:22:03PM -0800, Linus Torvalds wrote:
> Ok, so things are somewhat calm, and I'm trying to take time off to
> see what's going on. And I'm not happy.
> 
> On Mon, Nov 10, 2014 at 10:28 AM,  <j.glisse@gmail.com> wrote:
> >
> > Page table is a common structure format most notably use by cpu mmu. The
> > arch depend page table code has strong tie to the architecture which makes
> > it unsuitable to be use by other non arch specific code.
> 
> Please don't call this thing a "generic page table".
> 
> It is no such thing. The *real* page tables are page tables. This is
> some kind of "mapping lookup", and has nothing to do with page tables
> as far as I can see. Why do you call it a page table?

I did this because intention is to use it to implement hardware page
table for different hardware (in my case AMD, NVidia). So it would be
use for real page table just not for cpu but for gpu.

Also during Linux Plumber people working on IOMMU expressed there wish to
see some generic "page table" code that can be share among IOMMU as most
IOMMU use a page table directory hierarchy for mapping and it is not the
same as the one use by the CPU.

Those are the two main reasons why i named it page table. It simply full
fill same role as CPU page table but for other hardware block and it tries
to do it in a generic way.

> 
> Also, why isn't this just using our *existing* generic mapping
> functionality, which already uses a radix tree, and has a lot of
> lockless models? We already *have* something like that, and it's
> called a "struct address_space".
> 
> And if you *just* want the tree, why don't you use "struct radix_tree_root".

struct radix_tree_root would not have fields i need to implement a generic
"page table" as i need callback from user to build page directory entry.

> 
> And if it's generic, why do you have that odd insane conditional
> locking going on?
> 

I am not sure to which locking you are refering to here. The design is
to allow concurrent readers and faulters to operate at same time. For
this i need reader to ignore newly faulted|created directory. So during
table walk done there is a bit of trickery to achieve just that.

> In other words, looking at this, I just go "this is re-implementing
> existing models, and uses naming that is actively misleading".
> 
> I think it's actively horrible, in other words. The fact that you have
> one ACK on it already makes me go "Hmm". Is there some actual reason
> why this would be called a page table, when even your explanation very
> much clarifies that it is explicitly written to *not* be an actual
> page table.
> 
> I also find it absolutely disgusting how you use USE_SPLIT_PTE_PTLOCKS
> for this, which seems to make absolutely zero sense. So you're sharing
> the config with the *real* page tables for no reason I can see.
> 

Update to page directory are synchronize through the spinlock of each
page backing a directory this is why i rely on that option. As explained
above i am trying to adapt the design of CPU page table to other hw page
table. The only difference is that the page directory entry and the page
table entry are different from the CPU and vary from one hw to the other.

I wanted to have generic code that can accomodate different hw at runtime
and not target one specific single CPU format at build time.

> I'm also looking at the "locking". It's insane. It's wrong, and
> doesn't have any serialization. Using the bit operations for locking
> is not correct. We've gotten over that years ago.

Bit operation are not use for locking at least not for inter-thread sync.
They are use for intra-thread synchronization because walk down of one
directory often needs to go over entry of one directory several times there
is a need to remember btw of those loop which entry inside the current
directory the current thread needs to care about. All the bit operations
are use only for that. Everything else is using the struct page spinlock
or global common spinlock and atomic to keep directory page alive.

All wlock are struct local to a thread and not share.

> 
> Rik, the fact that you acked this just makes all your other ack's be
> suspect. Did you do it just because it was from Red Hat, or do you do
> it because you like seeing Acked-by's with your name?
> 
> Anyway, this gets a NAK from me. Maybe I'm missing something, but I
> think naming is supremely important, and I really don't see the point
> of this. At a minimum, it needs a *hell* of a lot more explanations
> for all it does. And quite frankly, I don't think that will be
> sufficient, since the whole "bitops for locking" looks downright
> buggy, and it's not at all clear why you want this in the first place
> as opposed to just using gang lookups on the radix trees that we
> already have, and that is well-tested and known to scale fine.
> 
> So really, it boils down to: why is this any better than radix trees
> that are well-named, tested, and work?

I hope all the above help clarify my intention and i apologize for lack
of clarity in my commit message and in the code comment. I can include
the above motivation to make this clear.

If you still dislike me reusing the page table name i am open to any
suggestion for a better name. But in my mind this is really intended to
be use to implement hw specific page table and i would like to share
the guts of it among different hw and possibly with IOMMU folks too.

Thanks for taking time to look at this, much appreciated.

Cheers,
Jerome

> 
>                 Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-11-10 20:58 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-10 18:28 HMM (heterogeneous memory management) v6 j.glisse
2014-11-10 18:28 ` [PATCH 1/5] mmu_notifier: add event information to address invalidation v6 j.glisse
2014-11-10 18:28 ` [PATCH 2/5] mmu_notifier: keep track of active invalidation ranges v2 j.glisse
2014-11-10 18:28 ` [PATCH 3/5] lib: lockless generic and arch independent page table (gpt) v2 j.glisse
2014-11-10 20:22   ` Linus Torvalds
2014-11-10 20:58     ` Jerome Glisse [this message]
2014-11-10 21:35       ` Linus Torvalds
2014-11-10 21:47         ` Linus Torvalds
2014-11-10 22:58           ` Jerome Glisse
2014-11-10 22:50         ` Jerome Glisse
2014-11-10 23:53           ` Linus Torvalds
2014-11-11  2:45             ` Jerome Glisse
2014-11-11  3:16               ` Linus Torvalds
2014-11-11  4:19                 ` Jerome Glisse
2014-11-11  4:29                 ` Linus Torvalds
2014-11-11  9:59               ` Peter Zijlstra
2014-11-11 13:42                 ` Jerome Glisse
2014-11-11 21:01                 ` David Airlie
2014-11-13 23:50             ` Linus Torvalds
2014-11-14  0:58               ` Kirill A. Shutemov
2014-11-14  1:18                 ` Linus Torvalds
2014-11-14  1:50                   ` Linus Torvalds
2014-11-13 16:07     ` Rik van Riel
2014-11-10 18:28 ` [PATCH 4/5] hmm: heterogeneous memory management v6 j.glisse
2014-11-11 19:00 ` HMM (heterogeneous memory management) v6 Christoph Lameter
2014-11-12 20:09   ` Jerome Glisse
2014-11-12 23:08     ` Christoph Lameter
2014-11-13  4:28       ` Jerome Glisse
  -- strict thread matches above, loose matches on Subject: below --
2014-11-03 20:42 HMM (heterogeneous memory management) v5 j.glisse
2014-11-03 20:42 ` [PATCH 3/5] lib: lockless generic and arch independent page table (gpt) v2 j.glisse
2014-11-06 22:32   ` Rik van Riel
2014-11-06 22:40     ` Jerome Glisse
2014-11-06 22:56       ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141110205814.GA4186@gmail.com \
    --to=j.glisse@gmail.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=Greg.Stoner@amd.com \
    --cc=John.Bridgman@amd.com \
    --cc=Laurent.Morichetti@amd.com \
    --cc=Michael.Mantor@amd.com \
    --cc=Oded.Gabbay@amd.com \
    --cc=Paul.Blinzer@amd.com \
    --cc=SCheung@nvidia.com \
    --cc=aarcange@redhat.com \
    --cc=airlied@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=arvindg@nvidia.com \
    --cc=ben.sander@amd.com \
    --cc=blc@redhat.com \
    --cc=cabuschardt@nvidia.com \
    --cc=dpoole@nvidia.com \
    --cc=hpa@zytor.com \
    --cc=jdonohue@redhat.com \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=jweiner@redhat.com \
    --cc=ldunning@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liranl@mellanox.com \
    --cc=lwoodman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=mhairgrove@nvidia.com \
    --cc=peterz@infradead.org \
    --cc=raindel@mellanox.com \
    --cc=riel@redhat.com \
    --cc=roland@purestorage.com \
    --cc=sgutti@nvidia.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).