From: Jerome Glisse <j.glisse@gmail.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Boaz Harrosh <boaz@plexistor.com>, Jan Kara <jack@suse.cz>,
Mike Snitzer <snitzer@redhat.com>, Neil Brown <neilb@suse.de>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Chris Mason <clm@fb.com>, Paul Mackerras <paulus@samba.org>,
"H. Peter Anvin" <hpa@zytor.com>, Christoph Hellwig <hch@lst.de>,
Alasdair Kergon <agk@redhat.com>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
Mel Gorman <mgorman@suse.de>,
Matthew Wilcox <willy@linux.intel.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
Rik van Riel <riel@redhat.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Jens Axboe <axboe@kernel.dk>, Theodore Ts'o <tytso@mit.edu>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Julia Lawall <Julia.Lawall@lip6.fr>, Tejun Heo <tj@kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
paulmck@linux.vnet.ibm.com
Subject: Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t
Date: Thu, 7 May 2015 16:18:17 -0400
Message-ID: <20150507201815.GD5966@gmail.com>
In-Reply-To: <20150507195313.GA23597@gmail.com>
On Thu, May 07, 2015 at 09:53:13PM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>
> > > Is handling kernel pagefault on the vmemmap completely out of the
> > > picture ? So we would carveout a chunck of kernel address space
> > > for those pfn and use it for vmemmap and handle pagefault on it.
> >
> > That's pretty clever. The page fault doesn't even have to do remote
> > TLB shootdown, because it only establishes mappings - so it's pretty
> > atomic, a bit like the minor vmalloc() area faults we are doing.
> >
> > Some sort of LRA (least recently allocated) scheme could unmap the
> > area in chunks if it's beyond a certain size, to keep a limit on
> > size. Done from the same context and would use remote TLB shootdown.
> >
> > The only limitation I can see is that such faults would have to be
> > able to sleep, to do the allocation. So pfn_to_page() could not be
> > used in arbitrary contexts.
>
> So another complication would be that we cannot just unmap such pages
> when we want to recycle them, because the struct page in them might be
> in use - so all struct page uses would have to refcount the underlying
> page. We don't really do that today: code just looks up struct pages
> and assumes they never go away.
I still think this is doable. As I said in another email, I think we
should introduce a special pfn_to_page_dev|pmem|waffle|somethingyoulike()
for places that are allowed to allocate the underlying struct page.
For instance, we can use a default page to back this whole special vmemmap
range, filled with specially crafted struct pages that say they describe
invalid memory (and make this mapping read-only, so that all writes to
these special struct pages are forbidden).
Now once an authorized user comes along and needs a real struct page, it
triggers a page allocation that replaces the page full of fake invalid
struct pages with a page of correct, valid struct pages that can be
manipulated by other parts of the kernel.
So regular pfn_to_page() would test whether the pfn falls in the special
vmemmap and, if so, test the content of the struct page for some flag. If
it's the invalid page flag, it returns NULL.
But once a proper struct page is allocated, pfn_to_page() would return
the struct page as expected.
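The lookup scheme described above can be modelled in plain userspace C. This is only a sketch under stated assumptions, not kernel code: every name in it (struct fake_page, PG_INVALID, model_pfn_to_page(), model_pfn_to_page_dev(), the chunk sizes and the pfn base) is hypothetical, and the read-only mapping of the default page is not enforced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical constants for the model; nothing here is a kernel API. */
#define PAGES_PER_CHUNK  64UL        /* struct pages held by one backing page */
#define NR_CHUNKS        16UL        /* size of the modelled special range */
#define SPECIAL_PFN_BASE 0x100000UL  /* first pfn of the special range */
#define PG_INVALID       0x1u        /* "this struct page is fake" flag */

struct fake_page {
    unsigned int flags;
    unsigned long pfn;
};

/* The single default backing page full of invalid struct pages.
 * In the proposal it would be mapped read-only; the model only
 * promises never to write to it after init. */
static struct fake_page invalid_chunk[PAGES_PER_CHUNK];

/* Stands in for the page tables of the special vmemmap range. */
static struct fake_page *chunk_map[NR_CHUNKS];

static int pfn_is_special(unsigned long pfn)
{
    return pfn >= SPECIAL_PFN_BASE &&
           pfn < SPECIAL_PFN_BASE + NR_CHUNKS * PAGES_PER_CHUNK;
}

/* Back the whole special range with the default invalid page. */
void special_vmemmap_init(void)
{
    for (size_t i = 0; i < PAGES_PER_CHUNK; i++)
        invalid_chunk[i].flags = PG_INVALID;
    for (size_t c = 0; c < NR_CHUNKS; c++)
        chunk_map[c] = invalid_chunk;
}

/* Regular lookup: never allocates, returns NULL for invalid entries. */
struct fake_page *model_pfn_to_page(unsigned long pfn)
{
    if (!pfn_is_special(pfn))
        return NULL; /* ordinary memory is outside this model */

    unsigned long idx = pfn - SPECIAL_PFN_BASE;
    struct fake_page *chunk = chunk_map[idx / PAGES_PER_CHUNK];

    assert(chunk != NULL); /* special_vmemmap_init() must have run */
    struct fake_page *page = &chunk[idx % PAGES_PER_CHUNK];
    return (page->flags & PG_INVALID) ? NULL : page;
}

/* Authorized lookup: replaces the default page with real struct pages. */
struct fake_page *model_pfn_to_page_dev(unsigned long pfn)
{
    if (!pfn_is_special(pfn))
        return NULL;

    unsigned long idx = pfn - SPECIAL_PFN_BASE;
    unsigned long c = idx / PAGES_PER_CHUNK;

    assert(chunk_map[c] != NULL); /* special_vmemmap_init() must have run */
    if (chunk_map[c] == invalid_chunk) {
        /* calloc() zeroes flags, so PG_INVALID is clear in the new chunk */
        struct fake_page *chunk = calloc(PAGES_PER_CHUNK, sizeof(*chunk));
        if (!chunk)
            return NULL;
        for (unsigned long i = 0; i < PAGES_PER_CHUNK; i++)
            chunk[i].pfn = SPECIAL_PFN_BASE + c * PAGES_PER_CHUNK + i;
        chunk_map[c] = chunk;
    }
    return &chunk_map[c][idx % PAGES_PER_CHUNK];
}
```

In this model, a caller of model_pfn_to_page() gets NULL for any pfn in the special range until some authorized caller has gone through model_pfn_to_page_dev() for the same chunk; after that, both lookups return the same pointer. Freeing a chunk again is deliberately left out, since that is where the refcounting concern quoted above comes in.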
That way you will catch all invalid users of such pages, i.e. users that
use a page after its lifetime is over. You will also limit the creation
of the underlying proper struct pages to code that can legitimately
ask for a proper struct page for a given pfn.
Also, you would get a kernel write fault on the page full of fake struct
pages, and that would help catch further wrong uses.
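As a rough userspace analogue of that write fault, the sketch below maps one page, fills it, makes it read-only with mprotect(), and checks that a store to it faults. It assumes Linux (or another POSIX system with MAP_ANONYMOUS), and the helper names map_invalid_page() and write_faults() are hypothetical. In the kernel the fault would be taken by the page fault handler rather than delivered as a signal.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1); /* unwind out of the faulting store */
}

/* Map one page standing in for the default "invalid" page, fill it
 * with a marker pattern, then drop write permission. */
unsigned int *map_invalid_page(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned int *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memset(p, 0xff, len);
    if (mprotect(p, len, PROT_READ) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Returns true if storing to *addr raised SIGSEGV or SIGBUS. */
bool write_faults(volatile unsigned int *addr)
{
    struct sigaction sa, old_segv, old_bus;
    bool faulted;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(fault_env, 1) == 0) {
        *addr = 0;       /* faults if the page is read-only */
        faulted = false;
    } else {
        faulted = true;  /* reached via siglongjmp() from the handler */
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

Reads from the protected page still succeed, which matches the idea that a lookup can inspect the invalid flag while any attempt to modify a fake struct page is caught.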
Anyway, this is how I envision it, and I think it would work for my
use case too (GPU it is for me :))
Cheers,
Jérôme