From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Dan Magenheimer <dan.magenheimer@oracle.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Nitin Gupta <ngupta@vflare.org>, Minchan Kim <minchan@kernel.org>,
Robert Jennings <rcj@linux.vnet.ibm.com>,
Jenifer Hopper <jhopper@us.ibm.com>, Mel Gorman <mgorman@suse.de>,
Johannes Weiner <jweiner@redhat.com>,
Rik van Riel <riel@redhat.com>,
Larry Woodman <lwoodman@redhat.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Dave Hansen <dave@sr71.net>, Joe Perches <joe@perches.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Cody P Schafer <cody@linux.vnet.ibm.com>,
Hugh Dickens <hughd@google.com>,
Paul Mackerras <paulus@samba.org>,
Heesub Shin <heesub.shin@samsung.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
devel@driverdev.osuosl.org
Subject: Re: [PATCHv12 2/4] zbud: add to mm/
Date: Mon, 3 Jun 2013 09:48:43 -0400 [thread overview]
Message-ID: <20130603134843.GO6893@phenom.dumpdata.com> (raw)
In-Reply-To: <20130530174344.GA15837@medulla>
On Thu, May 30, 2013 at 12:43:44PM -0500, Seth Jennings wrote:
> On Wed, May 29, 2013 at 02:29:04PM -0700, Andrew Morton wrote:
> > On Wed, 29 May 2013 14:09:02 -0700 (PDT) Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> >
> > > > memory_failure() is merely an example of a general problem: code which
> > > > reads from the memmap[] array and expects its elements to be of type
> > > > `struct page'. Other examples might be memory hotplugging, memory leak
> > > > checkers etc. I have vague memories of out-of-tree patches
> > > > (bigphysarea?) doing this as well.
> > > >
> > > > It's a general problem to which we need a general solution.
> > >
> > > <Obi-tmem Kenobe slowly materializes... "use the force, Luke!">
> > >
> > > One could reasonably argue that any code that makes incorrect
> > > assumptions about the contents of a struct page structure is buggy
> > > and should be fixed.
> >
> > Well it has type "struct page" and all code has a right to expect the
> > contents to match that type.
> >
> > > Isn't the "general solution" already described
> > > in the following comment, excerpted from include/linux/mm.h, which
> > > implies that "scribbling on existing pageframes" [carefully], is fine?
> > > (And, if not, shouldn't that comment be fixed, or am I misreading
> > > it?)
> > >
> > > <start excerpt>
> > > * For the non-reserved pages, page_count(page) denotes a reference count.
> > > * page_count() == 0 means the page is free. page->lru is then used for
> > > * freelist management in the buddy allocator.
> > > * page_count() > 0 means the page has been allocated.
> >
> > Well kinda maybe. How all the random memmap-peekers handle this I do
> > not know. Setting PageReserved is a big hammer which should keep other
> > little paws out of there, although I guess it's abusive of whatever
> > PageReserved is supposed to mean.
> >
> > It's what we used to call a variant record. The tag is page.flags and
> > the protocol is, umm,
> >
> > PageReserved: doesn't refer to a page at all - don't touch
> > PageSlab: belongs to slab or slub
> > !PageSlab: regular kernel/user/pagecache page
>
> In the !PageSlab case, the page _count has to be considered to determine if the
> page is a free page or if it is an allocated non-slab page.
>
> So looking at the fields that need to remained untouched in the struct page for
> the memmap-peekers, they are
> - page->flags
> - page->_count
>
> Is this correct?
>
> >
> > Are there any more?
> >
> > So what to do here? How about
> >
> > - Position the zbud fields within struct page via the preferred
> > means: editing its definition.
> >
> > - Decide upon and document the means by which the zbud variant is tagged
>
> I'm not sure if there is going to be a way to tag zbud pages in particular
> without using a page flag. However, if we can tag it as a non-slab allocated
> kernel page with no userspace mappings, that could be sufficient. I think this
> can be done with:
>
> !PageSlab(p) && page_count(p) > 0 && page_mapcount(p) <= 0
>
> An alternative is to set PG_slab for zbud pages then we get all the same
> treatment as slab pages, which is basically what we want. Setting PG_slab
> also conveys that no assumption can be made about the contents of _mapcount.
>
> However, a memmap-peeker could call slab functions on the page which obviously
> won't be under the control of the slab allocator. Afaict though, it doesn't
> seem that any of them do this since there aren't any functions in the slab
> allocator API that take raw struct pages. The worst I've seen is calling
> shrink_slab in an effort to get the slab allocator to free up the page.
And does it error out properly on non-slab-pages-but-have-PG_slab set?
>
> In summary, I think that maintaining a positive page->_count and setting
> PG_slab on zbud pages should provide safety against existing memmap-peekers.
>
> Do you agree?
The page_>_count will thwart memmap-peeker from fiddling around right?
>
> Seth
>
> >
> > - Demonstrate how this is safe against existing memmap-peekers
> >
> > - Do all this without consuming another page flag :)
> >
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Dan Magenheimer <dan.magenheimer@oracle.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Nitin Gupta <ngupta@vflare.org>, Minchan Kim <minchan@kernel.org>,
Robert Jennings <rcj@linux.vnet.ibm.com>,
Jenifer Hopper <jhopper@us.ibm.com>, Mel Gorman <mgorman@suse.de>,
Johannes Weiner <jweiner@redhat.com>,
Rik van Riel <riel@redhat.com>,
Larry Woodman <lwoodman@redhat.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Dave Hansen <dave@sr71.net>, Joe Perches <joe@perches.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Cody P Schafer <cody@linux.vnet.ibm.com>,
Hugh Dickens <hughd@google.com>,
Paul Mackerras <paulus@samba.org>,
Heesub Shin <heesub.shin@samsung.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
devel@driverdev.osuosl.org
Subject: Re: [PATCHv12 2/4] zbud: add to mm/
Date: Mon, 3 Jun 2013 09:48:43 -0400 [thread overview]
Message-ID: <20130603134843.GO6893@phenom.dumpdata.com> (raw)
In-Reply-To: <20130530174344.GA15837@medulla>
On Thu, May 30, 2013 at 12:43:44PM -0500, Seth Jennings wrote:
> On Wed, May 29, 2013 at 02:29:04PM -0700, Andrew Morton wrote:
> > On Wed, 29 May 2013 14:09:02 -0700 (PDT) Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> >
> > > > memory_failure() is merely an example of a general problem: code which
> > > > reads from the memmap[] array and expects its elements to be of type
> > > > `struct page'. Other examples might be memory hotplugging, memory leak
> > > > checkers etc. I have vague memories of out-of-tree patches
> > > > (bigphysarea?) doing this as well.
> > > >
> > > > It's a general problem to which we need a general solution.
> > >
> > > <Obi-tmem Kenobe slowly materializes... "use the force, Luke!">
> > >
> > > One could reasonably argue that any code that makes incorrect
> > > assumptions about the contents of a struct page structure is buggy
> > > and should be fixed.
> >
> > Well it has type "struct page" and all code has a right to expect the
> > contents to match that type.
> >
> > > Isn't the "general solution" already described
> > > in the following comment, excerpted from include/linux/mm.h, which
> > > implies that "scribbling on existing pageframes" [carefully], is fine?
> > > (And, if not, shouldn't that comment be fixed, or am I misreading
> > > it?)
> > >
> > > <start excerpt>
> > > * For the non-reserved pages, page_count(page) denotes a reference count.
> > > * page_count() == 0 means the page is free. page->lru is then used for
> > > * freelist management in the buddy allocator.
> > > * page_count() > 0 means the page has been allocated.
> >
> > Well kinda maybe. How all the random memmap-peekers handle this I do
> > not know. Setting PageReserved is a big hammer which should keep other
> > little paws out of there, although I guess it's abusive of whatever
> > PageReserved is supposed to mean.
> >
> > It's what we used to call a variant record. The tag is page.flags and
> > the protocol is, umm,
> >
> > PageReserved: doesn't refer to a page at all - don't touch
> > PageSlab: belongs to slab or slub
> > !PageSlab: regular kernel/user/pagecache page
>
> In the !PageSlab case, the page _count has to be considered to determine if the
> page is a free page or if it is an allocated non-slab page.
>
> So looking at the fields that need to remained untouched in the struct page for
> the memmap-peekers, they are
> - page->flags
> - page->_count
>
> Is this correct?
>
> >
> > Are there any more?
> >
> > So what to do here? How about
> >
> > - Position the zbud fields within struct page via the preferred
> > means: editing its definition.
> >
> > - Decide upon and document the means by which the zbud variant is tagged
>
> I'm not sure if there is going to be a way to tag zbud pages in particular
> without using a page flag. However, if we can tag it as a non-slab allocated
> kernel page with no userspace mappings, that could be sufficient. I think this
> can be done with:
>
> !PageSlab(p) && page_count(p) > 0 && page_mapcount(p) <= 0
>
> An alternative is to set PG_slab for zbud pages then we get all the same
> treatment as slab pages, which is basically what we want. Setting PG_slab
> also conveys that no assumption can be made about the contents of _mapcount.
>
> However, a memmap-peeker could call slab functions on the page which obviously
> won't be under the control of the slab allocator. Afaict though, it doesn't
> seem that any of them do this since there aren't any functions in the slab
> allocator API that take raw struct pages. The worst I've seen is calling
> shrink_slab in an effort to get the slab allocator to free up the page.
And does it error out properly on non-slab-pages-but-have-PG_slab set?
>
> In summary, I think that maintaining a positive page->_count and setting
> PG_slab on zbud pages should provide safety against existing memmap-peekers.
>
> Do you agree?
The page_>_count will thwart memmap-peeker from fiddling around right?
>
> Seth
>
> >
> > - Demonstrate how this is safe against existing memmap-peekers
> >
> > - Do all this without consuming another page flag :)
> >
>
next prev parent reply other threads:[~2013-06-03 13:49 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-05-20 16:26 [PATCHv12 0/4] zswap: compressed swap caching Seth Jennings
2013-05-20 16:26 ` Seth Jennings
2013-05-20 16:26 ` [PATCHv12 1/4] debugfs: add get/set for atomic types Seth Jennings
2013-05-20 16:26 ` Seth Jennings
2013-05-29 17:42 ` Konrad Rzeszutek Wilk
2013-05-29 17:42 ` Konrad Rzeszutek Wilk
2013-05-20 16:26 ` [PATCHv12 2/4] zbud: add to mm/ Seth Jennings
2013-05-20 16:26 ` Seth Jennings
2013-05-21 3:37 ` Bob Liu
2013-05-21 3:37 ` Bob Liu
2013-05-28 21:59 ` Andrew Morton
2013-05-28 21:59 ` Andrew Morton
2013-05-29 15:45 ` Seth Jennings
2013-05-29 15:45 ` Seth Jennings
2013-05-29 18:34 ` Andrew Morton
2013-05-29 18:34 ` Andrew Morton
2013-05-29 20:42 ` Seth Jennings
2013-05-29 20:42 ` Seth Jennings
2013-05-29 20:48 ` Andrew Morton
2013-05-29 20:48 ` Andrew Morton
2013-05-29 21:09 ` Dan Magenheimer
2013-05-29 21:09 ` Dan Magenheimer
2013-05-29 21:29 ` Andrew Morton
2013-05-29 21:29 ` Andrew Morton
2013-05-30 17:43 ` Seth Jennings
2013-05-30 17:43 ` Seth Jennings
2013-05-30 21:20 ` Seth Jennings
2013-05-30 21:20 ` Seth Jennings
2013-05-31 1:48 ` Bob Liu
2013-05-31 1:48 ` Bob Liu
2013-06-03 13:48 ` Konrad Rzeszutek Wilk [this message]
2013-06-03 13:48 ` Konrad Rzeszutek Wilk
2013-05-29 20:45 ` Seth Jennings
2013-05-29 20:45 ` Seth Jennings
2013-05-20 16:26 ` [PATCHv12 3/4] zswap: " Seth Jennings
2013-05-20 16:26 ` Seth Jennings
2013-05-21 3:31 ` Bob Liu
2013-05-21 3:31 ` Bob Liu
2013-05-28 21:59 ` Andrew Morton
2013-05-28 21:59 ` Andrew Morton
2013-05-29 14:57 ` Seth Jennings
2013-05-29 14:57 ` Seth Jennings
2013-05-29 18:29 ` Andrew Morton
2013-05-29 18:29 ` Andrew Morton
2013-05-29 19:50 ` Seth Jennings
2013-05-29 19:50 ` Seth Jennings
2013-05-29 19:57 ` Andrew Morton
2013-05-29 19:57 ` Andrew Morton
2013-05-29 21:08 ` Seth Jennings
2013-05-29 21:08 ` Seth Jennings
2013-05-29 21:16 ` Andrew Morton
2013-05-29 21:16 ` Andrew Morton
2013-05-20 16:26 ` [PATCHv12 4/4] zswap: add documentation Seth Jennings
2013-05-20 16:26 ` Seth Jennings
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130603134843.GO6893@phenom.dumpdata.com \
--to=konrad.wilk@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=benh@kernel.crashing.org \
--cc=cody@linux.vnet.ibm.com \
--cc=dan.magenheimer@oracle.com \
--cc=dave@sr71.net \
--cc=devel@driverdev.osuosl.org \
--cc=gregkh@linuxfoundation.org \
--cc=heesub.shin@samsung.com \
--cc=hughd@google.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=jhopper@us.ibm.com \
--cc=joe@perches.com \
--cc=jweiner@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lwoodman@redhat.com \
--cc=mgorman@suse.de \
--cc=minchan@kernel.org \
--cc=ngupta@vflare.org \
--cc=paulus@samba.org \
--cc=rcj@linux.vnet.ibm.com \
--cc=riel@redhat.com \
--cc=sjenning@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.