From: kirill@shutemov.name (Kirill A. Shutemov)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC V2] mm:add zero_page _mapcount when mapped into user space
Date: Thu, 4 Dec 2014 14:28:13 +0200
Message-ID: <20141204122813.GA523@node.dhcp.inet.fi>
In-Reply-To: <35FD53F367049845BC99AC72306C23D103E688B313E6@CNBJMBX05.corpusers.net>
On Thu, Dec 04, 2014 at 02:10:53PM +0800, Wang, Yalin wrote:
> > -----Original Message-----
> > From: Kirill A. Shutemov [mailto:kirill at shutemov.name]
> > Sent: Tuesday, December 02, 2014 7:30 PM
> > To: Wang, Yalin
> > Cc: 'linux-kernel at vger.kernel.org'; 'linux-mm at kvack.org'; 'linux-arm-
> > kernel at lists.infradead.org'
> > Subject: Re: [RFC V2] mm:add zero_page _mapcount when mapped into user
> > space
> >
> > On Tue, Dec 02, 2014 at 05:27:36PM +0800, Wang, Yalin wrote:
> > > This patch add/dec zero_page's _mapcount to make sure the mapcount is
> > > correct for zero_page, so that when read from /proc/kpagecount,
> > > zero_page's mapcount is also correct, userspace process like procrank
> > > can calculate PSS correctly.
> >
> > I don't have specific code path to point to, but I would expect zero page
> > with non-zero mapcount would cause a problem with rmap.
> >
> > How do you test the change?
> >
> I just tested it by checking the mapcount from /proc/pid/pagemap and /proc/kpagecount.
> It works well.

I took a closer look and your patch is broken in multiple places:

 - in zap_pte_range() you don't decrement mapcount;
 - you don't update rss counters for the mm;
 - copy_one_pte() doesn't increase mapcount;
 - ...

Basically, each and every vm_normal_page() call must be audited, as a first
step. And you totally skip the huge zero page.

Proper mapcount handling for the zero page would require a lot more work and I
don't think it is worth it. The gain is too small.

NAK.

> The problem is that when I look at /proc/pid/smaps,
> the Rss / Pss don't account for the zero_page mapping,
> because smaps_pte_entry() --> vm_normal_page()
> will return NULL for zero_page.
>
> But when a userspace process reads /proc/pid/pagemap,
> it will see zero_page mapped
> and will treat it as Rss.
> This is weird. Should we also omit zero_page in /proc/pid/pagemap,
> or add zero_page as Rss in /proc/pid/smaps?
>
> I think we should add zero_page into Rss,
> because it is really mapped into the userspace address space,
> and it will make userspace memory analysis more accurate.

It would be easier for userspace to find out the pfn of the zero page and take
it into account.

Note: some architectures have multiple zero pages due to coloring.
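
For illustration, here is a rough userspace sketch of that approach. It is not
code from this thread, and all helper names (decode_pagemap_entry,
read_pagemap_entries, probe_zero_pfn, count_resident) are made up for the
example. It assumes the documented pagemap layout (one 64-bit entry per page,
bit 63 = present, bits 0-54 = pfn when present) and assumes that a read fault
on untouched private anonymous memory maps the zero page. Kernels newer than
the one discussed here may report the pfn as 0 to unprivileged readers, and on
architectures with colored zero pages a single probe will not find all of them.

```python
import ctypes
import mmap
import struct

PAGEMAP_ENTRY_BYTES = 8
PFN_MASK = (1 << 55) - 1   # bits 0-54: pfn, valid only when the page is present
PAGE_PRESENT = 1 << 63     # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Return the pfn stored in a pagemap entry, or None if not present."""
    if not entry & PAGE_PRESENT:
        return None
    return entry & PFN_MASK


def read_pagemap_entries(pid, vaddr, npages, page_size=mmap.PAGESIZE):
    """Read npages raw pagemap entries of pid starting at address vaddr."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * PAGEMAP_ENTRY_BYTES)
        data = f.read(npages * PAGEMAP_ENTRY_BYTES)
    # Entries are in native byte order.
    return [e for (e,) in struct.iter_unpack("=Q", data)]


def probe_zero_pfn():
    """Read-fault a fresh private anonymous page and return the pfn behind it.

    Assumption: the kernel backs such a page with the zero page. The result
    is only meaningful if the caller is allowed to see pfns in pagemap.
    """
    m = mmap.mmap(-1, mmap.PAGESIZE,
                  flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        buf = ctypes.c_char.from_buffer(m)
        vaddr = ctypes.addressof(buf)
        del buf              # release the buffer export so close() can succeed
        _ = m[0]             # read only; a write would allocate a real page
        entry = read_pagemap_entries("self", vaddr, 1)[0]
        return decode_pagemap_entry(entry)
    finally:
        m.close()


def count_resident(entries, zero_pfns):
    """Return (ordinary present pages, zero-page mappings) for the entries."""
    resident = zero = 0
    for entry in entries:
        pfn = decode_pagemap_entry(entry)
        if pfn is None:
            continue
        if pfn in zero_pfns:
            zero += 1
        else:
            resident += 1
    return resident, zero
```

A tool in the style of procrank could call probe_zero_pfn() once (more than
once, with different page offsets, where zero pages are colored), collect the
results into zero_pfns, and then decide whether to count or skip those
mappings when summing Rss from pagemap. The huge zero page is not handled here.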
--
Kirill A. Shutemov