From: Wu Fengguang <fengguang.wu@intel.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [RFC][PATCH] proc: export more page flags in /proc/kpageflags
Date: Thu, 16 Apr 2009 11:49:18 +0800
Message-ID: <20090416034918.GB20162@localhost>
In-Reply-To: <20090416111108.AC55.A69D9226@jp.fujitsu.com>

On Thu, Apr 16, 2009 at 10:26:51AM +0800, KOSAKI Motohiro wrote:
> 
> Hi
> 
> > > > > On Tue, Apr 14, 2009 at 12:37:10PM +0800, KOSAKI Motohiro wrote:
> > > > > > > Export the following page flags in /proc/kpageflags,
> > > > > > > just in case they will be useful to someone:
> > > > > > >
> > > > > > > - PG_swapcache
> > > > > > > - PG_swapbacked
> > > > > > > - PG_mappedtodisk
> > > > > > > - PG_reserved
> > >
> > > PG_reserved should be exported as PG_KERNEL or somesuch.
> >
> > PG_KERNEL could be misleading. PG_reserved obviously does not cover all
> > (or even most) kernel pages. So I'd prefer to export PG_reserved as it is.
> >
> > It seems that the vast majority of free pages are marked PG_reserved:
> 
> Can I review the documentation first?
> Without good documentation for administrators, I can't ack exposing PG_reserved.

btw, is it expected behavior that so many free pages are marked PG_reserved?
Last time I looked, in 2.6.27, free pages simply didn't have any flags set.

Or maybe my tool is misreporting. I will double check.
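
For such a double check, something along these lines could serve as an
independent cross-check. It is only a minimal sketch, not the page-types
source: it assumes /proc/kpageflags is read as one native-endian 64-bit
flags word per PFN and that pages are 4 KiB, and the helper names
(flag_histogram, pages_to_mb, read_kpageflags) are made up for illustration.

```python
import struct
from collections import Counter

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on the x86_64 box below


def flag_histogram(raw):
    """Count pages per distinct flags word (one native-endian u64 per PFN)."""
    n = len(raw) // 8
    return Counter(struct.unpack(f"={n}Q", raw[: n * 8]))


def pages_to_mb(page_count):
    """Size in MB, truncated, which is how the MB column below appears to round."""
    return page_count * PAGE_SIZE // (1 << 20)


def read_kpageflags(path="/proc/kpageflags"):
    """Read the whole file; this normally requires root."""
    with open(path, "rb") as f:
        return f.read()


def report(hist):
    """One 'flags page-count MB' row per flags word, largest count first."""
    rows = sorted(hist.items(), key=lambda kv: -kv[1])
    return [f"0x{flags:06x} {count:>10} {pages_to_mb(count):>8}"
            for flags, count in rows]
```

Running report(flag_histogram(read_kpageflags())) as root should give rows
that can be compared against the first three columns of the tool output.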

> > # uname -a
> > Linux hp 2.6.30-rc2 #157 SMP Wed Apr 15 19:37:49 CST 2009 x86_64 GNU/Linux
> > # echo 1 > /proc/sys/vm/drop_caches
> > # ./page-types
> >    flags        page-count       MB  symbolic-flags             long-symbolic-flags
> > 0x004000            497474     1943  ______________r_____       reserved
> > 0x008000              4454       17  _______________o____       compound
> > 0x008014                 5        0  __R_D__________o____       referenced,dirty,compound
> > 0x000020                 1        0  _____l______________       lru
> > 0x000028               310        1  ___U_l______________       uptodate,lru
> > 0x00002c                18        0  __RU_l______________       referenced,uptodate,lru
> > 0x000068                80        0  ___U_lA_____________       uptodate,lru,active
> > 0x00006c               157        0  __RU_lA_____________       referenced,uptodate,lru,active
> > 0x002078                 1        0  ___UDlA______b______       uptodate,dirty,lru,active,swapbacked
> > 0x00207c                17        0  __RUDlA______b______       referenced,uptodate,dirty,lru,active,swapbacked
> > 0x000228                13        0  ___U_l___x__________       uptodate,lru,reclaim
> > 0x000400              2085        8  __________B_________       buddy
> 
> Would "freed" be better?
> "buddy" is the name of an implementation technique.

Not compellingly better :-)  I'd expect BUDDY to be a well-recognized
technique, about as familiar as LRU.  PG_BUDDY could be documented as:
"this page is owned by the buddy system, which manages free memory".

PG_FREED may seem more newbie-friendly, but it invites the classic
newbie question: "Why so few freed pages?!" ;-)

An administrator who does not understand BUDDY is unlikely to
understand many of the other exported page flags either, and will have
to consult the documentation anyway.  So exporting PG_buddy as it is
could be the best option for proficient users.
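
For readers following the table, the flags column can be decoded by hand.
The sketch below is illustrative only: the bit positions are read off the
page-types output quoted in this mail (they are not taken from the patch
itself), bits that never show up there are left unnamed, and the function
names are hypothetical.

```python
# Bit -> (short symbol, long name), as implied by the quoted page-types rows.
# Bits 0, 1, 8, 12 and 16+ do not appear in that output and stay unnamed.
FLAG_BITS = {
    2: ("R", "referenced"),
    3: ("U", "uptodate"),
    4: ("D", "dirty"),
    5: ("l", "lru"),
    6: ("A", "active"),
    7: ("S", "slab"),
    9: ("x", "reclaim"),
    10: ("B", "buddy"),
    11: ("m", "mmap"),
    13: ("b", "swapbacked"),
    14: ("r", "reserved"),
    15: ("o", "compound"),
}
WIDTH = 20  # the symbolic-flags column is 20 characters wide


def short_flags(flags):
    """Render the symbolic-flags column: character i stands for bit i."""
    chars = []
    for bit in range(WIDTH):
        if not (flags >> bit) & 1:
            chars.append("_")
        elif bit in FLAG_BITS:
            chars.append(FLAG_BITS[bit][0])
        else:
            chars.append("?")  # set, but not named in the quoted output
    return "".join(chars)


def long_flags(flags):
    """Render the long-symbolic-flags column, lowest bit first."""
    names = []
    for bit in range(WIDTH):
        if (flags >> bit) & 1:
            names.append(FLAG_BITS[bit][1] if bit in FLAG_BITS else f"bit{bit}")
    return ",".join(names)
```

With this table, 0x000400 decodes to a lone "B" in position 10, which is
the buddy row discussed above.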

> > 0x000804                 1        0  __R________m________       referenced,mmap
> > 0x002808                10        0  ___U_______m_b______       uptodate,mmap,swapbacked
> > 0x000828              1060        4  ___U_l_____m________       uptodate,lru,mmap
> > 0x00082c               215        0  __RU_l_____m________       referenced,uptodate,lru,mmap
> > 0x000868               189        0  ___U_lA____m________       uptodate,lru,active,mmap
> > 0x002868              4187       16  ___U_lA____m_b______       uptodate,lru,active,mmap,swapbacked
> > 0x00286c                30        0  __RU_lA____m_b______       referenced,uptodate,lru,active,mmap,swapbacked
> > 0x00086c              1012        3  __RU_lA____m________       referenced,uptodate,lru,active,mmap
> > 0x002878                 3        0  ___UDlA____m_b______       uptodate,dirty,lru,active,mmap,swapbacked
> > 0x008880               936        3  _______S___m___o____       slab,mmap,compound
> > 0x000880              1602        6  _______S___m________       slab,mmap
> 
> please don't display mmap and compound. They expose SLUB implementation details.
> IOW, if the slab flag is on, please ignore the following flags and mapcount.
>         - PG_active
>         - PG_error
>         - PG_private
>         - PG_compound
> 
> BTW, if the page doesn't have PG_lru, the following member and flags can be used with other meanings.
>         - PG_active
>         - PG_referenced
>         - page::_mapcount
>         - PG_swapbacked
>         - PG_reclaim
>         - PG_unevictable
>         - PG_mlocked
> 
> and, if the page never interacts with the IO layer, the following flags can be used with other meanings.
>         - PG_uptodate
>         - PG_dirty

Good point. I also noticed many of these conditional flags.
A possible solution would be to filter them out if
!CONFIG_DEBUG_KERNEL, so as not to confuse administrators.
For kernel developers we want to stay faithful to the raw flags :-)
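
To make the filtering idea concrete, here is a rough userspace model of it.
This is a sketch under assumptions, not the kernel code: the bit positions
are the ones implied by the page-types output in this mail, only flags
visible in that output are masked (error, private, unevictable and mlocked
are omitted because their exported bits do not appear there), the
debug_kernel argument merely stands in for CONFIG_DEBUG_KERNEL, and
admin_view is a hypothetical name. The IO-layer rule is left out since it
cannot be decided from the flags word alone.

```python
# Bit positions as implied by the quoted page-types output (assumption).
REFERENCED = 1 << 2
LRU = 1 << 5
ACTIVE = 1 << 6
SLAB = 1 << 7
RECLAIM = 1 << 9
MMAP = 1 << 11       # derived from mapcount in the tool output
SWAPBACKED = 1 << 13
COMPOUND = 1 << 15

# Flags whose meaning is overloaded when the page belongs to slab.
SLAB_OVERLOADED = ACTIVE | MMAP | COMPOUND
# Flags whose meaning is overloaded when the page is not on the LRU.
NON_LRU_OVERLOADED = ACTIVE | REFERENCED | MMAP | SWAPBACKED | RECLAIM


def admin_view(flags, debug_kernel=False):
    """Hide conditional flags unless the raw (developer) view is requested."""
    if debug_kernel:
        return flags  # kernel developers get the unfiltered word
    if flags & SLAB:
        flags &= ~SLAB_OVERLOADED
    if not flags & LRU:
        flags &= ~NON_LRU_OVERLOADED
    return flags
```

Under these rules the "active,slab,mmap,compound" row (0x0088c0) would be
reported to administrators as plain slab (0x000080).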

> 
> > 0x0088c0                59        0  ______AS___m___o____       active,slab,mmap,compound
> > 0x0008c0                49        0  ______AS___m________       active,slab,mmap
> >    total            513968     2007
> 
> 
> And, the PageAnon() result seems to provide good information if the page stays on the LRU.

Good point! Will add this bit.

> > # ./page-areas 0x004000
> >     offset      len         KB
> >          0       15       60KB
> >         31        4       16KB
> >        159       97      388KB
> >       4096     2213     8852KB
> >       6899     2385     9540KB
> >       9497        3       12KB
> >       9728    14528    58112KB
> >
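
For reference, the offset/len/KB rows above are just maximal runs of
consecutive PFNs carrying the requested flags. A minimal sketch of that
grouping follows; it assumes an exact match on the flags word and 4 KiB
pages, and find_areas / format_area are illustrative names, not the
page-areas source.

```python
PAGE_KB = 4  # assumption: 4 KiB pages, consistent with 15 pages = 60KB above


def find_areas(flag_words, wanted):
    """Return (offset, length) for each maximal run of PFNs with flags == wanted."""
    areas = []
    start = None
    for pfn, flags in enumerate(flag_words):
        if flags == wanted:
            if start is None:
                start = pfn  # a new run begins here
        elif start is not None:
            areas.append((start, pfn - start))
            start = None
    if start is not None:  # run extends to the last PFN
        areas.append((start, len(flag_words) - start))
    return areas


def format_area(offset, length):
    """Format one row in the offset / len / KB layout."""
    return f"{offset:>10} {length:>8} {length * PAGE_KB:>8}KB"
```

For example, 15 reserved pages followed by 16 other pages and 4 more
reserved pages yield the first two rows above: (0, 15) and (31, 4).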
> > > > > > > - PG_private
> > > > > > > - PG_private_2
> > > > > > > - PG_owner_priv_1
> > > > > > >
> > > > > > > - PG_head
> > > > > > > - PG_tail
> > > > > > > - PG_compound
> > >
> > > I would combine these three into a pseudo "large page" flag.
> >
> > Very neat idea! Patch updated accordingly.
> >
> > However - one pity I observed:
> >
> > # ./page-areas 0x008000
> >     offset      len         KB
> >       3088        4       16KB
> >
> > We can no longer tell if the above line means one 4-page hugepage, or two
> > 2-page hugepages... Adding PG_COMPOUND_TAIL into the CONFIG_DEBUG_KERNEL block
> > can help kernel developers. Or will administrators ever care about it?
> >
> >     341196        2        8KB
> >     341202        2        8KB
> >     341262        2        8KB
> >     341272        8       32KB
> >     341296        8       32KB
> >     488448       24       96KB
> >     488490        2        8KB
> >     488496      320     1280KB
> >     488842        2        8KB
> >     488848       40      160KB
> >
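
To illustrate the ambiguity mentioned above: once head and tail are folded
into a single compound flag, a 4-page run cannot be split any further, but
with a per-page head/tail marker it can. The sketch below is hypothetical;
the 'H'/'T' markers and split_compound_run are invented for illustration
and do not correspond to any exported flag names.

```python
def split_compound_run(offset, kinds):
    """Split a contiguous compound run into individual compound pages.

    kinds holds one marker per page: 'H' for a head page, 'T' for a tail page.
    Returns a list of (offset, length) tuples, one per compound page.
    """
    pages = []
    for i, kind in enumerate(kinds):
        if kind == "H":
            pages.append([offset + i, 1])  # each head starts a new compound page
        elif kind == "T" and pages:
            pages[-1][1] += 1              # a tail extends the current one
        else:
            raise ValueError(f"unexpected marker {kind!r} at page {offset + i}")
    return [tuple(p) for p in pages]
```

The run at offset 3088 would come out as [(3088, 4)] for "HTTT" but as
[(3088, 2), (3090, 2)] for "HTHT", which is exactly the distinction the
combined flag loses.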
> > > > > > >
> > > > > > > - PG_unevictable
> > > > > > > - PG_mlocked
> > > > > > >
> > > > > > > - PG_poison
> > >
> > > PG_poison is also useful to export. But since it depends on my
> > > patchkit I will pull a patch for that into the HWPOISON series.
> >
> > That's not a problem - since the PG_poison line is protected by
> > #ifdef CONFIG_MEMORY_FAILURE :-)
> >
> > > > > > > - PG_unevictable
> > > > > > > - PG_mlocked
> > > >
> > > > these 9 flags shouldn't be exported.
> > > > I can't imagine what purpose an administrator would use those flags for.
> > >
> > > I think an abstracted "PG_pinned" or somesuch flag that combines
> > > page lock, unevictable, mlocked would be useful for the administrator.
> >
> > The PG_PINNED abstraction risks hiding useful information.
> > The administrator may care not only about which pages are pinned,
> > but also about _why_ they are pinned, e.g. ramfs or mlock?
> >
> > So it might be good to export them as is, with proper documentation.
> >
> > Here is the v2 patch, with flags for kernel hackers numbered from 32.
> > Comments are welcome!
> 
> if you can write good documentation, PG_unevictable is exportable.
> but PG_mlocked isn't.
> 
> it's an implementation technique for handling unevictable pages efficiently for mlock.
> we may change it in the future.

Yup. That's in line with my vague feeling. For PG_unevictable we can
say that the page is owned by the unevictable (non-)LRU list and is not a
candidate for LRU page reclaim. But PG_mlocked is more of an
aid for kernel optimizations, and it comes with no guarantees...

Thanks,
Fengguang


