linux-mm.kvack.org archive mirror
* [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
@ 2015-05-21 13:27 Michal Hocko
  2015-05-21 13:52 ` Mel Gorman
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Michal Hocko @ 2015-05-21 13:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, Naoya Horiguchi, linux-mm, LKML

Hugetlb pages use add_to_page_cache to track shared mappings. This
is fine from the data structure point of view but less so for the
NR_FILE_PAGES accounting:
	- huge pages are accounted as 4k pages, which is clearly wrong
	- this counter is used as the amount of reclaimable page
	  cache, which is incorrect as well because hugetlb pages are
	  special and not reclaimable
	- the counter is then exported to userspace via /proc/meminfo
	  (in Cached:), /proc/vmstat and /proc/zoneinfo as
	  nr_file_pages, which is confusing to say the least:
	  Cached:          8883504 kB
	  HugePages_Free:     8348
	  ...
	  Cached:          8916048 kB
	  HugePages_Free:      156
	  ...
	  that's 8192 huge pages allocated, i.e. ~16G, accounted as ~32M

There are usually not that many huge pages in the system for this to
make a visible difference, e.g. by fooling __vm_enough_memory or
zone_pagecache_reclaimable.

Fix this by special-casing huge pages in both __delete_from_page_cache
and __add_to_page_cache_locked. replace_page_cache_page is currently
only used by fuse, which shouldn't touch hugetlb pages AFAICS, but it
is more robust to special-case it there as well.

Hugetlb pages shouldn't get to any other paths where we do the accounting:
	- migration - we have special handling via
	  hugetlbfs_migrate_page
	- shmem - doesn't handle hugetlb pages directly even for
	  SHM_HUGETLB or MAP_HUGETLB
	- swapcache - hugetlb is not swappable

This has a user-visible effect, but I believe it is reasonable because
the previously exported number was simply bogus.

An alternative would be to account hugetlb pages with their real size
and treat them similarly to shmem. But this has some drawbacks.

First, we would have to special-case the in-kernel users of NR_FILE_PAGES,
and considering how special hugetlb is, we would have to do it everywhere.
We do not want Cached as exported by /proc/meminfo to include it because
the value would be even more misleading.
__vm_enough_memory and zone_pagecache_reclaimable would have to do
the same thing because those pages are simply not reclaimable. The
correction is not even trivial because we would have to consider all
active hugetlb page sizes properly. Users of the counter outside of the
kernel would have to do the same.
So the question is why account something that basically needs to be
excluded for every reasonable use. This doesn't make much sense to me.

It seems that this has been broken since hugetlb was introduced but I
haven't checked the whole history.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/filemap.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 413eb20abfa7..249bde91c273 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -197,7 +197,9 @@ void __delete_from_page_cache(struct page *page, void *shadow)
 	page->mapping = NULL;
 	/* Leave page->index set: truncation lookup relies upon it */
 
-	__dec_zone_page_state(page, NR_FILE_PAGES);
+	/* hugetlb pages do not participate in page cache accounting. */
+	if (!PageHuge(page))
+		__dec_zone_page_state(page, NR_FILE_PAGES);
 	if (PageSwapBacked(page))
 		__dec_zone_page_state(page, NR_SHMEM);
 	BUG_ON(page_mapped(page));
@@ -484,7 +486,13 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 		error = radix_tree_insert(&mapping->page_tree, offset, new);
 		BUG_ON(error);
 		mapping->nrpages++;
-		__inc_zone_page_state(new, NR_FILE_PAGES);
+
+		/*
+		 * hugetlb pages do not participate in page cache
+		 * accounting.
+		 */
+		if (!PageHuge(new))
+			__inc_zone_page_state(new, NR_FILE_PAGES);
 		if (PageSwapBacked(new))
 			__inc_zone_page_state(new, NR_SHMEM);
 		spin_unlock_irq(&mapping->tree_lock);
@@ -576,7 +584,10 @@ static int __add_to_page_cache_locked(struct page *page,
 	radix_tree_preload_end();
 	if (unlikely(error))
 		goto err_insert;
-	__inc_zone_page_state(page, NR_FILE_PAGES);
+
+	/* hugetlb pages do not participate in page cache accounting. */
+	if (!huge)
+		__inc_zone_page_state(page, NR_FILE_PAGES);
 	spin_unlock_irq(&mapping->tree_lock);
 	if (!huge)
 		mem_cgroup_commit_charge(page, memcg, false);
-- 
2.1.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org


* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-05-21 13:27 [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES Michal Hocko
@ 2015-05-21 13:52 ` Mel Gorman
  2015-05-21 16:18 ` Mike Kravetz
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Mel Gorman @ 2015-05-21 13:52 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Naoya Horiguchi, linux-mm, LKML

On Thu, May 21, 2015 at 03:27:22PM +0200, Michal Hocko wrote:
> hugetlb pages uses add_to_page_cache to track shared mappings. This
> is OK from the data structure point of view but it is less so from the
> NR_FILE_PAGES accounting:
[...]
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs



* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-05-21 13:27 [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES Michal Hocko
  2015-05-21 13:52 ` Mel Gorman
@ 2015-05-21 16:18 ` Mike Kravetz
  2015-05-22  6:01   ` Michal Hocko
  2015-05-21 17:09 ` Johannes Weiner
  2015-05-22  5:09 ` Naoya Horiguchi
  3 siblings, 1 reply; 14+ messages in thread
From: Mike Kravetz @ 2015-05-21 16:18 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton; +Cc: Mel Gorman, Naoya Horiguchi, linux-mm, LKML

On 05/21/2015 06:27 AM, Michal Hocko wrote:
> hugetlb pages uses add_to_page_cache to track shared mappings. This
> is OK from the data structure point of view but it is less so from the
> NR_FILE_PAGES accounting:
[...]
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Just for grins, I added this to my hugetlbfs fallocate stress testing
which really exercises hugetlb add and delete from page cache.
Everything is as expected.

Tested-by: Mike Kravetz <mike.kravetz@oracle.com>

-- 
Mike Kravetz



* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-05-21 13:27 [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES Michal Hocko
  2015-05-21 13:52 ` Mel Gorman
  2015-05-21 16:18 ` Mike Kravetz
@ 2015-05-21 17:09 ` Johannes Weiner
  2015-05-22 14:21   ` Michal Hocko
  2015-05-22  5:09 ` Naoya Horiguchi
  3 siblings, 1 reply; 14+ messages in thread
From: Johannes Weiner @ 2015-05-21 17:09 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Mel Gorman, Naoya Horiguchi, linux-mm, LKML

On Thu, May 21, 2015 at 03:27:22PM +0200, Michal Hocko wrote:
> hugetlb pages uses add_to_page_cache to track shared mappings. This
> is OK from the data structure point of view but it is less so from the
> NR_FILE_PAGES accounting:
[...]
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

This makes a lot of sense to me.  The only thing I worry about is the
proliferation of PageHuge(), a function call, in relatively hot paths.

Naoya-san, is there a strong reason not to make this function a
static inline in the header?



* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-05-21 13:27 [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES Michal Hocko
                   ` (2 preceding siblings ...)
  2015-05-21 17:09 ` Johannes Weiner
@ 2015-05-22  5:09 ` Naoya Horiguchi
  3 siblings, 0 replies; 14+ messages in thread
From: Naoya Horiguchi @ 2015-05-22  5:09 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Mel Gorman, linux-mm@kvack.org, LKML

On Thu, May 21, 2015 at 03:27:22PM +0200, Michal Hocko wrote:
> hugetlb pages uses add_to_page_cache to track shared mappings. This
> is OK from the data structure point of view but it is less so from the
> NR_FILE_PAGES accounting:
[...]
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

looks good to me,

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>


* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-05-21 16:18 ` Mike Kravetz
@ 2015-05-22  6:01   ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2015-05-22  6:01 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Andrew Morton, Mel Gorman, Naoya Horiguchi, linux-mm, LKML

On Thu 21-05-15 09:18:59, Mike Kravetz wrote:
> On 05/21/2015 06:27 AM, Michal Hocko wrote:
> >hugetlb pages uses add_to_page_cache to track shared mappings. This
> >is OK from the data structure point of view but it is less so from the
> >NR_FILE_PAGES accounting:
[...]
> >Signed-off-by: Michal Hocko <mhocko@suse.cz>
> 
> Just for grins, I added this to my hugetlbfs fallocate stress testing
> which really exercises hugetlb add and delete from page cache.
> Everything is as expected.
> 
> Tested-by: Mike Kravetz <mike.kravetz@oracle.com>

Thanks for your testing!

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-05-21 17:09 ` Johannes Weiner
@ 2015-05-22 14:21   ` Michal Hocko
  2015-05-22 14:28     ` Michal Hocko
  2015-05-22 14:35     ` Mel Gorman
  0 siblings, 2 replies; 14+ messages in thread
From: Michal Hocko @ 2015-05-22 14:21 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Mel Gorman, Naoya Horiguchi, linux-mm, LKML

On Thu 21-05-15 13:09:09, Johannes Weiner wrote:
> On Thu, May 21, 2015 at 03:27:22PM +0200, Michal Hocko wrote:
> > hugetlb pages uses add_to_page_cache to track shared mappings. This
> > is OK from the data structure point of view but it is less so from the
> > NR_FILE_PAGES accounting:
[...]
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Thanks!

> This makes a lot of sense to me.  The only thing I worry about is the
> proliferation of PageHuge(), a function call, in relatively hot paths.

I've tried that (see the patch below), but it enlarged mm/built-in.o by
almost 1k:
   text    data     bss     dec     hex filename
 510323   74273   44440  629036   9992c mm/built-in.o.before
 511248   74273   44440  629961   99cc9 mm/built-in.o.after

I am not sure the code size increase is worth it. Maybe we could reduce
the check to just PageCompound(page), since transparent huge pages are
not in the page cache (yet).

---
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 205026175c42..2e36251ad31b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -84,7 +84,6 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
 int dequeue_hwpoisoned_huge_page(struct page *page);
 bool isolate_huge_page(struct page *page, struct list_head *list);
 void putback_active_hugepage(struct page *page);
-void free_huge_page(struct page *page);
 
 #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
 pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2923a51979e9..5b6c49e55f80 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -531,23 +531,6 @@ void put_pages_list(struct list_head *pages);
 void split_page(struct page *page, unsigned int order);
 int split_free_page(struct page *page);
 
-/*
- * Compound pages have a destructor function.  Provide a
- * prototype for that function and accessor functions.
- * These are _only_ valid on the head of a PG_compound page.
- */
-
-static inline void set_compound_page_dtor(struct page *page,
-						compound_page_dtor *dtor)
-{
-	page[1].compound_dtor = dtor;
-}
-
-static inline compound_page_dtor *get_compound_page_dtor(struct page *page)
-{
-	return page[1].compound_dtor;
-}
-
 static inline int compound_order(struct page *page)
 {
 	if (!PageHead(page))
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8d37e26a1007..a9ecaebb5392 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -226,6 +226,23 @@ struct page_frag {
 #endif
 };
 
+/*
+ * Compound pages have a destructor function.  Provide a
+ * prototype for that function and accessor functions.
+ * These are _only_ valid on the head of a PG_compound page.
+ */
+
+static inline void set_compound_page_dtor(struct page *page,
+						compound_page_dtor *dtor)
+{
+	page[1].compound_dtor = dtor;
+}
+
+static inline compound_page_dtor *get_compound_page_dtor(struct page *page)
+{
+	return page[1].compound_dtor;
+}
+
 typedef unsigned long __nocast vm_flags_t;
 
 /*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 91b7f9b2b774..41329fbb5890 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -547,7 +547,21 @@ static inline void ClearPageCompound(struct page *page)
 #endif /* !PAGEFLAGS_EXTENDED */
 
 #ifdef CONFIG_HUGETLB_PAGE
-int PageHuge(struct page *page);
+void free_huge_page(struct page *page);
+
+/*
+ * PageHuge() only returns true for hugetlbfs pages, but not for normal or
+ * transparent huge pages.  See the PageTransHuge() documentation for more
+ * details.
+ */
+static inline int PageHuge(struct page *page)
+{
+	if (!PageCompound(page))
+		return 0;
+
+	page = compound_head(page);
+	return get_compound_page_dtor(page) == free_huge_page;
+}
 int PageHeadHuge(struct page *page);
 bool page_huge_active(struct page *page);
 #else
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 54f129dc37f6..406913f3b234 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1041,21 +1041,6 @@ static void prep_compound_gigantic_page(struct page *page, unsigned long order)
 }
 
 /*
- * PageHuge() only returns true for hugetlbfs pages, but not for normal or
- * transparent huge pages.  See the PageTransHuge() documentation for more
- * details.
- */
-int PageHuge(struct page *page)
-{
-	if (!PageCompound(page))
-		return 0;
-
-	page = compound_head(page);
-	return get_compound_page_dtor(page) == free_huge_page;
-}
-EXPORT_SYMBOL_GPL(PageHuge);
-
-/*
  * PageHeadHuge() only returns true for hugetlbfs head page, but not for
  * normal or transparent huge pages.
  */

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-05-22 14:21   ` Michal Hocko
@ 2015-05-22 14:28     ` Michal Hocko
  2015-05-22 14:35     ` Mel Gorman
  1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2015-05-22 14:28 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Mel Gorman, Naoya Horiguchi, linux-mm, LKML

On Fri 22-05-15 16:21:43, Michal Hocko wrote:
> On Thu 21-05-15 13:09:09, Johannes Weiner wrote:
[...]
> > This makes a lot of sense to me.  The only thing I worry about is the
> > proliferation of PageHuge(), a function call, in relatively hot paths.
> 
> I've tried that (see the patch below) but it enlarged the code by almost
> 1k
>    text    data     bss     dec     hex filename
>  510323   74273   44440  629036   9992c mm/built-in.o.before
>  511248   74273   44440  629961   99cc9 mm/built-in.o.after
> 
> I am not sure the code size increase is worth it. Maybe we can reduce
> the check to only PageCompound(page) as huge pages are no in the page
> cache (yet).

Just to prevent confusion: I meant to reduce the check only for
this particular case. But that is probably not worth the trouble
either...
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-05-22 14:21   ` Michal Hocko
  2015-05-22 14:28     ` Michal Hocko
@ 2015-05-22 14:35     ` Mel Gorman
  2015-05-25 15:24       ` Vlastimil Babka
  1 sibling, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2015-05-22 14:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Andrew Morton, Naoya Horiguchi, linux-mm, LKML

On Fri, May 22, 2015 at 04:21:43PM +0200, Michal Hocko wrote:
> On Thu 21-05-15 13:09:09, Johannes Weiner wrote:
> > On Thu, May 21, 2015 at 03:27:22PM +0200, Michal Hocko wrote:
> > > hugetlb pages uses add_to_page_cache to track shared mappings. This
> > > is OK from the data structure point of view but it is less so from the
> > > NR_FILE_PAGES accounting:
> > > 	- huge pages are accounted as 4k which is clearly wrong
> > > 	- this counter is used as the amount of the reclaimable page
> > > 	  cache which is incorrect as well because hugetlb pages are
> > > 	  special and not reclaimable
> > > 	- the counter is then exported to userspace via /proc/meminfo
> > > 	  (in Cached:), /proc/vmstat and /proc/zoneinfo as
> > > 	  nr_file_pages which is confusing at least:
> > > 	  Cached:          8883504 kB
> > > 	  HugePages_Free:     8348
> > > 	  ...
> > > 	  Cached:          8916048 kB
> > > 	  HugePages_Free:      156
> > > 	  ...
> > > 	  thats 8192 huge pages allocated which is ~16G accounted as 32M
> > > 
> > > There are usually not that many huge pages in the system for this to
> > > make any visible difference e.g. by fooling __vm_enough_memory or
> > > zone_pagecache_reclaimable.
> > > 
> > > Fix this by special casing huge pages in both __delete_from_page_cache
> > > and __add_to_page_cache_locked. replace_page_cache_page is currently
> > > only used by fuse and that shouldn't touch hugetlb pages AFAICS but it
> > > is more robust to check for special casing there as well.
> > > 
> > > Hugetlb pages shouldn't get to any other paths where we do accounting:
> > > 	- migration - we have a special handling via
> > > 	  hugetlbfs_migrate_page
> > > 	- shmem - doesn't handle hugetlb pages directly even for
> > > 	  SHM_HUGETLB resp. MAP_HUGETLB
> > > 	- swapcache - hugetlb is not swapable
> > > 
> > > This has a user visible effect but I believe it is reasonable because
> > > the previously exported number is simply bogus.
> > > 
> > > An alternative would be to account hugetlb pages with their real size
> > > and treat them similar to shmem. But this has some drawbacks.
> > > 
> > > First we would have to special case in kernel users of NR_FILE_PAGES and
> > > considering how hugetlb is special we would have to do it everywhere. We
> > > do not want Cached exported by /proc/meminfo to include it because the
> > > value would be even more misleading.
> > > __vm_enough_memory and zone_pagecache_reclaimable would have to do
> > > the same thing because those pages are simply not reclaimable. The
> > > correction is even not trivial because we would have to consider all
> > > active hugetlb page sizes properly. Users of the counter outside of the
> > > kernel would have to do the same.
> > > So the question is why to account something that needs to be basically
> > > excluded for each reasonable usage. This doesn't make much sense to me.
> > > 
> > > It seems that this has been broken since hugetlb was introduced but I
> > > haven't checked the whole history.
> > > 
> > > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > 
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Thanks!
> 
> > This makes a lot of sense to me.  The only thing I worry about is the
> > proliferation of PageHuge(), a function call, in relatively hot paths.
> 
> I've tried that (see the patch below) but it enlarged the code by almost
> 1k
>    text    data     bss     dec     hex filename
>  510323   74273   44440  629036   9992c mm/built-in.o.before
>  511248   74273   44440  629961   99cc9 mm/built-in.o.after
> 
> I am not sure the code size increase is worth it. Maybe we can reduce
> the check to only PageCompound(page) as huge pages are not in the page
> cache (yet).
> 

That would be a more sensible route because it also avoids exposing the
hugetlbfs destructor unnecessarily.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-05-22 14:35     ` Mel Gorman
@ 2015-05-25 15:24       ` Vlastimil Babka
  2015-06-02  9:25         ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Vlastimil Babka @ 2015-05-25 15:24 UTC (permalink / raw)
  To: Mel Gorman, Michal Hocko
  Cc: Johannes Weiner, Andrew Morton, Naoya Horiguchi, linux-mm, LKML

On 05/22/2015 04:35 PM, Mel Gorman wrote:
>> 
>> Thanks!
>> 
>> > This makes a lot of sense to me.  The only thing I worry about is the
>> > proliferation of PageHuge(), a function call, in relatively hot paths.
>> 
>> I've tried that (see the patch below) but it enlarged the code by almost
>> 1k
>>    text    data     bss     dec     hex filename
>>  510323   74273   44440  629036   9992c mm/built-in.o.before
>>  511248   74273   44440  629961   99cc9 mm/built-in.o.after
>> 
>> I am not sure the code size increase is worth it. Maybe we can reduce
>> the check to only PageCompound(page) as huge pages are not in the page
>> cache (yet).
>> 
> 
> That would be a more sensible route because it also avoids exposing the
> hugetlbfs destructor unnecessarily.

You could maybe do a test such as (PageCompound(page) && PageHuge(page)) to
short-circuit the call while remaining future-proof.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-05-25 15:24       ` Vlastimil Babka
@ 2015-06-02  9:25         ` Michal Hocko
  2015-06-02  9:33           ` Vlastimil Babka
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2015-06-02  9:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Mel Gorman, Johannes Weiner, Andrew Morton, Naoya Horiguchi,
	linux-mm, LKML

On Mon 25-05-15 17:24:28, Vlastimil Babka wrote:
> On 05/22/2015 04:35 PM, Mel Gorman wrote:
> >> 
> >> Thanks!
> >> 
> >> > This makes a lot of sense to me.  The only thing I worry about is the
> >> > proliferation of PageHuge(), a function call, in relatively hot paths.
> >> 
> >> I've tried that (see the patch below) but it enlarged the code by almost
> >> 1k
> >>    text    data     bss     dec     hex filename
> >>  510323   74273   44440  629036   9992c mm/built-in.o.before
> >>  511248   74273   44440  629961   99cc9 mm/built-in.o.after
> >> 
> >> I am not sure the code size increase is worth it. Maybe we can reduce
> >> the check to only PageCompound(page) as huge pages are not in the page
> >> cache (yet).
> >> 
> > 
> > That would be a more sensible route because it also avoids exposing the
> > hugetlbfs destructor unnecessarily.
> 
> You could maybe do a test such as (PageCompound(page) && PageHuge(page)) to
> short-circuit the call while remaining future-proof.

How about this?
---
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 91b7f9b2b774..bb8a70e8fc77 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -547,7 +547,13 @@ static inline void ClearPageCompound(struct page *page)
 #endif /* !PAGEFLAGS_EXTENDED */
 
 #ifdef CONFIG_HUGETLB_PAGE
-int PageHuge(struct page *page);
+int __PageHuge(struct page *page);
+static inline int PageHuge(struct page *page)
+{
+	if (!PageCompound(page))
+		return 0;
+	return __PageHuge(page);
+}
 int PageHeadHuge(struct page *page);
 bool page_huge_active(struct page *page);
 #else
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 33defbe1897f..648c0c32857c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1107,19 +1107,17 @@ static void prep_compound_gigantic_page(struct page *page, unsigned long order)
 }
 
 /*
- * PageHuge() only returns true for hugetlbfs pages, but not for normal or
+ * __PageHuge() only returns true for hugetlbfs pages, but not for normal or
  * transparent huge pages.  See the PageTransHuge() documentation for more
  * details.
  */
-int PageHuge(struct page *page)
+int __PageHuge(struct page *page)
 {
-	if (!PageCompound(page))
-		return 0;
-
+	VM_BUG_ON(!PageCompound(page));
 	page = compound_head(page);
 	return get_compound_page_dtor(page) == free_huge_page;
 }
-EXPORT_SYMBOL_GPL(PageHuge);
+EXPORT_SYMBOL_GPL(__PageHuge);
 
 /*
  * PageHeadHuge() only returns true for hugetlbfs head page, but not for

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-06-02  9:25         ` Michal Hocko
@ 2015-06-02  9:33           ` Vlastimil Babka
  2015-06-02  9:38             ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Vlastimil Babka @ 2015-06-02  9:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mel Gorman, Johannes Weiner, Andrew Morton, Naoya Horiguchi,
	linux-mm, LKML

On 06/02/2015 11:25 AM, Michal Hocko wrote:
> On Mon 25-05-15 17:24:28, Vlastimil Babka wrote:
>> On 05/22/2015 04:35 PM, Mel Gorman wrote:
>>>>
>>>> Thanks!
>>>>
>>>>> This makes a lot of sense to me.  The only thing I worry about is the
>>>>> proliferation of PageHuge(), a function call, in relatively hot paths.
>>>>
>>>> I've tried that (see the patch below) but it enlarged the code by almost
>>>> 1k
>>>>     text    data     bss     dec     hex filename
>>>>   510323   74273   44440  629036   9992c mm/built-in.o.before
>>>>   511248   74273   44440  629961   99cc9 mm/built-in.o.after
>>>>
>>>> I am not sure the code size increase is worth it. Maybe we can reduce
>>>> the check to only PageCompound(page) as huge pages are not in the page
>>>> cache (yet).
>>>>
>>>
>>> That would be a more sensible route because it also avoids exposing the
>>> hugetlbfs destructor unnecessarily.
>>
>> You could maybe do a test such as (PageCompound(page) && PageHuge(page)) to
>> short-circuit the call while remaining future-proof.
>
> How about this?

Yeah (see below)

> ---
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 91b7f9b2b774..bb8a70e8fc77 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -547,7 +547,13 @@ static inline void ClearPageCompound(struct page *page)
>   #endif /* !PAGEFLAGS_EXTENDED */
>
>   #ifdef CONFIG_HUGETLB_PAGE
> -int PageHuge(struct page *page);
> +int __PageHuge(struct page *page);
> +static inline int PageHuge(struct page *page)
> +{
> +	if (!PageCompound(page))

Perhaps the above as likely()?

[...]

> -EXPORT_SYMBOL_GPL(PageHuge);
> +EXPORT_SYMBOL_GPL(__PageHuge);
>
>   /*
>    * PageHeadHuge() only returns true for hugetlbfs head page, but not for
>

Do the same thing here by inlining the PageHead() test?
I guess the page_to_pgoff and __compound_tail_refcounted callers are 
rather hot?



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-06-02  9:33           ` Vlastimil Babka
@ 2015-06-02  9:38             ` Michal Hocko
  2015-06-02 10:00               ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2015-06-02  9:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Mel Gorman, Johannes Weiner, Andrew Morton, Naoya Horiguchi,
	linux-mm, LKML

On Tue 02-06-15 11:33:05, Vlastimil Babka wrote:
> On 06/02/2015 11:25 AM, Michal Hocko wrote:
> >On Mon 25-05-15 17:24:28, Vlastimil Babka wrote:
> >>On 05/22/2015 04:35 PM, Mel Gorman wrote:
> >>>>
> >>>>Thanks!
> >>>>
> >>>>>This makes a lot of sense to me.  The only thing I worry about is the
> >>>>>proliferation of PageHuge(), a function call, in relatively hot paths.
> >>>>
> >>>>I've tried that (see the patch below) but it enlarged the code by almost
> >>>>1k
> >>>>    text    data     bss     dec     hex filename
> >>>>  510323   74273   44440  629036   9992c mm/built-in.o.before
> >>>>  511248   74273   44440  629961   99cc9 mm/built-in.o.after
> >>>>
> >>>>I am not sure the code size increase is worth it. Maybe we can reduce
> >>>>the check to only PageCompound(page) as huge pages are not in the page
> >>>>cache (yet).
> >>>>
> >>>
> >>>That would be a more sensible route because it also avoids exposing the
> >>>hugetlbfs destructor unnecessarily.
> >>
> >>You could maybe do a test such as (PageCompound(page) && PageHuge(page)) to
> >>short-circuit the call while remaining future-proof.
> >
> >How about this?
> 
> Yeah (see below)
> 
> >---
> >diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> >index 91b7f9b2b774..bb8a70e8fc77 100644
> >--- a/include/linux/page-flags.h
> >+++ b/include/linux/page-flags.h
> >@@ -547,7 +547,13 @@ static inline void ClearPageCompound(struct page *page)
> >  #endif /* !PAGEFLAGS_EXTENDED */
> >
> >  #ifdef CONFIG_HUGETLB_PAGE
> >-int PageHuge(struct page *page);
> >+int __PageHuge(struct page *page);
> >+static inline int PageHuge(struct page *page)
> >+{
> >+	if (!PageCompound(page))
> 
> Perhaps the above as likely()?

I had already added it while writing the changelog.

> [...]
> 
> >-EXPORT_SYMBOL_GPL(PageHuge);
> >+EXPORT_SYMBOL_GPL(__PageHuge);
> >
> >  /*
> >   * PageHeadHuge() only returns true for hugetlbfs head page, but not for
> >
> 
> Do the same thing here by inlining the PageHead() test?
> I guess the page_to_pgoff and __compound_tail_refcounted callers are rather
> hot?

Yes, that sounds like a good idea.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES
  2015-06-02  9:38             ` Michal Hocko
@ 2015-06-02 10:00               ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2015-06-02 10:00 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Mel Gorman, Johannes Weiner, Andrew Morton, Naoya Horiguchi,
	linux-mm, LKML

On Tue 02-06-15 11:38:05, Michal Hocko wrote:
> On Tue 02-06-15 11:33:05, Vlastimil Babka wrote:
> > On 06/02/2015 11:25 AM, Michal Hocko wrote:
[...]
> > >diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > >index 91b7f9b2b774..bb8a70e8fc77 100644
> > >--- a/include/linux/page-flags.h
> > >+++ b/include/linux/page-flags.h
> > >@@ -547,7 +547,13 @@ static inline void ClearPageCompound(struct page *page)
> > >  #endif /* !PAGEFLAGS_EXTENDED */
> > >
> > >  #ifdef CONFIG_HUGETLB_PAGE
> > >-int PageHuge(struct page *page);
> > >+int __PageHuge(struct page *page);
> > >+static inline int PageHuge(struct page *page)
> > >+{
> > >+	if (!PageCompound(page))
> > 
> > Perhaps the above as likely()?
> 
> I have added it already when writing the changelog.
> 
> > [...]
> > 
> > >-EXPORT_SYMBOL_GPL(PageHuge);
> > >+EXPORT_SYMBOL_GPL(__PageHuge);
> > >
> > >  /*
> > >   * PageHeadHuge() only returns true for hugetlbfs head page, but not for
> > >
> > 
> > Do the same thing here by inlining the PageHead() test?
> > I guess the page_to_pgoff and __compound_tail_refcounted callers are rather
> > hot?
> 
> Yes, that sounds like a good idea.

So the overall code size (with defconfig) has still grown with the patch:
   text    data     bss     dec     hex filename
 443075   59217   25604  527896   80e18 mm/built-in.o.before
 443477   59217   25604  528298   80faa mm/built-in.o.PageHuge
 443653   59217   25604  528474   8105a mm/built-in.o.both

It is still not the ~1K of the fully inlined version, but the growth is
quite large on its own. So I am not sure it makes sense to fiddle with
this without actually seeing some penalty in profiles.

Here is what I have if somebody wants to play with it:
---

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-06-02 10:00 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-21 13:27 [PATCH] hugetlb: Do not account hugetlb pages as NR_FILE_PAGES Michal Hocko
2015-05-21 13:52 ` Mel Gorman
2015-05-21 16:18 ` Mike Kravetz
2015-05-22  6:01   ` Michal Hocko
2015-05-21 17:09 ` Johannes Weiner
2015-05-22 14:21   ` Michal Hocko
2015-05-22 14:28     ` Michal Hocko
2015-05-22 14:35     ` Mel Gorman
2015-05-25 15:24       ` Vlastimil Babka
2015-06-02  9:25         ` Michal Hocko
2015-06-02  9:33           ` Vlastimil Babka
2015-06-02  9:38             ` Michal Hocko
2015-06-02 10:00               ` Michal Hocko
2015-05-22  5:09 ` Naoya Horiguchi
