Transparent huge page collapse and NUMA

* Transparent huge page collapse and NUMA
From: Andrew Davidoff @ 2013-08-20 16:05 UTC
  To: linux-mm

Hi,

In an effort to learn more about transparent huge pages and NUMA, I
have written a very simple C snippet that malloc()s in a loop. I am
running this under numactl with an interleave policy across both the
NUMA nodes in the system. To make watching allocation progress easier,
I am malloc()ing 4k (1 page) at a time.

If I watch node usage for the process (numa_maps) allocation looks
correct (interleave), but then allocation will drop on one node and
increase on another, at the same time as I see an increase in
pages_collapsed. It appears as though pages are always migrating away
from and to the same nodes, resulting in allocation (again, by
examining numa_maps) being almost entirely on one node.

This leads me to believe that khugepaged's defrag is to blame, though
I am not certain. I tried to disable transparent huge page defrag
completely via the following under /sys:

/sys/kernel/mm/transparent_hugepage/defrag
/sys/kernel/mm/transparent_hugepage/khugepaged/defrag

but the same behavior persists. I am not sure whether this indicates
that I don't know how to control transparent huge page collapse, or
that my issue isn't defrag/collapse related.

Do I understand what I am seeing? Does anyone have any thoughts on this?

The OS is CentOS 5.8 running the Oracle Unbreakable Kernel 2,
2.6.39-400.109.4.el5uek.

Further questions:

The way I understand it, transparent_hugepage/defrag controls defrag
on page fault, and transparent_hugepage/khugepaged/defrag controls
maintenance defrag (time based). Is that correct?

Thanks.
Andy

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

* Re: Transparent huge page collapse and NUMA
From: Kirill A. Shutemov @ 2013-08-21 15:25 UTC
  To: Andrew Davidoff; +Cc: linux-mm, Andrea Arcangeli

On Tue, Aug 20, 2013 at 12:05:03PM -0400, Andrew Davidoff wrote:
> Hi,
> 
> In an effort to learn more about transparent huge pages and NUMA, I
> have written a very simple C snippet that malloc()s in a loop. I am
> running this under numactl with an interleave policy across both the
> NUMA nodes in the system. To make watching allocation progress easier,
> I am malloc()ing 4k (1 page) at a time.
> 
> If I watch node usage for the process (numa_maps) allocation looks
> correct (interleave), but then allocation will drop on one node and
> increase on another, at the same time as I see an increase in
> pages_collapsed. It appears as though pages are always migrating away
> from and to the same nodes, resulting in allocation (again, by
> examining numa_maps) being almost entirely on one node.

khugepaged's strategy for NUMA is pretty simplistic: it tries to allocate
the huge page on the node that the first small page belongs to. See
khugepaged_scan_pmd(). It probably should be improved.

> This leads me to believe that khugepaged's defrag is to blame, though
> I am not certain. I tried to disable transparent huge page defrag
> completely via the following under /sys:
> 
> /sys/kernel/mm/transparent_hugepage/defrag
> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
> 
> but the same behavior persists. I am not sure whether this indicates
> that I don't know how to control transparent huge page collapse, or
> that my issue isn't defrag/collapse related.

The defrag knob only affects whether we use __GFP_WAIT for the huge page
allocation, not whether collapse happens. It basically controls whether
the kernel is allowed to defragment memory to find a suitable huge page
window.

The only way to stop collapse fully is

echo never > /sys/kernel/mm/transparent_hugepage/enabled

We should probably introduce a separate knob for that.

> 
> Do I understand what I am seeing? Does anyone have any thoughts on this?
> 
> The OS is CentOS 5.8 running the Oracle Unbreakable Kernel 2,
> 2.6.39-400.109.4.el5uek.
> 
> Further questions:
> 
> The way I understand it, transparent_hugepage/defrag controls defrag
> on page fault, and transparent_hugepage/khugepaged/defrag controls
> maintenance defrag (time based). Is that correct?

Yes.

-- 
 Kirill A. Shutemov
