* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree [not found] <200705040104.l4414YFR026322@shell0.pdx.osdl.net> @ 2007-05-04 1:38 ` Paul Jackson 2007-05-04 4:49 ` Ken Chen 2007-05-04 6:45 ` Bill Irwin 0 siblings, 2 replies; 10+ messages in thread From: Paul Jackson @ 2007-05-04 1:38 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, mm-commits, kenchen, agl, hermes, wli, clameter Adding Christoph Lameter <clameter@sgi.com> to the cc list, as he knows more about hugetlb pages than I do. This patch strikes me as a bit odd. Granted, it's solving what could be a touchy problem with a fairly simple solution, which is usually a Good Thing(tm). However, the idea that different tasks would see different values for the following fields in /proc/meminfo: HugePages_Total: 0 HugePages_Free: 0 strikes me as odd, and risky. I would have thought that usually, all tasks in the system should see the same values in the files in /proc (as opposed to the files in particular task subdirectories /proc/<pid>.) This patch strikes me as a bit of a hack, good for compatibility, but hiding a booby trap that will bite some user code in the long run. But I'm not enough of an expert to know what the right tradeoffs are in this matter. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree 2007-05-04 1:38 ` + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree Paul Jackson @ 2007-05-04 4:49 ` Ken Chen 2007-05-04 5:03 ` Andrew Morton 2007-05-04 5:12 ` Paul Jackson 2007-05-04 6:45 ` Bill Irwin 1 sibling, 2 replies; 10+ messages in thread From: Ken Chen @ 2007-05-04 4:49 UTC (permalink / raw) To: Paul Jackson; +Cc: linux-kernel, akpm, mm-commits, agl, hermes, wli, clameter On 5/3/07, Paul Jackson <pj@sgi.com> wrote: > Adding Christoph Lameter <clameter@sgi.com> to the cc list, as he knows > more about hugetlb pages than I do. > > This patch strikes me as a bit odd. > > Granted, it's solving what could be a touchy problem with a fairly > simple solution, which is usually a Good Thing(tm). > > However, the idea that different tasks would see different values for > the following fields in /proc/meminfo: > > HugePages_Total: 0 > HugePages_Free: 0 > > strikes me as odd, and risky. I would have thought that usually, all > tasks in the system should see the same values in the files in /proc > (as opposed to the files in particular task subdirectories /proc/<pid>.) > > This patch strikes me as a bit of a hack, good for compatibility, but > hiding a booby trap that will bite some user code in the long run. > > But I'm not enough of an expert to know what the right tradeoffs are > in this matter. Would annotating the Hugepages_* field with name of cpuset help? I orginally thought that since cpuset's mems are hirearchical in memory assignment, it is fairly straightforward to understand what's going on: parent cpuset stats include its and all of its children. For example, if root cpuset has two sub job1 and job2 cpusets, each has 20 and 30 htlb pages, when query at each level, we have: [root@box]# echo $$ > /dev/cpuset/tasks [root@box]# grep HugePages_Total /proc/meminfo HugePages_Total: 50 [root@box]# echo $$ > /dev/cpuset/job1/tasks [root@box]# grep HugePages_Total /proc/meminfo HugePages_Total: 20 [root@box]# echo $$ > /dev/cpuset/job2/tasks [root@box]# grep HugePages_Total /proc/meminfo HugePages_Total: 30 If this is odd, do you have any suggestions for alternative? - Ken ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree 2007-05-04 4:49 ` Ken Chen @ 2007-05-04 5:03 ` Andrew Morton 2007-05-04 5:58 ` Paul Jackson 2007-05-04 5:12 ` Paul Jackson 1 sibling, 1 reply; 10+ messages in thread From: Andrew Morton @ 2007-05-04 5:03 UTC (permalink / raw) To: Ken Chen; +Cc: Paul Jackson, linux-kernel, agl, hermes, wli, clameter On Thu, 3 May 2007 21:49:12 -0700 "Ken Chen" <kenchen@google.com> wrote: > On 5/3/07, Paul Jackson <pj@sgi.com> wrote: > > Adding Christoph Lameter <clameter@sgi.com> to the cc list, as he knows > > more about hugetlb pages than I do. > > > > This patch strikes me as a bit odd. > > > > Granted, it's solving what could be a touchy problem with a fairly > > simple solution, which is usually a Good Thing(tm). > > > > However, the idea that different tasks would see different values for > > the following fields in /proc/meminfo: > > > > HugePages_Total: 0 > > HugePages_Free: 0 > > > > strikes me as odd, and risky. I would have thought that usually, all > > tasks in the system should see the same values in the files in /proc > > (as opposed to the files in particular task subdirectories /proc/<pid>.) > > > > This patch strikes me as a bit of a hack, good for compatibility, but > > hiding a booby trap that will bite some user code in the long run. > > > > But I'm not enough of an expert to know what the right tradeoffs are > > in this matter. > > Would annotating the Hugepages_* field with name of cpuset help? There are existing programs which parse /proc/meminfo. If we're going to do any of this then it would need to be via new fields. I don't think we should be altering the meaning of the HugePages fields like this. One can imagine scenarios in which such a change would cause existing userspace scripts to fail. Plus it's Just Weird to use /proc/meminfo in this manner. > I > orginally thought that since cpuset's mems are hirearchical in memory > assignment, it is fairly straightforward to understand what's going > on: parent cpuset stats include its and all of its children. For > example, if root cpuset has two sub job1 and job2 cpusets, each has 20 > and 30 htlb pages, when query at each level, we have: > > [root@box]# echo $$ > /dev/cpuset/tasks > [root@box]# grep HugePages_Total /proc/meminfo > HugePages_Total: 50 > > [root@box]# echo $$ > /dev/cpuset/job1/tasks > [root@box]# grep HugePages_Total /proc/meminfo > HugePages_Total: 20 > > [root@box]# echo $$ > /dev/cpuset/job2/tasks > [root@box]# grep HugePages_Total /proc/meminfo > HugePages_Total: 30 > > If this is odd, do you have any suggestions for alternative? If it's per-cpuset information then shouldn't it be presented in /dev/cpuset/something? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree 2007-05-04 5:03 ` Andrew Morton @ 2007-05-04 5:58 ` Paul Jackson 2007-05-04 6:13 ` Ken Chen 0 siblings, 1 reply; 10+ messages in thread From: Paul Jackson @ 2007-05-04 5:58 UTC (permalink / raw) To: Andrew Morton; +Cc: kenchen, linux-kernel, agl, hermes, wli, clameter Andrew wrote: > If it's per-cpuset information then shouldn't it be presented in > /dev/cpuset/something? Yeah - if huge pages were mainline future, rather than the more controversial sideline they are now, then it would make more sense to put in these stats in each cpuset. Note, Ken, that if we did that, the calculation of these new Total and Free stats would be a little different than your new code. Instead of looping over the memory nodes in the current tasks mems_allowed mask, we would loop over the memory nodes allowed in the cpuset being queried (the cpuset whose 'hugepages_total' or 'hugepages_free' special file we were reading, not the current tasks cpuset.) But I'm reluctant to entertain such cpuset additions until I see more of where my colleague Christoph is going in related work. Clearly as can be seen on one of his posts on the parallel lkml thread: Re: + pretend-cpuset-has-some-form-of-hugetlb-page-reservation.patch added to -mm tree earlier today, Christoph is no great fan of the current implementation of huge pages. And clearly as memory continues to get bigger, we will be putting more stress on these page size related mechanisms. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree 2007-05-04 5:58 ` Paul Jackson @ 2007-05-04 6:13 ` Ken Chen 2007-05-04 7:39 ` Paul Jackson 0 siblings, 1 reply; 10+ messages in thread From: Ken Chen @ 2007-05-04 6:13 UTC (permalink / raw) To: Paul Jackson; +Cc: Andrew Morton, linux-kernel, agl, hermes, wli, clameter On 5/3/07, Paul Jackson <pj@sgi.com> wrote: > Note, Ken, that if we did that, the calculation of these new Total and > Free stats would be a little different than your new code. Instead of > looping over the memory nodes in the current tasks mems_allowed mask, > we would loop over the memory nodes allowed in the cpuset being queried > (the cpuset whose 'hugepages_total' or 'hugepages_free' special > file we were reading, not the current tasks cpuset.) This is even more controversial and messy. akpm already dropped the patch and expressed that he doesn't like it. And I won't go down another messy path. I will let this idea RIP. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree 2007-05-04 6:13 ` Ken Chen @ 2007-05-04 7:39 ` Paul Jackson 0 siblings, 0 replies; 10+ messages in thread From: Paul Jackson @ 2007-05-04 7:39 UTC (permalink / raw) To: Ken Chen; +Cc: akpm, linux-kernel, agl, hermes, wli, clameter > I will let this idea RIP. Agreed. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree 2007-05-04 4:49 ` Ken Chen 2007-05-04 5:03 ` Andrew Morton @ 2007-05-04 5:12 ` Paul Jackson 2007-05-04 5:35 ` David Rientjes 1 sibling, 1 reply; 10+ messages in thread From: Paul Jackson @ 2007-05-04 5:12 UTC (permalink / raw) To: Ken Chen; +Cc: linux-kernel, akpm, mm-commits, agl, hermes, wli, clameter Ken wrote: > If this is odd, do you have any suggestions for alternative? No, I don't. Sorry. It's a touchy problem, and I'm not enough of an expert to know what the right tradeoffs are in this matter. I agree with your point that if you realize what's going on, namely that what cpuset the task reading meminfo is in affects the HugePages values that are read, then one can use the interface easily enough. ... how about: 1) don't change the existing HugePages_* values - keep them system-wide, and 2) adding two new values, by such names as: Current_Cpuset_HugePages_Total: 0 Current_Cpuset_HugePages_Free: 0 That's certainly an uglier proposal than yours ;). But at least it seems clearer, and doesn't make incompatible changes to what's there. It does require user level code change to actually benefit from the new values, whereas your patch sort of sneaks them in, on the assumption that the majority of reads of these values would really prefer getting the cpuset relative totals instead. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree 2007-05-04 5:12 ` Paul Jackson @ 2007-05-04 5:35 ` David Rientjes 2007-05-04 6:12 ` Paul Jackson 0 siblings, 1 reply; 10+ messages in thread From: David Rientjes @ 2007-05-04 5:35 UTC (permalink / raw) To: Paul Jackson Cc: Ken Chen, linux-kernel, akpm, mm-commits, agl, hermes, wli, clameter On Thu, 3 May 2007, Paul Jackson wrote: > 2) adding two new values, by such names as: > > Current_Cpuset_HugePages_Total: 0 > Current_Cpuset_HugePages_Free: 0 > This information is already exported to userspace through sysfs. Simply grab the N-mems allowed to your task from /proc/pid/status, cat /sys/devices/system/node/nodeN/meminfo for each N, and add. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree 2007-05-04 5:35 ` David Rientjes @ 2007-05-04 6:12 ` Paul Jackson 0 siblings, 0 replies; 10+ messages in thread From: Paul Jackson @ 2007-05-04 6:12 UTC (permalink / raw) To: David Rientjes Cc: kenchen, linux-kernel, akpm, mm-commits, agl, hermes, wli, clameter David wrote: > This information is already exported to userspace through sysfs. Simply > grab the N-mems allowed to your task from /proc/pid/status, cat > /sys/devices/system/node/nodeN/meminfo for each N, and add. Good point. I don't see how this present patch, to change /proc/meminfo, can be justified, given this. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree 2007-05-04 1:38 ` + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree Paul Jackson 2007-05-04 4:49 ` Ken Chen @ 2007-05-04 6:45 ` Bill Irwin 1 sibling, 0 replies; 10+ messages in thread From: Bill Irwin @ 2007-05-04 6:45 UTC (permalink / raw) To: Paul Jackson Cc: linux-kernel, akpm, mm-commits, kenchen, agl, hermes, clameter On Thu, May 03, 2007 at 06:38:21PM -0700, Paul Jackson wrote: > Adding Christoph Lameter <clameter@sgi.com> to the cc list, as he knows > more about hugetlb pages than I do. > This patch strikes me as a bit odd. > Granted, it's solving what could be a touchy problem with a fairly > simple solution, which is usually a Good Thing(tm). > However, the idea that different tasks would see different values for > the following fields in /proc/meminfo: > HugePages_Total: 0 > HugePages_Free: 0 > strikes me as odd, and risky. I would have thought that usually, all > tasks in the system should see the same values in the files in /proc > (as opposed to the files in particular task subdirectories /proc/<pid>.) > This patch strikes me as a bit of a hack, good for compatibility, but > hiding a booby trap that will bite some user code in the long run. > But I'm not enough of an expert to know what the right tradeoffs are > in this matter. The semantics of the global /proc/meminfo should not change; a separate per-cpuset reporting mechanism should really be used. -- wli ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2007-05-04 7:39 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200705040104.l4414YFR026322@shell0.pdx.osdl.net>
2007-05-04 1:38 ` + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree Paul Jackson
2007-05-04 4:49 ` Ken Chen
2007-05-04 5:03 ` Andrew Morton
2007-05-04 5:58 ` Paul Jackson
2007-05-04 6:13 ` Ken Chen
2007-05-04 7:39 ` Paul Jackson
2007-05-04 5:12 ` Paul Jackson
2007-05-04 5:35 ` David Rientjes
2007-05-04 6:12 ` Paul Jackson
2007-05-04 6:45 ` Bill Irwin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox