* 2.6.21 numa policy and huge pages not working
@ 2007-05-16 5:41 dean gaudet
2007-05-16 6:12 ` William Lee Irwin III
0 siblings, 1 reply; 16+ messages in thread
From: dean gaudet @ 2007-05-16 5:41 UTC (permalink / raw)
To: linux-kernel
prior to 2.6.21 i could "numactl --interleave=all" and use SHM_HUGETLB and
the interleave policy would be respected. as of 2.6.21 it doesn't seem to
respect the policy on SHM_HUGETLB request.
see test program below.
output from pre-2.6.21:
2ab196200000 interleave=0-3 file=/2\040(deleted) huge dirty=32 N0=8 N1=8 N2=8 N3=8
2ab19a200000 default file=/SYSV00000000\040(deleted) dirty=16384 active=0 N0=4096 N1=4096 N2=4096 N3=4096
output from 2.6.21:
2b49b1c00000 default file=/10\040(deleted) huge dirty=32 N3=32
2b49b5c00000 default file=/SYSV00000000\040(deleted) dirty=16384 active=0 N0=4096 N1=4096 N2=4096 N3=4096
was this an intentional behaviour change? it seems to be only affecting
SHM_HUGETLB allocations. (i haven't tested hugetlbfs yet.)
run with "numactl --interleave=all ./shmtest"
-dean
/* shmtest.c */
#include <sys/mman.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
static void *alloc_arena_shm(size_t arena_size, unsigned flags)
{
FILE *fh;
char buf[512];
size_t huge_page_size;
char *p;
int shmid;
void *arena;
// find Hugepagesize in /proc/meminfo
if ((fh = fopen("/proc/meminfo", "r")) == NULL) {
perror("open(/proc/meminfo)");
exit(1);
}
for (;;) {
if (fgets(buf, sizeof(buf)-1, fh) == NULL) {
fprintf(stderr, "didn't find Hugepagesize in /proc/meminfo");
exit(1);
}
buf[sizeof(buf)-1] = '\0';
if (strncmp(buf, "Hugepagesize:", 13) == 0) break;
}
p = strchr(buf, ':') + 1;
huge_page_size = strtoul(p, 0, 0) * 1024;
fclose(fh);
// round the size up to multiple of huge_page_size
arena_size = (arena_size + huge_page_size - 1) & ~(huge_page_size - 1);
shmid = shmget(IPC_PRIVATE, arena_size, IPC_CREAT|IPC_EXCL|flags|0600);
if (shmid == -1) {
perror("shmget");
exit(1);
}
arena = shmat(shmid, NULL, 0);
if (arena == (void *)-1) {
perror("shmat");
exit(1);
}
if (shmctl(shmid, IPC_RMID, 0) == -1) {
perror("shmctl warning");
}
return arena;
}
int main(int argc, char **argv)
{
char buf[1024];
const size_t sz = 64*1024*1024;
void *arena = alloc_arena_shm(sz, SHM_HUGETLB);
memset(arena, 0, sz);
snprintf(buf, sizeof(buf), "grep ^%llx /proc/%d/numa_maps", (unsigned long long)arena, (int)getpid());
system(buf);
arena = alloc_arena_shm(sz, 0);
memset(arena, 0, sz);
snprintf(buf, sizeof(buf), "grep ^%llx /proc/%d/numa_maps", (unsigned long long)arena, (int)getpid());
system(buf);
return 0;
}
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: 2.6.21 numa policy and huge pages not working
2007-05-16 5:41 2.6.21 numa policy and huge pages not working dean gaudet
@ 2007-05-16 6:12 ` William Lee Irwin III
2007-06-09 18:06 ` dean gaudet
2007-06-10 4:10 ` dean gaudet
0 siblings, 2 replies; 16+ messages in thread
From: William Lee Irwin III @ 2007-05-16 6:12 UTC (permalink / raw)
To: dean gaudet; +Cc: linux-kernel
On Tue, May 15, 2007 at 10:41:06PM -0700, dean gaudet wrote:
> prior to 2.6.21 i could "numactl --interleave=all" and use SHM_HUGETLB and
> the interleave policy would be respected. as of 2.6.21 it doesn't seem to
> respect the policy on SHM_HUGETLB request.
> see test program below.
> output from pre-2.6.21:
> 2ab196200000 interleave=0-3 file=/2\040(deleted) huge dirty=32 N0=8 N1=8 N2=8 N3=8
> 2ab19a200000 default file=/SYSV00000000\040(deleted) dirty=16384 active=0 N0=4096 N1=4096 N2=4096 N3=4096
> output from 2.6.21:
> 2b49b1c00000 default file=/10\040(deleted) huge dirty=32 N3=32
> 2b49b5c00000 default file=/SYSV00000000\040(deleted) dirty=16384 active=0 N0=4096 N1=4096 N2=4096 N3=4096
> was this an intentional behaviour change? it seems to be only affecting
> SHM_HUGETLB allocations. (i haven't tested hugetlbfs yet.)
> run with "numactl --interleave=all ./shmtest"
This was not intentional. I'll search for where it broke.
-- wli
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: 2.6.21 numa policy and huge pages not working
2007-05-16 6:12 ` William Lee Irwin III
@ 2007-06-09 18:06 ` dean gaudet
2007-06-10 4:10 ` dean gaudet
1 sibling, 0 replies; 16+ messages in thread
From: dean gaudet @ 2007-06-09 18:06 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel
On Tue, 15 May 2007, William Lee Irwin III wrote:
> On Tue, May 15, 2007 at 10:41:06PM -0700, dean gaudet wrote:
> > prior to 2.6.21 i could "numactl --interleave=all" and use SHM_HUGETLB and
> > the interleave policy would be respected. as of 2.6.21 it doesn't seem to
> > respect the policy on SHM_HUGETLB request.
> > see test program below.
> > output from pre-2.6.21:
> > 2ab196200000 interleave=0-3 file=/2\040(deleted) huge dirty=32 N0=8 N1=8 N2=8 N3=8
> > 2ab19a200000 default file=/SYSV00000000\040(deleted) dirty=16384 active=0 N0=4096 N1=4096 N2=4096 N3=4096
> > output from 2.6.21:
> > 2b49b1c00000 default file=/10\040(deleted) huge dirty=32 N3=32
> > 2b49b5c00000 default file=/SYSV00000000\040(deleted) dirty=16384 active=0 N0=4096 N1=4096 N2=4096 N3=4096
> > was this an intentional behaviour change? it seems to be only affecting
> > SHM_HUGETLB allocations. (i haven't tested hugetlbfs yet.)
> > run with "numactl --interleave=all ./shmtest"
>
> This was not intentional. I'll search for where it broke.
any luck? i just tested with 2.6.22-rc4 and it's still broken... hmm
maybe i should learn how to git bisect.
-dean
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: 2.6.21 numa policy and huge pages not working
2007-05-16 6:12 ` William Lee Irwin III
2007-06-09 18:06 ` dean gaudet
@ 2007-06-10 4:10 ` dean gaudet
2007-06-10 4:51 ` William Lee Irwin III
` (2 more replies)
1 sibling, 3 replies; 16+ messages in thread
From: dean gaudet @ 2007-06-10 4:10 UTC (permalink / raw)
To: William Lee Irwin III, Eric W. Biederman, Adam Litke; +Cc: linux-kernel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1559 bytes --]
On Tue, 15 May 2007, William Lee Irwin III wrote:
> On Tue, May 15, 2007 at 10:41:06PM -0700, dean gaudet wrote:
> > prior to 2.6.21 i could "numactl --interleave=all" and use SHM_HUGETLB and
> > the interleave policy would be respected. as of 2.6.21 it doesn't seem to
> > respect the policy on SHM_HUGETLB request.
> > see test program below.
> > output from pre-2.6.21:
> > 2ab196200000 interleave=0-3 file=/2\040(deleted) huge dirty=32 N0=8 N1=8 N2=8 N3=8
> > 2ab19a200000 default file=/SYSV00000000\040(deleted) dirty=16384 active=0 N0=4096 N1=4096 N2=4096 N3=4096
> > output from 2.6.21:
> > 2b49b1c00000 default file=/10\040(deleted) huge dirty=32 N3=32
> > 2b49b5c00000 default file=/SYSV00000000\040(deleted) dirty=16384 active=0 N0=4096 N1=4096 N2=4096 N3=4096
> > was this an intentional behaviour change? it seems to be only affecting
> > SHM_HUGETLB allocations. (i haven't tested hugetlbfs yet.)
> > run with "numactl --interleave=all ./shmtest"
>
> This was not intentional. I'll search for where it broke.
ok i've narrowed it some... maybe.
in commit 8ef8286689c6b5bc76212437b85bdd2ba749ee44 things work fine, numa
policy is respected...
the very next commit bc56bba8f31bd99f350a5ebfd43d50f411b620c7 breaks shm
badly causing the test program to oops the kernel.
commit 516dffdcd8827a40532798602830dfcfc672294c fixes that breakage but
numa policy is no longer respected.
i've added the authors of those two commits to the recipient list and
reattached the test program. hopefully someone can shed light on the
problem.
-dean
[-- Attachment #2: Type: TEXT/x-csrc, Size: 1797 bytes --]
#include <sys/mman.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
static void *alloc_arena_shm(size_t arena_size, unsigned flags)
{
FILE *fh;
char buf[512];
size_t huge_page_size;
char *p;
int shmid;
void *arena;
// find Hugepagesize in /proc/meminfo
if ((fh = fopen("/proc/meminfo", "r")) == NULL) {
perror("open(/proc/meminfo)");
exit(1);
}
for (;;) {
if (fgets(buf, sizeof(buf)-1, fh) == NULL) {
fprintf(stderr, "didn't find Hugepagesize in /proc/meminfo");
exit(1);
}
buf[sizeof(buf)-1] = '\0';
if (strncmp(buf, "Hugepagesize:", 13) == 0) break;
}
p = strchr(buf, ':') + 1;
huge_page_size = strtoul(p, 0, 0) * 1024;
fclose(fh);
// round the size up to multiple of huge_page_size
arena_size = (arena_size + huge_page_size - 1) & ~(huge_page_size - 1);
shmid = shmget(IPC_PRIVATE, arena_size, IPC_CREAT|IPC_EXCL|flags|0600);
if (shmid == -1) {
perror("shmget");
exit(1);
}
arena = shmat(shmid, NULL, 0);
if (arena == (void *)-1) {
perror("shmat");
exit(1);
}
if (shmctl(shmid, IPC_RMID, 0) == -1) {
perror("shmctl warning");
}
return arena;
}
int main(int argc, char **argv)
{
char buf[1024];
const size_t sz = 64*1024*1024;
void *arena = alloc_arena_shm(sz, SHM_HUGETLB);
memset(arena, 0, sz);
snprintf(buf, sizeof(buf), "grep ^%llx /proc/%d/numa_maps", (unsigned long long)arena, (int)getpid());
system(buf);
arena = alloc_arena_shm(sz, 0);
memset(arena, 0, sz);
snprintf(buf, sizeof(buf), "grep ^%llx /proc/%d/numa_maps", (unsigned long long)arena, (int)getpid());
system(buf);
return 0;
}
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: 2.6.21 numa policy and huge pages not working
2007-06-10 4:10 ` dean gaudet
@ 2007-06-10 4:51 ` William Lee Irwin III
2007-06-11 16:23 ` Adam Litke
2007-06-11 21:34 ` [shm][hugetlb] Fix get_policy for stacked shared memory files Adam Litke
2 siblings, 0 replies; 16+ messages in thread
From: William Lee Irwin III @ 2007-06-10 4:51 UTC (permalink / raw)
To: dean gaudet; +Cc: Eric W. Biederman, Adam Litke, linux-kernel
On Sat, Jun 09, 2007 at 09:10:51PM -0700, dean gaudet wrote:
> ok i've narrowed it some... maybe.
> in commit 8ef8286689c6b5bc76212437b85bdd2ba749ee44 things work fine, numa
> policy is respected...
> the very next commit bc56bba8f31bd99f350a5ebfd43d50f411b620c7 breaks shm
> badly causing the test program to oops the kernel.
> commit 516dffdcd8827a40532798602830dfcfc672294c fixes that breakage but
> numa policy is no longer respected.
> i've added the authors of those two commits to the recipient list and
> reattached the test program. hopefully someone can shed light on the
> problem.
Thanks, this helps a lot.
-- wli
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: 2.6.21 numa policy and huge pages not working
2007-06-10 4:10 ` dean gaudet
2007-06-10 4:51 ` William Lee Irwin III
@ 2007-06-11 16:23 ` Adam Litke
2007-06-11 21:34 ` [shm][hugetlb] Fix get_policy for stacked shared memory files Adam Litke
2 siblings, 0 replies; 16+ messages in thread
From: Adam Litke @ 2007-06-11 16:23 UTC (permalink / raw)
To: dean gaudet; +Cc: William Lee Irwin III, Eric W. Biederman, linux-kernel
On Sat, 2007-06-09 at 21:10 -0700, dean gaudet wrote:
> On Tue, 15 May 2007, William Lee Irwin III wrote:
>
> > On Tue, May 15, 2007 at 10:41:06PM -0700, dean gaudet wrote:
> > > prior to 2.6.21 i could "numactl --interleave=all" and use SHM_HUGETLB and
> > > the interleave policy would be respected. as of 2.6.21 it doesn't seem to
> > > respect the policy on SHM_HUGETLB request.
> > > see test program below.
> > > output from pre-2.6.21:
> > > 2ab196200000 interleave=0-3 file=/2\040(deleted) huge dirty=32 N0=8 N1=8 N2=8 N3=8
> > > 2ab19a200000 default file=/SYSV00000000\040(deleted) dirty=16384 active=0 N0=4096 N1=4096 N2=4096 N3=4096
> > > output from 2.6.21:
> > > 2b49b1c00000 default file=/10\040(deleted) huge dirty=32 N3=32
> > > 2b49b5c00000 default file=/SYSV00000000\040(deleted) dirty=16384 active=0 N0=4096 N1=4096 N2=4096 N3=4096
> > > was this an intentional behaviour change? it seems to be only affecting
> > > SHM_HUGETLB allocations. (i haven't tested hugetlbfs yet.)
> > > run with "numactl --interleave=all ./shmtest"
> >
> > This was not intentional. I'll search for where it broke.
>
> ok i've narrowed it some... maybe.
Thanks a lot for the detailed information. I am on it.
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 16+ messages in thread
* [shm][hugetlb] Fix get_policy for stacked shared memory files
2007-06-10 4:10 ` dean gaudet
2007-06-10 4:51 ` William Lee Irwin III
2007-06-11 16:23 ` Adam Litke
@ 2007-06-11 21:34 ` Adam Litke
2007-06-12 3:36 ` dean gaudet
` (3 more replies)
2 siblings, 4 replies; 16+ messages in thread
From: Adam Litke @ 2007-06-11 21:34 UTC (permalink / raw)
To: dean gaudet
Cc: William Lee Irwin III, Eric W. Biederman, linux-kernel, ak,
clameter
Here's another breakage as a result of shared memory stacked files :(
The NUMA policy for a VMA is determined by checking the following (in the order
given):
1) vma->vm_ops->get_policy() (if defined)
2) vma->vm_policy (if defined)
3) task->mempolicy (if defined)
4) Fall back to default_policy
By switching to stacked files for shared memory, get_policy() is now always set
to shm_get_policy which is a wrapper function. This causes us to stop at step
1, which yields NULL for hugetlb instead of task->mempolicy which was the
previous (and correct) result.
This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for the
wrapped vm_ops. Andi and Christoph, does this look right to you?
Signed-off-by: Adam Litke <agl@us.ibm.com>
diff --git a/ipc/shm.c b/ipc/shm.c
index 4fefbad..8d2672d 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -254,8 +254,10 @@ struct mempolicy *shm_get_policy(struct vm_area_struct *vma, unsigned long addr)
if (sfd->vm_ops->get_policy)
pol = sfd->vm_ops->get_policy(vma, addr);
- else
+ else if (vma->vm_policy)
pol = vma->vm_policy;
+ else
+ pol = current->mempolicy;
return pol;
}
#endif
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [shm][hugetlb] Fix get_policy for stacked shared memory files
2007-06-11 21:34 ` [shm][hugetlb] Fix get_policy for stacked shared memory files Adam Litke
@ 2007-06-12 3:36 ` dean gaudet
2007-06-12 3:51 ` William Lee Irwin III
` (2 subsequent siblings)
3 siblings, 0 replies; 16+ messages in thread
From: dean gaudet @ 2007-06-12 3:36 UTC (permalink / raw)
To: Adam Litke
Cc: William Lee Irwin III, Eric W. Biederman, linux-kernel, ak,
clameter
On Mon, 11 Jun 2007, Adam Litke wrote:
> Here's another breakage as a result of shared memory stacked files :(
>
> The NUMA policy for a VMA is determined by checking the following (in the order
> given):
>
> 1) vma->vm_ops->get_policy() (if defined)
> 2) vma->vm_policy (if defined)
> 3) task->mempolicy (if defined)
> 4) Fall back to default_policy
>
> By switching to stacked files for shared memory, get_policy() is now always set
> to shm_get_policy which is a wrapper function. This causes us to stop at step
> 1, which yields NULL for hugetlb instead of task->mempolicy which was the
> previous (and correct) result.
>
> This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for the
> wrapped vm_ops. Andi and Christoph, does this look right to you?
thanks for the patch -- seems to do the trick for me. it seems like it
would be a candidate for stable series as well.
-dean
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [shm][hugetlb] Fix get_policy for stacked shared memory files
2007-06-11 21:34 ` [shm][hugetlb] Fix get_policy for stacked shared memory files Adam Litke
2007-06-12 3:36 ` dean gaudet
@ 2007-06-12 3:51 ` William Lee Irwin III
2007-06-12 4:30 ` Andrew Morton
2007-06-12 6:20 ` Eric W. Biederman
3 siblings, 0 replies; 16+ messages in thread
From: William Lee Irwin III @ 2007-06-12 3:51 UTC (permalink / raw)
To: Adam Litke; +Cc: dean gaudet, Eric W. Biederman, linux-kernel, ak, clameter
On Mon, Jun 11, 2007 at 04:34:54PM -0500, Adam Litke wrote:
> Here's another breakage as a result of shared memory stacked files :(
> The NUMA policy for a VMA is determined by checking the following (in
> the order given):
> 1) vma->vm_ops->get_policy() (if defined)
> 2) vma->vm_policy (if defined)
> 3) task->mempolicy (if defined)
> 4) Fall back to default_policy
> By switching to stacked files for shared memory, get_policy() is now
> always set to shm_get_policy which is a wrapper function. This
> causes us to stop at step 1, which yields NULL for hugetlb instead of
> task->mempolicy which was the previous (and correct) result.
> This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for the
> wrapped vm_ops. Andi and Christoph, does this look right to you?
> Signed-off-by: Adam Litke <agl@us.ibm.com>
Thanks for fielding this. The fix is certainly clear enough.
Acked-by: William Irwin <bill.irwin@oracle.com>
-- wli
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [shm][hugetlb] Fix get_policy for stacked shared memory files
2007-06-11 21:34 ` [shm][hugetlb] Fix get_policy for stacked shared memory files Adam Litke
2007-06-12 3:36 ` dean gaudet
2007-06-12 3:51 ` William Lee Irwin III
@ 2007-06-12 4:30 ` Andrew Morton
2007-06-12 4:48 ` William Lee Irwin III
2007-06-12 10:20 ` Andi Kleen
2007-06-12 6:20 ` Eric W. Biederman
3 siblings, 2 replies; 16+ messages in thread
From: Andrew Morton @ 2007-06-12 4:30 UTC (permalink / raw)
To: Adam Litke
Cc: dean gaudet, William Lee Irwin III, Eric W. Biederman,
linux-kernel, ak, clameter
On Mon, 11 Jun 2007 16:34:54 -0500 Adam Litke <agl@us.ibm.com> wrote:
> Here's another breakage as a result of shared memory stacked files :(
>
> The NUMA policy for a VMA is determined by checking the following (in the order
> given):
>
> 1) vma->vm_ops->get_policy() (if defined)
> 2) vma->vm_policy (if defined)
> 3) task->mempolicy (if defined)
> 4) Fall back to default_policy
>
> By switching to stacked files for shared memory, get_policy() is now always set
> to shm_get_policy which is a wrapper function. This causes us to stop at step
> 1, which yields NULL for hugetlb instead of task->mempolicy which was the
> previous (and correct) result.
>
> This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for the
> wrapped vm_ops. Andi and Christoph, does this look right to you?
>
Can we just double-check the refcounting please?
> index 4fefbad..8d2672d 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -254,8 +254,10 @@ struct mempolicy *shm_get_policy(struct vm_area_struct *vma, unsigned long addr)
>
> if (sfd->vm_ops->get_policy)
> pol = sfd->vm_ops->get_policy(vma, addr);
afacit this takes a ref on the underlying policy
> - else
> + else if (vma->vm_policy)
> pol = vma->vm_policy;
> + else
> + pol = current->mempolicy;
but these two do not.
> return pol;
> }
> #endif
Is is all correct?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [shm][hugetlb] Fix get_policy for stacked shared memory files
2007-06-12 4:30 ` Andrew Morton
@ 2007-06-12 4:48 ` William Lee Irwin III
2007-06-12 10:20 ` Andi Kleen
1 sibling, 0 replies; 16+ messages in thread
From: William Lee Irwin III @ 2007-06-12 4:48 UTC (permalink / raw)
To: Andrew Morton
Cc: Adam Litke, dean gaudet, Eric W. Biederman, linux-kernel, ak,
clameter
On Mon, Jun 11, 2007 at 09:30:20PM -0700, Andrew Morton wrote:
> Can we just double-check the refcounting please?
The refcounting for mpol's doesn't look good in general. I'm more
curious as to what releases the refcounts. alloc_page_vma(), for
instance, does get_vma_policy() which eventually takes a reference,
without ever releasing the reference it acquires. get_vma_policy()
itself uses a similar idiom to that used in aglitke's patch. I think
mpol refcounting needs to be addressed elsewhere besides this patch.
-- wli
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [shm][hugetlb] Fix get_policy for stacked shared memory files
2007-06-11 21:34 ` [shm][hugetlb] Fix get_policy for stacked shared memory files Adam Litke
` (2 preceding siblings ...)
2007-06-12 4:30 ` Andrew Morton
@ 2007-06-12 6:20 ` Eric W. Biederman
2007-06-12 6:58 ` William Lee Irwin III
2007-06-12 14:32 ` Adam Litke
3 siblings, 2 replies; 16+ messages in thread
From: Eric W. Biederman @ 2007-06-12 6:20 UTC (permalink / raw)
To: Adam Litke; +Cc: dean gaudet, William Lee Irwin III, linux-kernel, ak, clameter
Adam Litke <agl@us.ibm.com> writes:
> Here's another breakage as a result of shared memory stacked files :(
>
> The NUMA policy for a VMA is determined by checking the following (in the order
> given):
>
> 1) vma->vm_ops->get_policy() (if defined)
> 2) vma->vm_policy (if defined)
> 3) task->mempolicy (if defined)
> 4) Fall back to default_policy
>
> By switching to stacked files for shared memory, get_policy() is now always set
> to shm_get_policy which is a wrapper function. This causes us to stop at step
> 1, which yields NULL for hugetlb instead of task->mempolicy which was the
> previous (and correct) result.
>
> This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for the
> wrapped vm_ops. Andi and Christoph, does this look right to you?
I'm confused.
I agree that the behavior you describe is correct.
However I only see two code paths were get_policy is called and
both of them take a NULL result and change it to task->mempolicy:
>From mm/mempolicy.c
> long do_get_mempolicy(int *policy, nodemask_t *nmask,
> unsigned long addr, unsigned long flags)
> {
> int err;
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma = NULL;
> struct mempolicy *pol = current->mempolicy;
>
> cpuset_update_task_memory_state();
> if (flags & ~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR))
> return -EINVAL;
> if (flags & MPOL_F_ADDR) {
> down_read(&mm->mmap_sem);
> vma = find_vma_intersection(mm, addr, addr+1);
> if (!vma) {
> up_read(&mm->mmap_sem);
> return -EFAULT;
> }
> if (vma->vm_ops && vma->vm_ops->get_policy)
> pol = vma->vm_ops->get_policy(vma, addr);
> else
> pol = vma->vm_policy;
> } else if (addr)
> return -EINVAL;
>
> if (!pol)
> pol = &default_policy;
> /* Return effective policy for a VMA */
> static struct mempolicy * get_vma_policy(struct task_struct *task,
> struct vm_area_struct *vma, unsigned long addr)
> {
> struct mempolicy *pol = task->mempolicy;
>
> if (vma) {
> if (vma->vm_ops && vma->vm_ops->get_policy)
> pol = vma->vm_ops->get_policy(vma, addr);
> else if (vma->vm_policy &&
> vma->vm_policy->policy != MPOL_DEFAULT)
> pol = vma->vm_policy;
> }
> if (!pol)
> pol = &default_policy;
> return pol;
> }
Does this perhaps need to be:
> Signed-off-by: Adam Litke <agl@us.ibm.com>
>
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 4fefbad..8d2672d 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -254,8 +254,10 @@ struct mempolicy *shm_get_policy(struct vm_area_struct
> *vma, unsigned long addr)
+ pol = NULL;
>
> if (sfd->vm_ops->get_policy)
> pol = sfd->vm_ops->get_policy(vma, addr);
> - else
> + else if (vma->vm_policy && vma->vm_policy->policy != MPOL_DEFAULT)
> pol = vma->vm_policy;
> return pol;
> }
> #endif
Sorry I'm just a little dense at the moment.
Eric
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [shm][hugetlb] Fix get_policy for stacked shared memory files
2007-06-12 6:20 ` Eric W. Biederman
@ 2007-06-12 6:58 ` William Lee Irwin III
2007-06-12 14:32 ` Adam Litke
1 sibling, 0 replies; 16+ messages in thread
From: William Lee Irwin III @ 2007-06-12 6:58 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Adam Litke, dean gaudet, linux-kernel, ak, clameter
On Tue, Jun 12, 2007 at 12:20:52AM -0600, Eric W. Biederman wrote:
> Does this perhaps need to be:
>> diff --git a/ipc/shm.c b/ipc/shm.c
>> index 4fefbad..8d2672d 100644
>> --- a/ipc/shm.c
>> +++ b/ipc/shm.c
>> @@ -254,8 +254,10 @@ struct mempolicy *shm_get_policy(struct vm_area_struct
>> *vma, unsigned long addr)
>>
>> + pol = NULL;
>>
>> if (sfd->vm_ops->get_policy)
>> pol = sfd->vm_ops->get_policy(vma, addr);
>> - else
>> + else if (vma->vm_policy && vma->vm_policy->policy != MPOL_DEFAULT)
>> pol = vma->vm_policy;
>> return pol;
Those paths are above the level where shm_get_policy() is called.
It may be that shm_get_policy() doesn't need to recapitulate them
if it's only ever called through such codepaths. It's not clear to
me whether that's intended as an invariant or is coincidental and
not guaranteed for future callsites.
-- wli
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [shm][hugetlb] Fix get_policy for stacked shared memory files
2007-06-12 4:30 ` Andrew Morton
2007-06-12 4:48 ` William Lee Irwin III
@ 2007-06-12 10:20 ` Andi Kleen
1 sibling, 0 replies; 16+ messages in thread
From: Andi Kleen @ 2007-06-12 10:20 UTC (permalink / raw)
To: Andrew Morton
Cc: Adam Litke, dean gaudet, William Lee Irwin III, Eric W. Biederman,
linux-kernel, clameter
> Can we just double-check the refcounting please?
>
> > index 4fefbad..8d2672d 100644
> > --- a/ipc/shm.c
> > +++ b/ipc/shm.c
> > @@ -254,8 +254,10 @@ struct mempolicy *shm_get_policy(struct vm_area_struct *vma, unsigned long addr)
> >
> > if (sfd->vm_ops->get_policy)
> > pol = sfd->vm_ops->get_policy(vma, addr);
>
> afacit this takes a ref on the underlying policy
>
> > - else
> > + else if (vma->vm_policy)
> > pol = vma->vm_policy;
> > + else
> > + pol = current->mempolicy;
>
> but these two do not.
>
> > return pol;
> > }
> > #endif
>
> Is is all correct?
No it looks broken.
-Andi
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [shm][hugetlb] Fix get_policy for stacked shared memory files
2007-06-12 6:20 ` Eric W. Biederman
2007-06-12 6:58 ` William Lee Irwin III
@ 2007-06-12 14:32 ` Adam Litke
2007-06-12 18:22 ` Eric W. Biederman
1 sibling, 1 reply; 16+ messages in thread
From: Adam Litke @ 2007-06-12 14:32 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Adam Litke, dean gaudet, William Lee Irwin III, linux-kernel, ak,
clameter
On 6/12/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Adam Litke <agl@us.ibm.com> writes:
>
> > Here's another breakage as a result of shared memory stacked files :(
> >
> > The NUMA policy for a VMA is determined by checking the following (in the order
> > given):
> >
> > 1) vma->vm_ops->get_policy() (if defined)
> > 2) vma->vm_policy (if defined)
> > 3) task->mempolicy (if defined)
> > 4) Fall back to default_policy
> >
> > By switching to stacked files for shared memory, get_policy() is now always set
> > to shm_get_policy which is a wrapper function. This causes us to stop at step
> > 1, which yields NULL for hugetlb instead of task->mempolicy which was the
> > previous (and correct) result.
> >
> > This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for the
> > wrapped vm_ops. Andi and Christoph, does this look right to you?
>
> I'm confused.
>
> I agree that the behavior you describe is correct.
> However I only see two code paths were get_policy is called and
> both of them take a NULL result and change it to task->mempolicy:
The coffee hasn't quite absorbed yet, but don't those two code paths
take a NULL result from get_policy() and turn it into default_policy,
not task->mempolicy?
> From mm/mempolicy.c
>
> > long do_get_mempolicy(int *policy, nodemask_t *nmask,
> > unsigned long addr, unsigned long flags)
> > {
> > int err;
> > struct mm_struct *mm = current->mm;
> > struct vm_area_struct *vma = NULL;
> > struct mempolicy *pol = current->mempolicy;
> >
> > cpuset_update_task_memory_state();
> > if (flags & ~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR))
> > return -EINVAL;
> > if (flags & MPOL_F_ADDR) {
> > down_read(&mm->mmap_sem);
> > vma = find_vma_intersection(mm, addr, addr+1);
> > if (!vma) {
> > up_read(&mm->mmap_sem);
> > return -EFAULT;
> > }
> > if (vma->vm_ops && vma->vm_ops->get_policy)
> > pol = vma->vm_ops->get_policy(vma, addr);
> > else
> > pol = vma->vm_policy;
> > } else if (addr)
> > return -EINVAL;
> >
> > if (!pol)
> > pol = &default_policy;
>
>
>
> > /* Return effective policy for a VMA */
> > static struct mempolicy * get_vma_policy(struct task_struct *task,
> > struct vm_area_struct *vma, unsigned long addr)
> > {
> > struct mempolicy *pol = task->mempolicy;
> >
> > if (vma) {
> > if (vma->vm_ops && vma->vm_ops->get_policy)
> > pol = vma->vm_ops->get_policy(vma, addr);
> > else if (vma->vm_policy &&
> > vma->vm_policy->policy != MPOL_DEFAULT)
> > pol = vma->vm_policy;
> > }
> > if (!pol)
> > pol = &default_policy;
> > return pol;
> > }
>
>
> Does this perhaps need to be:
> > Signed-off-by: Adam Litke <agl@us.ibm.com>
> >
> > diff --git a/ipc/shm.c b/ipc/shm.c
> > index 4fefbad..8d2672d 100644
> > --- a/ipc/shm.c
> > +++ b/ipc/shm.c
> > @@ -254,8 +254,10 @@ struct mempolicy *shm_get_policy(struct vm_area_struct
> > *vma, unsigned long addr)
>
> + pol = NULL;
> >
> > if (sfd->vm_ops->get_policy)
> > pol = sfd->vm_ops->get_policy(vma, addr);
> > - else
> > + else if (vma->vm_policy && vma->vm_policy->policy != MPOL_DEFAULT)
> > pol = vma->vm_policy;
> > return pol;
> > }
> > #endif
afaict this would provide no way for pol to be set to task->mempolicy
for hugetlb per my comment above.
--
Adam Litke ( agl at us.ibm.com )
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [shm][hugetlb] Fix get_policy for stacked shared memory files
2007-06-12 14:32 ` Adam Litke
@ 2007-06-12 18:22 ` Eric W. Biederman
0 siblings, 0 replies; 16+ messages in thread
From: Eric W. Biederman @ 2007-06-12 18:22 UTC (permalink / raw)
To: Adam Litke
Cc: Adam Litke, dean gaudet, William Lee Irwin III, linux-kernel, ak,
clameter
"Adam Litke" <aglitke@gmail.com> writes:
> On 6/12/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Adam Litke <agl@us.ibm.com> writes:
>>
>> > Here's another breakage as a result of shared memory stacked files :(
>> >
>> > The NUMA policy for a VMA is determined by checking the following (in the
> order
>> > given):
Where that doesn't appear to be what the current code does.
If there is a VMA it doesn't appear that we check task->mempolicy.
>> > 1) vma->vm_ops->get_policy() (if defined)
>> > 2) vma->vm_policy (if defined)
>> > 3) task->mempolicy (if defined)
>> > 4) Fall back to default_policy
>> >
>> > By switching to stacked files for shared memory, get_policy() is now always
> set
>> > to shm_get_policy which is a wrapper function. This causes us to stop at
> step
>> > 1, which yields NULL for hugetlb instead of task->mempolicy which was the
>> > previous (and correct) result.
>> >
>> > This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for
> the
>> > wrapped vm_ops. Andi and Christoph, does this look right to you?
>>
>> I'm confused.
>>
>> I agree that the behavior you describe is correct.
>> However I only see two code paths were get_policy is called and
>> both of them take a NULL result and change it to task->mempolicy:
>
> The coffee hasn't quite absorbed yet, but don't those two code paths
> take a NULL result from get_policy() and turn it into default_policy,
> not task->mempolicy?
True, a NULL is turned into default_policy. However the basic
confusion remains. I don't see how we ever return task->mempolicy
on those paths if we have a vma, and those appear to be the only
call sites of get_policy.
>> From mm/mempolicy.c
>>
>> > long do_get_mempolicy(int *policy, nodemask_t *nmask,
>> > unsigned long addr, unsigned long flags)
>> > {
>> > int err;
>> > struct mm_struct *mm = current->mm;
>> > struct vm_area_struct *vma = NULL;
>> > struct mempolicy *pol = current->mempolicy;
>> >
>> > cpuset_update_task_memory_state();
>> > if (flags & ~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR))
>> > return -EINVAL;
>> > if (flags & MPOL_F_ADDR) {
>> > down_read(&mm->mmap_sem);
>> > vma = find_vma_intersection(mm, addr, addr+1);
>> > if (!vma) {
>> > up_read(&mm->mmap_sem);
>> > return -EFAULT;
>> > }
>> > if (vma->vm_ops && vma->vm_ops->get_policy)
>> > pol = vma->vm_ops->get_policy(vma, addr);
>> > else
>> > pol = vma->vm_policy;
>> > } else if (addr)
>> > return -EINVAL;
>> >
>> > if (!pol)
>> > pol = &default_policy;
>>
>>
>>
>> > /* Return effective policy for a VMA */
>> > static struct mempolicy * get_vma_policy(struct task_struct *task,
>> > struct vm_area_struct *vma, unsigned long addr)
>> > {
>> > struct mempolicy *pol = task->mempolicy;
>> >
>> > if (vma) {
>> > if (vma->vm_ops && vma->vm_ops->get_policy)
>> > pol = vma->vm_ops->get_policy(vma, addr);
>> > else if (vma->vm_policy &&
>> > vma->vm_policy->policy != MPOL_DEFAULT)
>> > pol = vma->vm_policy;
>> > }
>> > if (!pol)
>> > pol = &default_policy;
>> > return pol;
>> > }
>>
>>
>> Does this perhaps need to be:
>> > Signed-off-by: Adam Litke <agl@us.ibm.com>
>> >
>> > diff --git a/ipc/shm.c b/ipc/shm.c
>> > index 4fefbad..8d2672d 100644
>> > --- a/ipc/shm.c
>> > +++ b/ipc/shm.c
>> > @@ -254,8 +254,10 @@ struct mempolicy *shm_get_policy(struct vm_area_struct
>> > *vma, unsigned long addr)
>>
>> + pol = NULL;
>> >
>> > if (sfd->vm_ops->get_policy)
>> > pol = sfd->vm_ops->get_policy(vma, addr);
>> > - else
>> > + else if (vma->vm_policy && vma->vm_policy->policy != MPOL_DEFAULT)
>> > pol = vma->vm_policy;
>> > return pol;
>> > }
>> > #endif
>
> afaict this would provide no way for pol to be set to task->mempolicy
> for hugetlb per my comment above.
Being dense I don't see a way for us to have ever returned task->mempolicy.
So perhaps something else changed. Or something was made more
consistent and another bug was revealed.
I don't truly see the bug in shm_get_policy. I can see why it would
not have the expected affect but that is different.
Eric
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2007-06-12 18:23 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-16 5:41 2.6.21 numa policy and huge pages not working dean gaudet
2007-05-16 6:12 ` William Lee Irwin III
2007-06-09 18:06 ` dean gaudet
2007-06-10 4:10 ` dean gaudet
2007-06-10 4:51 ` William Lee Irwin III
2007-06-11 16:23 ` Adam Litke
2007-06-11 21:34 ` [shm][hugetlb] Fix get_policy for stacked shared memory files Adam Litke
2007-06-12 3:36 ` dean gaudet
2007-06-12 3:51 ` William Lee Irwin III
2007-06-12 4:30 ` Andrew Morton
2007-06-12 4:48 ` William Lee Irwin III
2007-06-12 10:20 ` Andi Kleen
2007-06-12 6:20 ` Eric W. Biederman
2007-06-12 6:58 ` William Lee Irwin III
2007-06-12 14:32 ` Adam Litke
2007-06-12 18:22 ` Eric W. Biederman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox