* [bug report & help] arm64: ltp testcase "migrate_pages01" failed
From: Tan Xiaojun @ 2017-10-16 11:42 UTC
To: linux-arm-kernel

Hi all,

I ran LTP on a Hisilicon D05 board and the testcase "migrate_pages01" failed.

Specifically, the sub-testcase "test_invalid_nodes" failed. It picks an invalid NUMA node and tries to migrate memory pages to that node via the "migrate_pages" syscall.
The expected return value is -1, but the call actually returns 0.

--------------------------------------------------------
# ./migrate_pages01
migrate_pages01 0 TINFO : test_empty_mask
migrate_pages01 1 TPASS : expected ret success: returned value = 0
migrate_pages01 0 TINFO : test_invalid_pid -1
migrate_pages01 2 TPASS : expected ret success: returned value = -1
migrate_pages01 3 TPASS : expected failure: TEST_ERRNO=ESRCH(3): No such process
migrate_pages01 0 TINFO : test_invalid_pid unused pid
migrate_pages01 4 TPASS : expected ret success: returned value = -1
migrate_pages01 5 TPASS : expected failure: TEST_ERRNO=ESRCH(3): No such process
migrate_pages01 0 TINFO : test_invalid_masksize
migrate_pages01 6 TPASS : expected ret success: returned value = -1
migrate_pages01 7 TPASS : expected failure: TEST_ERRNO=EINVAL(22): Invalid argument
migrate_pages01 0 TINFO : test_invalid_mem -1
migrate_pages01 8 TPASS : expected ret success: returned value = -1
migrate_pages01 9 TPASS : expected failure: TEST_ERRNO=EFAULT(14): Bad address
migrate_pages01 0 TINFO : test_invalid_mem invalid prot
migrate_pages01 10 TPASS : expected ret success: returned value = -1
migrate_pages01 11 TPASS : expected failure: TEST_ERRNO=EFAULT(14): Bad address
migrate_pages01 0 TINFO : test_invalid_mem unmmaped
migrate_pages01 12 TPASS : expected ret success: returned value = -1
migrate_pages01 13 TPASS : expected failure: TEST_ERRNO=EFAULT(14): Bad address
migrate_pages01 0 TINFO : test_invalid_nodes
migrate_pages01 14 TFAIL : migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1
migrate_pages01 15 TFAIL : migrate_pages_common.c:55: call succeeded unexpectedly
migrate_pages01 0 TINFO : test_invalid_perm
migrate_pages01 16 TPASS : expected ret success: returned value = -1
migrate_pages01 17 TPASS : expected failure: TEST_ERRNO=EPERM(1): Operation not permitted
--------------------------------------------------------

While debugging I found something interesting: this case does not always fail.

1) If one or more NUMA nodes have no memory, the case passes, as with the topology below:

--------------------
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 65309 MB
node 0 free: 61650 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 65404 MB
node 1 free: 61377 MB
node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 2 size: 65401 MB
node 2 free: 62316 MB
node 3 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 3 size: 0 MB
node 3 free: 0 MB
node distances:
node   0   1   2   3
  0:  10  15  20  20
  1:  15  10  20  20
  2:  20  20  10  15
  3:  20  20  15  10
---------------------

The testcase picks node 3 and tries to migrate pages to it. The "migrate_pages" syscall returns -1 and the test succeeds.

2) In most cases all nodes have memory, and the testcase picks a non-existent node such as node 4. The "migrate_pages" syscall should also return -1, but it actually returns 0.
So the testcase fails.

I think this is a problem in arm64, but I am not familiar with NUMA, so I am asking for your help.

Thanks.
Xiaojun.

* [bug report & help] arm64: ltp testcase "migrate_pages01" failed
From: Tan Xiaojun @ 2017-10-17 2:58 UTC
To: linux-arm-kernel

Hi all,

I'm not sure whether this is a problem in arm64 NUMA. What do you think?

By the way, this testcase always passes on x86.

Thanks.
Xiaojun.

* [bug report & help] arm64: ltp testcase "migrate_pages01" failed
From: Will Deacon @ 2017-10-17 9:23 UTC
To: linux-arm-kernel

On Tue, Oct 17, 2017 at 10:58:53AM +0800, Tan Xiaojun wrote:
> I'm not sure if this is the problem on arm64 numa. What do you think ?
> By the way, this testcase can be successful in any case on x86.

To be honest, this isn't a particularly helpful bug report. I appreciate
that a test is reporting failure, but it doesn't look like you've spent
very much effort to understand what the test is trying to do and why it
thinks it's failed to do it. All I can sensibly do with your bug report
is run the test myself, and it passes on the systems I have available.

So, you need to:

  1. Understand what the test is doing.
  2. Figure out which bit isn't doing what it's supposed to
  3. See if that part can be isolated to trigger the problem

At that point, it should be possible to describe the unexpected behaviour
at a level which we can actually investigate if necessary.

Will

* [bug report & help] arm64: ltp testcase "migrate_pages01" failed
From: Tan Xiaojun @ 2017-10-17 11:33 UTC
To: linux-arm-kernel

On 2017/10/17 17:23, Will Deacon wrote:
> On Tue, Oct 17, 2017 at 10:58:53AM +0800, Tan Xiaojun wrote:
>> I'm not sure if this is the problem on arm64 numa. What do you think ?
>> By the way, this testcase can be successful in any case on x86.
>
> To be honest, this isn't a particularly helpful bug report. I appreciate
> that a test is reporting failure, but it doesn't look like you've spent
> very much effort to understand what the test is trying to do and why it
> thinks it's failed to do it. All I can sensibly do with your bug report
> is run the test myself, and it passes on the systems I have available.
>
> So, you need to:
>
>   1. Understand what the test is doing.
>   2. Figure out which bit isn't doing what it's supposed to
>   3. See if that part can be isolated to trigger the problem
>
> At that point, it should be possible to describe the unexpected behaviour
> at a level which we can actually investigate if necessary.
>
> Will
>

OK, I will do these things.

Also, can you tell me the model of your test board? If possible, please run "numactl -H" to check whether all nodes have memory.

Thanks.
Xiaojun.

* [bug report & help] arm64: ltp testcase "migrate_pages01" failed
From: Yisheng Xie @ 2017-10-17 13:19 UTC
To: linux-arm-kernel

Hi Will,

On 2017/10/17 17:23, Will Deacon wrote:
> On Tue, Oct 17, 2017 at 10:58:53AM +0800, Tan Xiaojun wrote:
>> I'm not sure if this is the problem on arm64 numa. What do you think ?
>> By the way, this testcase can be successful in any case on x86.
>
> To be honest, this isn't a particularly helpful bug report. I appreciate
> that a test is reporting failure, but it doesn't look like you've spent
> very much effort to understand what the test is trying to do and why it
> thinks it's failed to do it. All I can sensibly do with your bug report
> is run the test myself, and it passes on the systems I have available.
>
> So, you need to:
>
>   1. Understand what the test is doing.
>   2. Figure out which bit isn't doing what it's supposed to
>   3. See if that part can be isolated to trigger the problem
>
> At that point, it should be possible to describe the unexpected behaviour
> at a level which we can actually investigate if necessary.

This test case tests whether migration succeeds when the user calls
SYSC_migrate_pages with an invalid node. E.g., we have 4 nodes, 0-3, and try to
migrate to node 4. This should return -EINVAL.

However, the kernel will migrate the memory to node 0 and return ok (i.e. 0).
The root cause is that
	nodes_subset(*new, node_states[N_MEMORY])

will return true when new = 0x10 and node_states[N_MEMORY]=0xf, MAX_NUMNODES=4.

This is a common issue, and I can also reproduce it with a certain config on x86-64,
e.g. CONFIG_NODES_SHIFT=3 with 8 nodes in the system.

IMO, if nbits=4, then 0x0, 0x10 or 0xFF..F0 should not be a subset of anything, so the
following patch may fix this problem:

From: Yisheng Xie <xieyisheng1@huawei.com>
Date: Tue, 17 Oct 2017 20:53:55 +0800
Subject: [PATCH] bitmap: fix corner case of bitmap_subset

As Xiaojun reported, the ltp testcase migrate_pages01 will fail on a system
which has 4 nodes with CONFIG_NODES_SHIFT=2:

migrate_pages01 0 TINFO : test_invalid_nodes
migrate_pages01 14 TFAIL : migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1
migrate_pages01 15 TFAIL : migrate_pages_common.c:55: call succeeded unexpectedly

and the root cause is that
	nodes_subset(*new, node_states[N_MEMORY])

will return true in a case like: new = 0x10 and node_states[N_MEMORY]=0xf,
MAX_NUMNODES=4.

Fix it by correcting the corner case of bitmap_subset, which makes 0x0,
0x10 or 0xFF..F0 not a subset of the bitmap when the bitmap length is 4.

Reported-by: Tan Xiaojun <tanxiaojun@huawei.com>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
---
 include/linux/bitmap.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 700cf5f..bc66978 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -283,6 +283,8 @@ static inline int bitmap_intersects(const unsigned long *src1,
 static inline int bitmap_subset(const unsigned long *src1,
 			const unsigned long *src2, unsigned int nbits)
 {
+	if(!(*src1 & BITMAP_LAST_WORD_MASK(nbits)))
+		return false;
 	if (small_const_nbits(nbits))
 		return ! ((*src1 & ~(*src2)) & BITMAP_LAST_WORD_MASK(nbits));
 	else
-- 
1.7.12.4

Thanks
Yisheng Xie

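The masking behaviour described in the analysis above can be reproduced in userspace. What follows is a minimal sketch, not kernel code: `model_bitmap_subset` and `LAST_WORD_MASK` are hypothetical stand-ins that mirror only the single-word expression visible in the patch context, and the values 0x10 and 0xf are the ones quoted in the analysis. It assumes the mask fits in one `unsigned long`.

```c
#include <assert.h>

/* Width of one bitmap word on this machine (userspace stand-in). */
#define BITS_PER_LONG (8 * (unsigned int)sizeof(unsigned long))

/* Simplified stand-in for BITMAP_LAST_WORD_MASK: keeps the low nbits bits.
 * Only valid for 1 <= nbits <= BITS_PER_LONG. */
#define LAST_WORD_MASK(nbits) (~0UL >> (BITS_PER_LONG - (nbits)))

/* Hypothetical model of the single-word subset test:
 * returns 1 when every bit of src1 below nbits is also set in src2. */
static int model_bitmap_subset(unsigned long src1, unsigned long src2,
                               unsigned int nbits)
{
    assert(nbits >= 1 && nbits <= BITS_PER_LONG);
    /* Bits at or above nbits are masked off before the comparison,
     * so a request that only names node 4 looks like an empty set
     * when nbits is 4. */
    return !((src1 & ~src2) & LAST_WORD_MASK(nbits));
}
```

With these definitions, `model_bitmap_subset(0x10, 0xf, 4)` evaluates to 1 because bit 4 is discarded by the mask, while `model_bitmap_subset(0x10, 0xf, 8)` evaluates to 0 because bit 4 is then visible and is not set in 0xf. This matches the report that the failure depends on how many node bits the kernel is configured to track.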
* [bug report & help] arm64: ltp testcase "migrate_pages01" failed
From: Yisheng Xie @ 2017-10-18 1:45 UTC
To: linux-arm-kernel

Hi Will,

On 2017/10/17 21:19, Yisheng Xie wrote:
> Hi Will,
>
> On 2017/10/17 17:23, Will Deacon wrote:
>> On Tue, Oct 17, 2017 at 10:58:53AM +0800, Tan Xiaojun wrote:
>>> I'm not sure if this is the problem on arm64 numa. What do you think ?
>>> By the way, this testcase can be successful in any case on x86.
>>
>> To be honest, this isn't a particularly helpful bug report. I appreciate
>> that a test is reporting failure, but it doesn't look like you've spent
>> very much effort to understand what the test is trying to do and why it
>> thinks it's failed to do it. All I can sensibly do with your bug report
>> is run the test myself, and it passes on the systems I have available.
>>
>> So, you need to:
>>
>>   1. Understand what the test is doing.
>>   2. Figure out which bit isn't doing what it's supposed to
>>   3. See if that part can be isolated to trigger the problem
>>
>> At that point, it should be possible to describe the unexpected behaviour
>> at a level which we can actually investigate if necessary.
>
> This test case is to test whether we should migrate successfully if user call
> SYSC_migrate_pages with a invalid node. eg, we should 4 node 0-3, and try to
> migrate to node 4. And this should return -EINVAL.
>
> however, the kernel will migrate the memory to node 0 and return ok(e.g. 0).
> The root cause is for
> 	nodes_subset(*new, node_states[N_MEMORY])
>
> will return true when new = 0x10 and node_states[N_MEMORY]=0xf, MAX_NUMNODES=4.
>
> And this is common issue, and I also can reproduce at certain config on X86-64
> e.g. CONFIG_NODES_SHIFT=3 and have 8 node in the system.
>
> IMO, if nbits=4, 0x0 or 0x10, 0xFF..F0 should not a subset of anything, so following
> patch may fix this problem:

Sorry, I made a mathematical mistake: the empty set should be a subset of any set.
Please disregard the following patch; I will send a new one with a node_empty check
in SYSC_migrate_pages.

Thanks
Yisheng Xie

>
> From: Yisheng Xie <xieyisheng1@huawei.com>
> Date: Tue, 17 Oct 2017 20:53:55 +0800
> Subject: [PATCH] bitmap: fix corner case of bitmap_subset
>
> As Xiaojun reported the ltp of migrate_pages01 will failed in system
> whoes has 4 node with CONFIG_NODES_SHIFT=2:
>
> migrate_pages01 0 TINFO : test_invalid_nodes
> migrate_pages01 14 TFAIL : migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1
> migrate_pages01 15 TFAIL : migrate_pages_common.c:55: call succeeded unexpectedly
>
> and the root cause is
> 	nodes_subset(*new, node_states[N_MEMORY])
>
> will return true in the case like: new = 0x10 and node_states[N_MEMORY]=0xf,
> MAX_NUMNODES=4.
>
> Fix it by correct the corner case of bitmap_subset, which makes 0x0 or
> 0x10, 0xFF..F0 not a subset of bitmap when bitmap lenth is 4.
>
> Reported-by: Tan Xiaojun <tanxiaojun@huawei.com>
> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
> ---
>  include/linux/bitmap.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 700cf5f..bc66978 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -283,6 +283,8 @@ static inline int bitmap_intersects(const unsigned long *src1,
>  static inline int bitmap_subset(const unsigned long *src1,
>  			const unsigned long *src2, unsigned int nbits)
>  {
> +	if(!(*src1 & BITMAP_LAST_WORD_MASK(nbits)))
> +		return false;
>  	if (small_const_nbits(nbits))
>  		return ! ((*src1 & ~(*src2)) & BITMAP_LAST_WORD_MASK(nbits));
>  	else
>

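The follow-up above moves the check out of the generic subset helper and into the caller. The snippet below is only a rough userspace sketch of that idea under stated assumptions: `model_check_target_nodes` is a hypothetical helper, not the patch that was later sent, and it models the node mask as a single `unsigned long`. It leaves subset semantics untouched (the empty set stays a subset of everything) and rejects an empty target separately.

```c
#include <assert.h>
#include <errno.h>

/* Width of one bitmap word on this machine (userspace stand-in). */
#define BITS_PER_LONG (8 * (unsigned int)sizeof(unsigned long))

/* Simplified stand-in for BITMAP_LAST_WORD_MASK, valid for
 * 1 <= nbits <= BITS_PER_LONG. */
#define LAST_WORD_MASK(nbits) (~0UL >> (BITS_PER_LONG - (nbits)))

/* Hypothetical caller-side validation of the requested target nodes.
 * Returns 0 if the request is usable, -EINVAL otherwise. */
static int model_check_target_nodes(unsigned long new_nodes,
                                    unsigned long n_memory,
                                    unsigned int nbits)
{
    assert(nbits >= 1 && nbits <= BITS_PER_LONG);

    /* Drop bits the kernel does not track, as the subset test would. */
    new_nodes &= LAST_WORD_MASK(nbits);

    /* Emptiness check: nothing left to migrate to. */
    if (new_nodes == 0)
        return -EINVAL;

    /* Ordinary subset check against nodes that have memory. */
    if (new_nodes & ~n_memory)
        return -EINVAL;

    return 0;
}
```

Under this sketch the request from the report, new = 0x10 with nodes 0-3 having memory and nbits = 4, is rejected with -EINVAL. Note that this simple form also rejects an all-zero mask, whereas the test_empty_mask result in the original report shows the syscall returning 0 for that case, so the sketch does not settle how an intentionally empty mask should be treated.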