linux-arm-kernel.lists.infradead.org archive mirror
* [bug report & help] arm64: ltp testcase "migrate_pages01" failed
@ 2017-10-16 11:42 Tan Xiaojun
  2017-10-17  2:58 ` Tan Xiaojun
  0 siblings, 1 reply; 6+ messages in thread
From: Tan Xiaojun @ 2017-10-16 11:42 UTC
  To: linux-arm-kernel

Hi all,

I ran LTP on a Hisilicon D05 board and the testcase "migrate_pages01" failed.

More precisely, the sub-testcase "test_invalid_nodes" failed. It finds an invalid NUMA node and tries to migrate memory pages to that node via the "migrate_pages" syscall.
The expected result is a return value of "-1", but the syscall actually returns "0".
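
For reference, the check boils down to a call like the one sketched below. This is not the LTP source: `node_mask()` and `try_migrate_to_node()` are hypothetical helpers, the sketch assumes a Linux system whose libc headers define SYS_migrate_pages, and it only handles node numbers that fit in one unsigned long.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Bitmask with only `node` set; assumes node < bits in unsigned long. */
static unsigned long node_mask(unsigned int node)
{
	return 1UL << node;
}

/*
 * Ask the kernel to move the calling process's pages from node 0 to
 * `node`. Returns the raw syscall result: -1 with errno set on error.
 */
static long try_migrate_to_node(unsigned int node)
{
	unsigned long old_nodes = node_mask(0);	/* source: node 0 */
	unsigned long new_nodes = node_mask(node);	/* target: one node */
	unsigned long maxnode = sizeof(unsigned long) * CHAR_BIT;

	/* pid 0 means the calling process */
	return syscall(SYS_migrate_pages, 0, maxnode, &old_nodes, &new_nodes);
}
```

On a machine with nodes 0-3, try_migrate_to_node(4) is the kind of call the sub-testcase expects to fail with -1.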

--------------------------------------------------------
# ./migrate_pages01
migrate_pages01    0  TINFO  :  test_empty_mask
migrate_pages01    1  TPASS  :  expected ret success: returned value = 0
migrate_pages01    0  TINFO  :  test_invalid_pid -1
migrate_pages01    2  TPASS  :  expected ret success: returned value = -1
migrate_pages01    3  TPASS  :  expected failure: TEST_ERRNO=ESRCH(3): No such process
migrate_pages01    0  TINFO  :  test_invalid_pid unused pid
migrate_pages01    4  TPASS  :  expected ret success: returned value = -1
migrate_pages01    5  TPASS  :  expected failure: TEST_ERRNO=ESRCH(3): No such process
migrate_pages01    0  TINFO  :  test_invalid_masksize
migrate_pages01    6  TPASS  :  expected ret success: returned value = -1
migrate_pages01    7  TPASS  :  expected failure: TEST_ERRNO=EINVAL(22): Invalid argument
migrate_pages01    0  TINFO  :  test_invalid_mem -1
migrate_pages01    8  TPASS  :  expected ret success: returned value = -1
migrate_pages01    9  TPASS  :  expected failure: TEST_ERRNO=EFAULT(14): Bad address
migrate_pages01    0  TINFO  :  test_invalid_mem invalid prot
migrate_pages01   10  TPASS  :  expected ret success: returned value = -1
migrate_pages01   11  TPASS  :  expected failure: TEST_ERRNO=EFAULT(14): Bad address
migrate_pages01    0  TINFO  :  test_invalid_mem unmmaped
migrate_pages01   12  TPASS  :  expected ret success: returned value = -1
migrate_pages01   13  TPASS  :  expected failure: TEST_ERRNO=EFAULT(14): Bad address
migrate_pages01    0  TINFO  :  test_invalid_nodes
migrate_pages01   14  TFAIL  :  migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1
migrate_pages01   15  TFAIL  :  migrate_pages_common.c:55: call succeeded unexpectedly
migrate_pages01    0  TINFO  :  test_invalid_perm
migrate_pages01   16  TPASS  :  expected ret success: returned value = -1
migrate_pages01   17  TPASS  :  expected failure: TEST_ERRNO=EPERM(1): Operation not permitted
--------------------------------------------------------

While debugging I found something interesting: this case does not always fail.

1) If one or more NUMA nodes have no memory, the case passes, as in the configuration below:

--------------------
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 65309 MB
node 0 free: 61650 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 65404 MB
node 1 free: 61377 MB
node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 2 size: 65401 MB
node 2 free: 62316 MB
node 3 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 3 size: 0 MB
node 3 free: 0 MB
node distances:
node   0   1   2   3
  0:  10  15  20  20
  1:  15  10  20  20
  2:  20  20  10  15
  3:  20  20  15  10
---------------------

Here the testcase picks node 3 as the invalid node and tries to migrate pages to it. The "migrate_pages" syscall returns -1, so the test passes.

2) In most cases all nodes have memory, so the testcase picks a non-existent node such as node 4. The "migrate_pages" syscall should also return -1 here, but it actually returns 0.
So the testcase fails.

I think this is a problem in arm64, but I am not familiar with NUMA, so I am asking for your help.

Thanks.
Xiaojun.


* [bug report & help] arm64: ltp testcase "migrate_pages01" failed
  2017-10-16 11:42 [bug report & help] arm64: ltp testcase "migrate_pages01" failed Tan Xiaojun
@ 2017-10-17  2:58 ` Tan Xiaojun
  2017-10-17  9:23   ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: Tan Xiaojun @ 2017-10-17  2:58 UTC
  To: linux-arm-kernel

Hi all,

I'm not sure whether this is a problem in arm64 NUMA. What do you think?
By the way, this testcase always passes on x86.

Thanks.
Xiaojun.


* [bug report & help] arm64: ltp testcase "migrate_pages01" failed
  2017-10-17  2:58 ` Tan Xiaojun
@ 2017-10-17  9:23   ` Will Deacon
  2017-10-17 11:33     ` Tan Xiaojun
  2017-10-17 13:19     ` Yisheng Xie
  0 siblings, 2 replies; 6+ messages in thread
From: Will Deacon @ 2017-10-17  9:23 UTC
  To: linux-arm-kernel

On Tue, Oct 17, 2017 at 10:58:53AM +0800, Tan Xiaojun wrote:
> I'm not sure if this is the problem on arm64 numa. What do you think ?
> By the way, this testcase can be successful in any case on x86.

To be honest, this isn't a particularly helpful bug report. I appreciate
that a test is reporting failure, but it doesn't look like you've spent
very much effort to understand what the test is trying to do and why it
thinks it's failed to do it. All I can sensibly do with your bug report
is run the test myself, and it passes on the systems I have available.

So, you need to:

1. Understand what the test is doing.
2. Figure out which bit isn't doing what it's supposed to
3. See if that part can be isolated to trigger the problem

At that point, it should be possible to describe the unexpected behaviour
at a level which we can actually investigate if necessary.

Will


* [bug report & help] arm64: ltp testcase "migrate_pages01" failed
  2017-10-17  9:23   ` Will Deacon
@ 2017-10-17 11:33     ` Tan Xiaojun
  2017-10-17 13:19     ` Yisheng Xie
  1 sibling, 0 replies; 6+ messages in thread
From: Tan Xiaojun @ 2017-10-17 11:33 UTC
  To: linux-arm-kernel

On 2017/10/17 17:23, Will Deacon wrote:
> On Tue, Oct 17, 2017 at 10:58:53AM +0800, Tan Xiaojun wrote:
>> I'm not sure if this is the problem on arm64 numa. What do you think ?
>> By the way, this testcase can be successful in any case on x86.
> 
> To be honest, this isn't a particularly helpful bug report. I appreciate
> that a test is reporting failure, but it doesn't look like you've spent
> very much effort to understand what the test is trying to do and why it
> thinks it's failed to do it. All I can sensibly do with your bug report
> is run the test myself, and it passes on the systems I have available.
> 
> So, you need to:
> 
> 1. Understand what the test is doing.
> 2. Figure out which bit isn't doing what it's supposed to
> 3. See if that part can be isolated to trigger the problem
> 
> At that point, it should be possible to describe the unexpected behaviour
> at a level which we can actually investigate if necessary.
> 
> Will
> 

OK. I will do these things. And can you tell me the model of your test board?
If possible, run "numactl -H" to make sure all nodes have memory.

Thanks.
Xiaojun.


* [bug report & help] arm64: ltp testcase "migrate_pages01" failed
  2017-10-17  9:23   ` Will Deacon
  2017-10-17 11:33     ` Tan Xiaojun
@ 2017-10-17 13:19     ` Yisheng Xie
  2017-10-18  1:45       ` Yisheng Xie
  1 sibling, 1 reply; 6+ messages in thread
From: Yisheng Xie @ 2017-10-17 13:19 UTC
  To: linux-arm-kernel

Hi Will,

On 2017/10/17 17:23, Will Deacon wrote:
> On Tue, Oct 17, 2017 at 10:58:53AM +0800, Tan Xiaojun wrote:
>> I'm not sure if this is the problem on arm64 numa. What do you think ?
>> By the way, this testcase can be successful in any case on x86.
> 
> To be honest, this isn't a particularly helpful bug report. I appreciate
> that a test is reporting failure, but it doesn't look like you've spent
> very much effort to understand what the test is trying to do and why it
> thinks it's failed to do it. All I can sensibly do with your bug report
> is run the test myself, and it passes on the systems I have available.
> 
> So, you need to:
> 
> 1. Understand what the test is doing.
> 2. Figure out which bit isn't doing what it's supposed to
> 3. See if that part can be isolated to trigger the problem
> 
> At that point, it should be possible to describe the unexpected behaviour
> at a level which we can actually investigate if necessary.

This test case checks that migration fails when the user calls
SYSC_migrate_pages with an invalid node, e.g. the system has 4 nodes (0-3)
and we try to migrate to node 4. This should return -EINVAL.

However, the kernel migrates the memory to node 0 and returns success (i.e. 0).
The root cause is that
	nodes_subset(*new, node_states[N_MEMORY])

returns true when new = 0x10, node_states[N_MEMORY] = 0xf and MAX_NUMNODES = 4.
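
To make the masking concrete, here is a minimal user-space sketch of the single-word case. It is not kernel code: `last_word_mask()` and `subset_small()` are hypothetical stand-ins modeled on BITMAP_LAST_WORD_MASK() and on the small_const_nbits() branch of bitmap_subset(), and the sketch assumes nbits is between 1 and the width of unsigned long.

```c
#include <limits.h>
#include <stdbool.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Mask with the low nbits bits set; assumes 1 <= nbits <= WORD_BITS. */
static unsigned long last_word_mask(unsigned int nbits)
{
	return ~0UL >> (WORD_BITS - nbits);
}

/* Single-word subset test: bits at or above nbits are ignored. */
static bool subset_small(unsigned long src1, unsigned long src2,
			 unsigned int nbits)
{
	return !((src1 & ~src2) & last_word_mask(nbits));
}
```

With nbits = 4 the mask is 0xf, so subset_small(0x10, 0xf, 4) is true because bit 4 is masked away before the comparison, while subset_small(0x8, 0x7, 4) is false because bit 3 is inside the mask and missing from src2.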

This is a generic issue: I can also reproduce it on x86-64 with certain
configs, e.g. CONFIG_NODES_SHIFT=3 on a system with 8 nodes.

IMO, if nbits=4, then 0x0, 0x10 or 0xFF..F0 should not be a subset of anything,
so the following patch may fix this problem:

From: Yisheng Xie <xieyisheng1@huawei.com>
Date: Tue, 17 Oct 2017 20:53:55 +0800
Subject: [PATCH] bitmap: fix corner case of bitmap_subset

As Xiaojun reported, the LTP testcase migrate_pages01 fails on a system
that has 4 nodes with CONFIG_NODES_SHIFT=2:

migrate_pages01    0  TINFO  :  test_invalid_nodes
migrate_pages01   14  TFAIL  :  migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1
migrate_pages01   15  TFAIL  :  migrate_pages_common.c:55: call succeeded unexpectedly

The root cause is that
	nodes_subset(*new, node_states[N_MEMORY])

returns true in a case like new = 0x10, node_states[N_MEMORY] = 0xf and
MAX_NUMNODES = 4.

Fix it by correcting the corner case of bitmap_subset, so that 0x0, 0x10 or
0xFF..F0 is not a subset of any bitmap when the bitmap length is 4.

Reported-by: Tan Xiaojun <tanxiaojun@huawei.com>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
---
 include/linux/bitmap.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 700cf5f..bc66978 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -283,6 +283,8 @@ static inline int bitmap_intersects(const unsigned long *src1,
 static inline int bitmap_subset(const unsigned long *src1,
 			const unsigned long *src2, unsigned int nbits)
 {
+	if(!(*src1 & BITMAP_LAST_WORD_MASK(nbits)))
+		return false;
 	if (small_const_nbits(nbits))
 		return ! ((*src1 & ~(*src2)) & BITMAP_LAST_WORD_MASK(nbits));
 	else
-- 
1.7.12.4

Thanks
Yisheng Xie


* [bug report & help] arm64: ltp testcase "migrate_pages01" failed
  2017-10-17 13:19     ` Yisheng Xie
@ 2017-10-18  1:45       ` Yisheng Xie
  0 siblings, 0 replies; 6+ messages in thread
From: Yisheng Xie @ 2017-10-18  1:45 UTC
  To: linux-arm-kernel

Hi Will,

On 2017/10/17 21:19, Yisheng Xie wrote:
> Hi Will,
> 
> On 2017/10/17 17:23, Will Deacon wrote:
>> On Tue, Oct 17, 2017 at 10:58:53AM +0800, Tan Xiaojun wrote:
>>> I'm not sure if this is the problem on arm64 numa. What do you think ?
>>> By the way, this testcase can be successful in any case on x86.
>>
>> To be honest, this isn't a particularly helpful bug report. I appreciate
>> that a test is reporting failure, but it doesn't look like you've spent
>> very much effort to understand what the test is trying to do and why it
>> thinks it's failed to do it. All I can sensibly do with your bug report
>> is run the test myself, and it passes on the systems I have available.
>>
>> So, you need to:
>>
>> 1. Understand what the test is doing.
>> 2. Figure out which bit isn't doing what it's supposed to
>> 3. See if that part can be isolated to trigger the problem
>>
>> At that point, it should be possible to describe the unexpected behaviour
>> at a level which we can actually investigate if necessary.
> This test case is to test whether we should migrate successfully if user call
> SYSC_migrate_pages with a invalid node. eg, we should 4 node 0-3, and try to
> migrate to node 4. And this should return -EINVAL.
> 
> however, the kernel will migrate the memory to node 0 and return ok(e.g. 0).
> The root cause is for
> 	nodes_subset(*new, node_states[N_MEMORY])
> 
> will return true when new = 0x10 and node_states[N_MEMORY]=0xf, MAX_NUMNODES=4.
> 
> And this is common issue, and I also can reproduce at certain config on X86-64
> e.g. CONFIG_NODES_SHIFT=3 and have 8 node in the system.
> 
> IMO, if nbits=4, 0x0 or 0x10, 0xFF..F0 should not a subset of anything, so following
> patch may fix this problem:

Sorry, I made a mathematical mistake: the empty set is a subset of any set.
Please disregard the patch below. I will send a new one that adds a
nodes_empty() check in SYSC_migrate_pages.
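
The point about the empty set can be sketched in user space as follows. This is only an illustration of the idea, not the follow-up patch: `trim()`, `mask_is_empty()` and `target_nodes_valid()` are hypothetical names, masks are a single unsigned long, nbits is assumed to be between 1 and the width of unsigned long, and rejecting an effectively empty target mask is itself an assumption of the sketch.

```c
#include <limits.h>
#include <stdbool.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Keep only the low nbits bits; assumes 1 <= nbits <= WORD_BITS. */
static unsigned long trim(unsigned long mask, unsigned int nbits)
{
	return mask & (~0UL >> (WORD_BITS - nbits));
}

/* True if no node below nbits is set (hypothetical helper). */
static bool mask_is_empty(unsigned long mask, unsigned int nbits)
{
	return trim(mask, nbits) == 0;
}

/*
 * Accept a target mask only if it names at least one node and every
 * named node has memory. The subset test alone accepts an empty mask,
 * because the empty set is a subset of every set.
 */
static bool target_nodes_valid(unsigned long new_nodes,
			       unsigned long nodes_with_memory,
			       unsigned int nbits)
{
	if (mask_is_empty(new_nodes, nbits))
		return false;
	return trim(new_nodes & ~nodes_with_memory, nbits) == 0;
}
```

With nbits = 4, the mask 0x10 trims to the empty set, so target_nodes_valid(0x10, 0xf, 4) is false, while a real node such as 0x2 is still accepted. The subset test keeps its usual mathematical meaning and the emptiness check lives in the caller.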

Thanks
Yisheng Xie
> 
> From: Yisheng Xie <xieyisheng1@huawei.com>
> Date: Tue, 17 Oct 2017 20:53:55 +0800
> Subject: [PATCH] bitmap: fix corner case of bitmap_subset
> 
> As Xiaojun reported the ltp of migrate_pages01 will failed in system
> whoes has 4 node with CONFIG_NODES_SHIFT=2:
> 
> migrate_pages01    0  TINFO  :  test_invalid_nodes
> migrate_pages01   14  TFAIL  :  migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1
> migrate_pages01   15  TFAIL  :  migrate_pages_common.c:55: call succeeded unexpectedly
> 
> and the root cause is
> 	nodes_subset(*new, node_states[N_MEMORY])
> 
> will return true in the case like: new = 0x10 and node_states[N_MEMORY]=0xf,
> MAX_NUMNODES=4.
> 
> Fix it by correct the corner case of bitmap_subset, which makes 0x0 or
> 0x10, 0xFF..F0  not a subset of bitmap when bitmap lenth is 4.
> 
> Reported-by: Tan Xiaojun <tanxiaojun@huawei.com>
> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
> ---
>  include/linux/bitmap.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 700cf5f..bc66978 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -283,6 +283,8 @@ static inline int bitmap_intersects(const unsigned long *src1,
>  static inline int bitmap_subset(const unsigned long *src1,
>  			const unsigned long *src2, unsigned int nbits)
>  {
> +	if(!(*src1 & BITMAP_LAST_WORD_MASK(nbits)))
> +		return false;
>  	if (small_const_nbits(nbits))
>  		return ! ((*src1 & ~(*src2)) & BITMAP_LAST_WORD_MASK(nbits));
>  	else
> 
