public inbox for ltp@lists.linux.it
* [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
@ 2017-02-10  2:53 Li Wang
  2017-02-13  9:08 ` Cyril Hrubis
  0 siblings, 1 reply; 8+ messages in thread
From: Li Wang @ 2017-02-10  2:53 UTC (permalink / raw)
  To: ltp

Hi,

I'm running LTP on upstream kernel 4.10.0-rc7 and found that madvise07
always fails: no SIGBUS is received when the memory is mapped with
MAP_PRIVATE. Is this a known issue? Is there any discussion or
documentation about it?


# uname -r
4.10.0-rc7

# ./madvise07
tst_test.c:794: INFO: Timeout per run is 0h 05m 00s
madvise07.c:57: INFO: madvise(0x7f25bdd7e000, 4096, MADV_HWPOISON)
madvise07.c:72: FAIL: Did not receive SIGBUS after accessing MAP_PRIVATE memory marked with MADV_HWPOISON
madvise07.c:57: INFO: madvise(0x7f25bdd7e000, 4096, MADV_HWPOISON)
madvise07.c:90: PASS: madvise(..., MADV_HWPOISON) on MAP_SHARED memory


-- 
Regards,
Li Wang
Email: liwang@redhat.com


* [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
  2017-02-10  2:53 [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS Li Wang
@ 2017-02-13  9:08 ` Cyril Hrubis
  2017-02-13 12:43   ` Richard Palethorpe
  2017-02-14 14:06   ` Jan Stancek
  0 siblings, 2 replies; 8+ messages in thread
From: Cyril Hrubis @ 2017-02-13  9:08 UTC (permalink / raw)
  To: ltp

Hi!
> I'm trying to run ltp on upstream kernel-4.10.0-rc7, and found that
> madvise07 always failing with no SIGBUS received when mmap the PRIVATE
> memory. I hope to know if there're some relevant stuff about this
> issue.
> Any discussion or document for that?

Looks like a plain old kernel bug to me.

> # uname -r
> 4.10.0-rc7
> 
> # ./madvise07
> tst_test.c:794: INFO: Timeout per run is 0h 05m 00s
> madvise07.c:57: INFO: madvise(0x7f25bdd7e000, 4096, MADV_HWPOISON)
> madvise07.c:72: FAIL: Did not receive SIGBUS after accessing
> MAP_PRIVATE memory marked with MADV_HWPOISON

If you reach this TFAIL the child wasn't killed with a signal after it
accessed memory marked with MADV_HWPOISON.
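
The check described above can be sketched roughly as follows. This is not
the madvise07 source; child_killed_by(), raise_sigbus() and do_nothing() are
hypothetical names, and raising SIGBUS directly merely stands in for
accessing the poisoned memory.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run fn() in a forked child and report whether the
 * child was terminated by signal `sig`.
 */
static bool child_killed_by(int sig, void (*fn)(void))
{
	int status;
	pid_t pid = fork();

	assert(pid >= 0);
	if (pid == 0) {
		fn();
		_exit(0);	/* reached only if no fatal signal arrived */
	}
	if (waitpid(pid, &status, 0) != pid)
		return false;
	/* A normal exit means the expected signal was never delivered. */
	return WIFSIGNALED(status) && WTERMSIG(status) == sig;
}

/* Stand-in for "access the poisoned page": raise SIGBUS ourselves. */
static void raise_sigbus(void)
{
	signal(SIGBUS, SIG_DFL);
	raise(SIGBUS);
}

/* Stand-in for an access that does not fault. */
static void do_nothing(void)
{
}
```

With these stand-ins, child_killed_by(SIGBUS, raise_sigbus) corresponds to
the PASS path, and child_killed_by(SIGBUS, do_nothing) to the FAIL path
reported in this thread.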

What hardware is this?

> madvise07.c:57: INFO: madvise(0x7f25bdd7e000, 4096, MADV_HWPOISON)
> madvise07.c:90: PASS: madvise(..., MADV_HWPOISON) on MAP_SHARED memory

-- 
Cyril Hrubis
chrubis@suse.cz


* [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
  2017-02-13  9:08 ` Cyril Hrubis
@ 2017-02-13 12:43   ` Richard Palethorpe
  2017-02-14 14:06   ` Jan Stancek
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Palethorpe @ 2017-02-13 12:43 UTC (permalink / raw)
  To: ltp

Hello Li & Metan,

On Mon, 13 Feb 2017 10:08:37 +0100
"Cyril Hrubis" <chrubis@suse.cz> wrote:

> Hi!
> > I'm trying to run ltp on upstream kernel-4.10.0-rc7, and found that
> > madvise07 always failing with no SIGBUS received when mmap the PRIVATE
> > memory. I hope to know if there're some relevant stuff about this
> > issue.
> > Any discussion or document for that?  
> 
> Looks like a plain old kernel bug to me.

Sorry, I have to admit that I knew this fails, but did not follow it up before
submitting the patch! I don't know whether it is a bug or whether MADV_HWPOISON
is not intended to work with private memory. Judging by the man pages, I would
assume that it is a bug.

> 
> > # uname -r
> > 4.10.0-rc7
> > 
> > # ./madvise07
> > tst_test.c:794: INFO: Timeout per run is 0h 05m 00s
> > madvise07.c:57: INFO: madvise(0x7f25bdd7e000, 4096, MADV_HWPOISON)
> > madvise07.c:72: FAIL: Did not receive SIGBUS after accessing
> > MAP_PRIVATE memory marked with MADV_HWPOISON  
> 
> If you reach this TFAIL the child wasn't killed with a signal after it
> accessed memory marked with MADV_HWPOISON.
> 
> What hardware is this?
> 
> > madvise07.c:57: INFO: madvise(0x7f25bdd7e000, 4096, MADV_HWPOISON)
> > madvise07.c:90: PASS: madvise(..., MADV_HWPOISON) on MAP_SHARED memory  
> 

I know that it fails on x86_64 and ppc64le.

Thank you,
Richard.


* [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
  2017-02-13  9:08 ` Cyril Hrubis
  2017-02-13 12:43   ` Richard Palethorpe
@ 2017-02-14 14:06   ` Jan Stancek
  2017-02-14 15:18     ` Richard Palethorpe
  2017-02-15  9:38     ` Li Wang
  1 sibling, 2 replies; 8+ messages in thread
From: Jan Stancek @ 2017-02-14 14:06 UTC (permalink / raw)
  To: ltp



----- Original Message -----
> From: "Cyril Hrubis" <chrubis@suse.cz>
> To: "Li Wang" <liwang@redhat.com>
> Cc: richiejp@f-m.fm, ltp@lists.linux.it
> Sent: Monday, 13 February, 2017 10:08:37 AM
> Subject: Re: [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
> 
> Hi!
> > I'm trying to run ltp on upstream kernel-4.10.0-rc7, and found that
> > madvise07 always failing with no SIGBUS received when mmap the PRIVATE
> > memory. I hope to know if there're some relevant stuff about this
> > issue.
> > Any discussion or document for that?
> 
> Looks like a plain old kernel bug to me.

Or maybe MADV_HWPOISON is supposed to work only for faulted-in pages?
It works fine for me with the change below:

diff --git a/testcases/kernel/syscalls/madvise/madvise07.c b/testcases/kernel/syscalls/madvise/madvise07.c
index 2f8c42e..f5fd4b7 100644
--- a/testcases/kernel/syscalls/madvise/madvise07.c
+++ b/testcases/kernel/syscalls/madvise/madvise07.c
@@ -44,13 +44,13 @@ static int maptypes[] = {
 
 static void run_child(int maptype)
 {
-       const size_t msize = 4096;
+       const size_t msize = getpagesize();
        void *mem = NULL;
 
        mem = SAFE_MMAP(NULL,
                        msize,
                        PROT_READ | PROT_WRITE,
-                       MAP_ANONYMOUS | maptype,
+                       MAP_ANONYMOUS | maptype | MAP_POPULATE,
                        -1,
                        0);
 

> 
> > # uname -r
> > 4.10.0-rc7
> > 
> > # ./madvise07
> > tst_test.c:794: INFO: Timeout per run is 0h 05m 00s
> > madvise07.c:57: INFO: madvise(0x7f25bdd7e000, 4096, MADV_HWPOISON)
> > madvise07.c:72: FAIL: Did not receive SIGBUS after accessing
> > MAP_PRIVATE memory marked with MADV_HWPOISON
> 
> If you reach this TFAIL the child wasn't killed with a signal after it
> accessed memory marked with MADV_HWPOISON.
> 
> What hardware is this?

I'm seeing it on x86 KVM guest, with 2.6.32 (RHEL6.0), 3.10 (RHEL7), 4.8 and 4.9 kernels.

> 
> > madvise07.c:57: INFO: madvise(0x7f25bdd7e000, 4096, MADV_HWPOISON)
> > madvise07.c:90: PASS: madvise(..., MADV_HWPOISON) on MAP_SHARED memory
> 
> --
> Cyril Hrubis
> chrubis@suse.cz
> 
> --
> Mailing list info: https://lists.linux.it/listinfo/ltp
> 


* [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
  2017-02-14 14:06   ` Jan Stancek
@ 2017-02-14 15:18     ` Richard Palethorpe
  2017-02-14 15:25       ` Jan Stancek
  2017-02-15  9:38     ` Li Wang
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Palethorpe @ 2017-02-14 15:18 UTC (permalink / raw)
  To: ltp

Hi Jan,

On Tue, 14 Feb 2017 09:06:14 -0500 (EST)
"Jan Stancek" <jstancek@redhat.com> wrote:

> 
> Or maybe MADV_HWPOISON is supposed to work only for faulted-in pages?
> It works fine for me with change below:
> 
> diff --git a/testcases/kernel/syscalls/madvise/madvise07.c b/testcases/kernel/syscalls/madvise/madvise07.c
> index 2f8c42e..f5fd4b7 100644
> --- a/testcases/kernel/syscalls/madvise/madvise07.c
> +++ b/testcases/kernel/syscalls/madvise/madvise07.c
> @@ -44,13 +44,13 @@ static int maptypes[] = {
>  
>  static void run_child(int maptype)
>  {
> -       const size_t msize = 4096;
> +       const size_t msize = getpagesize();
>         void *mem = NULL;
>  
>         mem = SAFE_MMAP(NULL,
>                         msize,
>                         PROT_READ | PROT_WRITE,
> -                       MAP_ANONYMOUS | maptype,
> +                       MAP_ANONYMOUS | maptype | MAP_POPULATE,
>                         -1,
>                         0);
>  

My only concern is that this is not documented in the man pages, but
considering we are testing a test interface, I'm not sure it matters.

Thank you,
Richard.


* [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
  2017-02-14 15:18     ` Richard Palethorpe
@ 2017-02-14 15:25       ` Jan Stancek
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Stancek @ 2017-02-14 15:25 UTC (permalink / raw)
  To: ltp



----- Original Message -----
> From: "Richard Palethorpe" <rpalethorpe@suse.com>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: "Cyril Hrubis" <chrubis@suse.cz>, ltp@lists.linux.it
> Sent: Tuesday, 14 February, 2017 4:18:45 PM
> Subject: Re: [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
> 
> Hi Jan,
> 
> On Tue, 14 Feb 2017 09:06:14 -0500 (EST)
> "Jan Stancek" <jstancek@redhat.com> wrote:
> 
> > 
> > Or maybe MADV_HWPOISON is supposed to work only for faulted-in pages?
> > It works fine for me with change below:
> > 
> > diff --git a/testcases/kernel/syscalls/madvise/madvise07.c
> > b/testcases/kernel/syscalls/madvise/madvise07.c
> > index 2f8c42e..f5fd4b7 100644
> > --- a/testcases/kernel/syscalls/madvise/madvise07.c
> > +++ b/testcases/kernel/syscalls/madvise/madvise07.c
> > @@ -44,13 +44,13 @@ static int maptypes[] = {
> >  
> >  static void run_child(int maptype)
> >  {
> > -       const size_t msize = 4096;
> > +       const size_t msize = getpagesize();
> >         void *mem = NULL;
> >  
> >         mem = SAFE_MMAP(NULL,
> >                         msize,
> >                         PROT_READ | PROT_WRITE,
> > -                       MAP_ANONYMOUS | maptype,
> > +                       MAP_ANONYMOUS | maptype | MAP_POPULATE,
> >                         -1,
> >                         0);
> >  
> 
> My only concern is that this is not documented in the man pages, but
> considering we are testing a test interface, I'm not sure it matters.

I'll ask on linux-mm.

> 
> Thank you,
> Richard.
> 


* [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
  2017-02-14 14:06   ` Jan Stancek
  2017-02-14 15:18     ` Richard Palethorpe
@ 2017-02-15  9:38     ` Li Wang
  2017-02-15  9:45       ` Li Wang
  1 sibling, 1 reply; 8+ messages in thread
From: Li Wang @ 2017-02-15  9:38 UTC (permalink / raw)
  To: ltp

On Tue, Feb 14, 2017 at 10:06 PM, Jan Stancek <jstancek@redhat.com> wrote:
>
>
> ----- Original Message -----
>> From: "Cyril Hrubis" <chrubis@suse.cz>
>> To: "Li Wang" <liwang@redhat.com>
>> Cc: richiejp@f-m.fm, ltp@lists.linux.it
>> Sent: Monday, 13 February, 2017 10:08:37 AM
>> Subject: Re: [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
>>
>> Hi!
>> > I'm trying to run ltp on upstream kernel-4.10.0-rc7, and found that
>> > madvise07 always failing with no SIGBUS received when mmap the PRIVATE
>> > memory. I hope to know if there're some relevant stuff about this
>> > issue.
>> > Any discussion or document for that?
>>
>> Looks like a plain old kernel bug to me.
>
> Or maybe MADV_HWPOISON is supposed to work only for faulted-in pages?

That seems reasonable. MAP_PRIVATE creates a private copy-on-write
mapping, so untouched pages all resolve to the shared, read-only empty
zero page. The testcase therefore poisons the zero page itself, and does
so repeatedly if we map more than one page. I ran a test that confirms
this.

E.g. running only the MAP_PRIVATE part of madvise07 with 4 pages on RHEL 7.3:

# dmesg
[   62.322637] Injecting memory failure for page 1c9d at 7f0594254000
[   62.329660] MCE 0x1c9d: reserved kernel page still referenced by 1 users
[   62.337143] MCE 0x1c9d: reserved kernel page recovery: Failed
[   91.505460] Injecting memory failure for page 1c9d at 7f09ab16e000
[   91.512363] MCE 0x1c9d: already hardware poisoned
[   91.517620] Injecting memory failure for page 1c9d at 7f09ab16f000
[   91.524516] MCE 0x1c9d: already hardware poisoned
[   91.529763] Injecting memory failure for page 1c9d at 7f09ab170000
[   91.536659] MCE 0x1c9d: already hardware poisoned
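
Whether a private anonymous page has been faulted in can also be checked
from user space, without poisoning anything. The following is only a sketch:
private_page_resident() is a hypothetical helper, and it assumes that
mincore() reports per-page residency for anonymous mappings, as it does on
Linux. There, an untouched page is normally reported as not resident, while
a page that was written to or mapped with MAP_POPULATE is resident.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous MAP_PRIVATE page with the given
 * extra mmap flags, optionally write to it, and ask mincore() whether the
 * page is resident. Returns 1 if resident, 0 if not, -1 on error.
 */
static int private_page_resident(int extra_flags, bool touch)
{
	long psize = sysconf(_SC_PAGESIZE);
	unsigned char vec = 0;
	char *mem;
	int ret;

	assert(psize > 0);
	mem = mmap(NULL, (size_t)psize, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	if (touch)
		mem[0] = 'a';	/* write fault gives the mapping its own page */

	ret = mincore(mem, (size_t)psize, &vec);
	munmap(mem, (size_t)psize);

	return ret == 0 ? (vec & 1) : -1;
}
```

Comparing private_page_resident(0, false), private_page_resident(0, true)
and private_page_resident(MAP_POPULATE, false) shows the difference between
the original testcase and the two proposed fixes.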



An upstream kernel patch has already fixed a similar problem, so it
makes sense to fix our LTP testcase madvise07.c as well.

commit 29b4eedee67b449534214058e1bcb36307a7f1dc
Author: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Date:   Wed Sep 11 14:22:59 2013 -0700

    mm/hwpoison.c: fix held reference count after unpoisoning empty zero page



> It works fine for me with change below:
>
> diff --git a/testcases/kernel/syscalls/madvise/madvise07.c b/testcases/kernel/syscalls/madvise/madvise07.c
> index 2f8c42e..f5fd4b7 100644
> --- a/testcases/kernel/syscalls/madvise/madvise07.c
> +++ b/testcases/kernel/syscalls/madvise/madvise07.c
> @@ -44,13 +44,13 @@ static int maptypes[] = {
>
>  static void run_child(int maptype)
>  {
> -       const size_t msize = 4096;
> +       const size_t msize = getpagesize();
>         void *mem = NULL;
>
>         mem = SAFE_MMAP(NULL,
>                         msize,
>                         PROT_READ | PROT_WRITE,
> -                       MAP_ANONYMOUS | maptype,
> +                       MAP_ANONYMOUS | maptype | MAP_POPULATE,
>                         -1,
>                         0);
>

Another way to fix the problem is simply to touch the page before
calling madvise():

$ git diff
diff --git a/testcases/kernel/syscalls/madvise/madvise07.c b/testcases/kernel/syscalls/madvise/madvise07.c
index 2f8c42e..0ed5307 100644
--- a/testcases/kernel/syscalls/madvise/madvise07.c
+++ b/testcases/kernel/syscalls/madvise/madvise07.c
@@ -54,6 +54,8 @@ static void run_child(int maptype)
                        -1,
                        0);

+       *((char *)mem) = 'a';
+
        tst_res(TINFO, "madvise(%p, %zu, MADV_HWPOISON)", mem, msize);
        if (madvise(mem, msize, MADV_HWPOISON) == -1) {
                if (errno == EINVAL)
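
Outside the LTP framework, the patched flow can be sketched as a standalone
function. This is an illustration only, not the madvise07 source:
try_hwpoison() and the hwpoison_result values are hypothetical names.
MADV_HWPOISON needs CAP_SYS_ADMIN and a kernel built with
CONFIG_MEMORY_FAILURE, so the sketch reports "skipped" when madvise() fails.
Note that a successful call really takes a page of memory offline.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum hwpoison_result { HWP_SIGBUS, HWP_NO_SIGNAL, HWP_SKIPPED, HWP_ERROR };

/*
 * Hypothetical helper: in a child, map one anonymous page of the given
 * type (MAP_PRIVATE or MAP_SHARED), optionally write to it first, poison
 * it, and access it again. The parent classifies how the child ended.
 */
static enum hwpoison_result try_hwpoison(int maptype, bool touch_first)
{
	int status;
	pid_t pid;

	assert(maptype == MAP_PRIVATE || maptype == MAP_SHARED);

	pid = fork();
	if (pid < 0)
		return HWP_ERROR;

	if (pid == 0) {
		long psize = sysconf(_SC_PAGESIZE);
		volatile char *mem;

		signal(SIGBUS, SIG_DFL);
		if (psize <= 0)
			_exit(3);
		mem = mmap(NULL, (size_t)psize, PROT_READ | PROT_WRITE,
			   MAP_ANONYMOUS | maptype, -1, 0);
		if (mem == MAP_FAILED)
			_exit(3);
		if (touch_first)
			mem[0] = 'a';	/* fault the page in before poisoning */
		if (madvise((void *)mem, (size_t)psize, MADV_HWPOISON) == -1)
			_exit(2);	/* e.g. EPERM or EINVAL: cannot test */
		mem[0] = 'b';		/* expected to raise SIGBUS */
		_exit(1);		/* no signal was delivered */
	}

	if (waitpid(pid, &status, 0) != pid)
		return HWP_ERROR;
	if (WIFSIGNALED(status))
		return WTERMSIG(status) == SIGBUS ? HWP_SIGBUS : HWP_ERROR;
	if (WIFEXITED(status) && WEXITSTATUS(status) == 1)
		return HWP_NO_SIGNAL;
	if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
		return HWP_SKIPPED;
	return HWP_ERROR;
}
```

Calling try_hwpoison(MAP_PRIVATE, true) corresponds to the patched testcase;
passing false for touch_first corresponds to the original one that failed.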



>
>>
>> > # uname -r
>> > 4.10.0-rc7
>> >
>> > # ./madvise07
>> > tst_test.c:794: INFO: Timeout per run is 0h 05m 00s
>> > madvise07.c:57: INFO: madvise(0x7f25bdd7e000, 4096, MADV_HWPOISON)
>> > madvise07.c:72: FAIL: Did not receive SIGBUS after accessing
>> > MAP_PRIVATE memory marked with MADV_HWPOISON
>>
>> If you reach this TFAIL the child wasn't killed with a signal after it
>> accessed memory marked with MADV_HWPOISON.
>>
>> What hardware is this?
>
> I'm seeing it on x86 KVM guest, with 2.6.32 (RHEL6.0), 3.10 (RHEL7), 4.8 and 4.9 kernels.
>
>>
>> > madvise07.c:57: INFO: madvise(0x7f25bdd7e000, 4096, MADV_HWPOISON)
>> > madvise07.c:90: PASS: madvise(..., MADV_HWPOISON) on MAP_SHARED memory
>>
>> --
>> Cyril Hrubis
>> chrubis@suse.cz
>>
>> --
>> Mailing list info: https://lists.linux.it/listinfo/ltp
>>



-- 
Regards,
Li Wang
Email: liwang@redhat.com


* [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
  2017-02-15  9:38     ` Li Wang
@ 2017-02-15  9:45       ` Li Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Li Wang @ 2017-02-15  9:45 UTC (permalink / raw)
  To: ltp

On Wed, Feb 15, 2017 at 5:38 PM, Li Wang <liwang@redhat.com> wrote:
> On Tue, Feb 14, 2017 at 10:06 PM, Jan Stancek <jstancek@redhat.com> wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Cyril Hrubis" <chrubis@suse.cz>
>>> To: "Li Wang" <liwang@redhat.com>
>>> Cc: richiejp@f-m.fm, ltp@lists.linux.it
>>> Sent: Monday, 13 February, 2017 10:08:37 AM
>>> Subject: Re: [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
>>>
>>> Hi!
>>> > I'm trying to run ltp on upstream kernel-4.10.0-rc7, and found that
>>> > madvise07 always failing with no SIGBUS received when mmap the PRIVATE
>>> > memory. I hope to know if there're some relevant stuff about this
>>> > issue.
>>> > Any discussion or document for that?
>>>
>>> Looks like a plain old kernel bug to me.
>>
>> Or maybe MADV_HWPOISON is supposed to work only for faulted-in pages?
>
> Looks like this thought is reasonable. Since the flag MAP_PRIVATE
> creates a private copy-on-write page mapping, it means the testcase
> will poison the read-only empty zero page many times if we reserve
> more than one page. I did a test and verify that imagination.
>
> e.g  Only running madvise07 PRIVATE part with 4pages on rhel7.3
>
> # dmesg
> [   62.322637] Injecting memory failure for page 1c9d at 7f0594254000
> [   62.329660] MCE 0x1c9d: reserved kernel page still referenced by 1 users
> [   62.337143] MCE 0x1c9d: reserved kernel page recovery: Failed
> [   91.505460] Injecting memory failure for page 1c9d at 7f09ab16e000
> [   91.512363] MCE 0x1c9d: already hardware poisoned
> [   91.517620] Injecting memory failure for page 1c9d at 7f09ab16f000
> [   91.524516] MCE 0x1c9d: already hardware poisoned
> [   91.529763] Injecting memory failure for page 1c9d at 7f09ab170000
> [   91.536659] MCE 0x1c9d: already hardware poisoned
>
>
>
> And a patch in upstream kernel to fix a similar problem like that, it
> make sense to fix our LTP case madvise07.c.
>
> commit 29b4eedee67b449534214058e1bcb36307a7f1dc
> Author: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Date:   Wed Sep 11 14:22:59 2013 -0700
>
>     mm/hwpoison.c: fix held reference count after unpoisoning empty zero page
>
>
>
>> It works fine for me with change below:
>>
>> diff --git a/testcases/kernel/syscalls/madvise/madvise07.c b/testcases/kernel/syscalls/madvise/madvise07.c
>> index 2f8c42e..f5fd4b7 100644
>> --- a/testcases/kernel/syscalls/madvise/madvise07.c
>> +++ b/testcases/kernel/syscalls/madvise/madvise07.c
>> @@ -44,13 +44,13 @@ static int maptypes[] = {
>>
>>  static void run_child(int maptype)
>>  {
>> -       const size_t msize = 4096;
>> +       const size_t msize = getpagesize();
>>         void *mem = NULL;
>>
>>         mem = SAFE_MMAP(NULL,
>>                         msize,
>>                         PROT_READ | PROT_WRITE,
>> -                       MAP_ANONYMOUS | maptype,
>> +                       MAP_ANONYMOUS | maptype | MAP_POPULATE,
>>                         -1,
>>                         0);
>>
>
> An other way I propose to fix the problem is just to using the page
> before madvise():
>
> $ git diff
> diff --git a/testcases/kernel/syscalls/madvise/madvise07.c
> b/testcases/kernel/syscalls/madvise/madvise07.c
> index 2f8c42e..0ed5307 100644
> --- a/testcases/kernel/syscalls/madvise/madvise07.c
> +++ b/testcases/kernel/syscalls/madvise/madvise07.c
> @@ -54,6 +54,8 @@ static void run_child(int maptype)
>                         -1,
>                         0);
>
> +       *((char *)mem) = 'a';
> +
>         tst_res(TINFO, "madvise(%p, %zu, MADV_HWPOISON)", mem, msize);
>         if (madvise(mem, msize, MADV_HWPOISON) == -1) {
>                 if (errno == EINVAL)
>

The result of the patched madvise07 is below:


# ./madvise07
tst_test.c:792: INFO: Timeout per run is 0h 05m 00s
madvise07.c:54: INFO: madvise(0x7f864a116000, 4096, MADV_HWPOISON)
madvise07.c:88: PASS: madvise(..., MADV_HWPOISON) on MAP_PRIVATE memory
madvise07.c:54: INFO: madvise(0x7f864a116000, 4096, MADV_HWPOISON)
madvise07.c:88: PASS: madvise(..., MADV_HWPOISON) on MAP_SHARED memory

Summary:
passed   2
failed   0
skipped  0
warnings 0

# dmesg
[  636.254254] Injecting memory failure for page 223cfd at 7f864a116000
[  636.261400] MCE 0x223cfd: dirty LRU page recovery: Recovered
[  636.267722] MCE: Killing madvise07:2498 due to hardware memory corruption fault at 7f864a116000
[  636.277674] Injecting memory failure for page 223d18 at 7f864a116000
[  636.284811] MCE 0x223d18: dirty LRU page recovery: Recovered
[  636.291133] MCE: Killing madvise07:2499 due to hardware memory corruption fault at 7f864a116000


Regards,
Li Wang

