Is MADV_HWPOISON supposed to work only on faulted-in pages?


* Is MADV_HWPOISON supposed to work only on faulted-in pages?
@ 2017-02-14 15:41 Jan Stancek
  2017-02-20  5:00 ` Naoya Horiguchi
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Stancek @ 2017-02-14 15:41 UTC (permalink / raw)
  To: linux-mm; +Cc: ltp

Hi,

The code below (and LTP madvise07 [1]) doesn't produce SIGBUS
unless I touch/prefault the page before calling madvise().

Is this expected behavior?

Thanks,
Jan

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise07.c

-------------------- 8< --------------------
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	void *mem = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
			MAP_ANONYMOUS | MAP_PRIVATE /*| MAP_POPULATE*/,
			-1, 0);

	if (mem == MAP_FAILED)
		exit(1);

	if (madvise(mem, getpagesize(), MADV_HWPOISON) == -1)
		exit(1);

	*((char *)mem) = 'd';

	return 0;
}
-------------------- 8< --------------------
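
For comparison, the prefaulted variant described above can be sketched as
follows. This is only a sketch: the helper names map_one_page() and
poison_prefaulted() are illustrative and not part of madvise07, and
MADV_HWPOISON needs CAP_SYS_ADMIN and a kernel built with
CONFIG_MEMORY_FAILURE, so the call may simply fail with EPERM or EINVAL.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one private anonymous page; returns MAP_FAILED on error. */
void *map_one_page(void)
{
	return mmap(NULL, (size_t)getpagesize(), PROT_READ | PROT_WRITE,
		    MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
}

/*
 * Hypothetical helper: write to the page first so that a private page
 * is faulted in, then inject the error.
 * Returns 0 on success or the errno value set by madvise().
 * Per the report above, a later access is then expected to raise SIGBUS.
 */
int poison_prefaulted(void *mem)
{
	*(volatile char *)mem = 'p';	/* write fault: page is now faulted in */

	if (madvise(mem, (size_t)getpagesize(), MADV_HWPOISON) == -1)
		return errno;
	return 0;
}
```

A caller would map a page with map_one_page(), pass it to
poison_prefaulted(), and only then touch the memory again. Run it only on
a disposable test machine, since a successful call really takes the page
offline.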

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: Is MADV_HWPOISON supposed to work only on faulted-in pages?
  2017-02-14 15:41 Is MADV_HWPOISON supposed to work only on faulted-in pages? Jan Stancek
@ 2017-02-20  5:00 ` Naoya Horiguchi
  2017-02-23  3:23   ` Naoya Horiguchi
  0 siblings, 1 reply; 13+ messages in thread
From: Naoya Horiguchi @ 2017-02-20  5:00 UTC (permalink / raw)
  To: Jan Stancek; +Cc: linux-mm@kvack.org, ltp@lists.linux.it

On Tue, Feb 14, 2017 at 04:41:29PM +0100, Jan Stancek wrote:
> Hi,
>
> code below (and LTP madvise07 [1]) doesn't produce SIGBUS,
> unless I touch/prefault page before call to madvise().
>
> Is this expected behavior?

Thank you for reporting.

madvise(MADV_HWPOISON) triggers a page fault when called on an address
over which no page is faulted in, so I think that SIGBUS should be
raised in such a case.

But it seems that the memory error handler considers such a page a "reserved
kernel page" and the recovery action fails (see below).

  [  383.371372] Injecting memory failure for page 0x1f10 at 0x7efcdc569000
  [  383.375678] Memory failure: 0x1f10: reserved kernel page still referenced by 1 users
  [  383.377570] Memory failure: 0x1f10: recovery action for reserved kernel page: Failed

I'm not sure how/when this behavior was introduced, so I'll try to understand it.
IMO, the test code below looks valid, so there is no need to change it.

Thanks,
Naoya Horiguchi

>
> Thanks,
> Jan
>
> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise07.c
>
> -------------------- 8< --------------------
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> int main(void)
> {
> 	void *mem = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
> 			MAP_ANONYMOUS | MAP_PRIVATE /*| MAP_POPULATE*/,
> 			-1, 0);
>
> 	if (mem == MAP_FAILED)
> 		exit(1);
>
> 	if (madvise(mem, getpagesize(), MADV_HWPOISON) == -1)
> 		exit(1);
>
> 	*((char *)mem) = 'd';
>
> 	return 0;
> }
> -------------------- 8< --------------------

* Re: Is MADV_HWPOISON supposed to work only on faulted-in pages?
  2017-02-20  5:00 ` Naoya Horiguchi
@ 2017-02-23  3:23   ` Naoya Horiguchi
  2017-02-25  2:28     ` Yisheng Xie
  2017-03-27 23:54     ` Andi Kleen
  0 siblings, 2 replies; 13+ messages in thread
From: Naoya Horiguchi @ 2017-02-23  3:23 UTC (permalink / raw)
  To: Jan Stancek; +Cc: linux-mm@kvack.org, ltp@lists.linux.it

On Mon, Feb 20, 2017 at 05:00:17AM +0000, Horiguchi Naoya(堀口 直也) wrote:
> On Tue, Feb 14, 2017 at 04:41:29PM +0100, Jan Stancek wrote:
> > Hi,
> >
> > code below (and LTP madvise07 [1]) doesn't produce SIGBUS,
> > unless I touch/prefault page before call to madvise().
> >
> > Is this expected behavior?
> 
> Thank you for reporting.
> 
> madvise(MADV_HWPOISON) triggers page fault when called on the address
> over which no page is faulted-in, so I think that SIGBUS should be
> called in such case.
> 
> But it seems that memory error handler considers such a page as "reserved
> kernel page" and recovery action fails (see below.)
> 
>   [  383.371372] Injecting memory failure for page 0x1f10 at 0x7efcdc569000
>   [  383.375678] Memory failure: 0x1f10: reserved kernel page still referenced by 1 users
>   [  383.377570] Memory failure: 0x1f10: recovery action for reserved kernel page: Failed
> 
> I'm not sure how/when this behavior was introduced, so I try to understand.

I found that this is the zero page, which is currently not recoverable
from a memory error.

> IMO, the test code below looks valid to me, so no need to change.

I think that what the testcase effectively does is to test whether memory
error handling on the zero page works or not.
And the testcase's failure seems acceptable, because that is simply not implemented yet.
Maybe recovering from an error on the zero page is possible (because a memory
error there loses no data), but I'm not sure whether the code would be simple
enough and/or whether it's worth doing ...
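
The zero-page case can be observed from userspace with a small sketch like
the one below. It assumes a kernel that serves read faults on untouched
private anonymous memory from the shared zero page; the helper names
page_is_present() and read_fault() are illustrative only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Report whether the page at addr is currently present, as seen by
 * mincore(): 1 = present, 0 = not present, -1 = mincore() failed.
 * addr must be page-aligned.
 */
int page_is_present(void *addr)
{
	unsigned char vec = 0;

	if (mincore(addr, (size_t)getpagesize(), &vec) == -1)
		return -1;
	return vec & 1;
}

/*
 * Read-fault an untouched anonymous page. The read returns 0; under the
 * assumption above, no private page is allocated until the first write,
 * so the address is backed by the shared zero page in the meantime.
 */
char read_fault(void *addr)
{
	return *(volatile char *)addr;
}
```

Calling page_is_present() before and after read_fault() on a fresh
MAP_ANONYMOUS | MAP_PRIVATE page shows whether the read alone established a
mapping. A page reached this way is the one the memory error handler
reports as a "reserved kernel page".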

Thanks,
Naoya Horiguchi

* Re: Is MADV_HWPOISON supposed to work only on faulted-in pages?
  2017-02-23  3:23   ` Naoya Horiguchi
@ 2017-02-25  2:28     ` Yisheng Xie
  2017-02-27  1:20       ` Naoya Horiguchi
  2017-03-27 23:54     ` Andi Kleen
  1 sibling, 1 reply; 13+ messages in thread
From: Yisheng Xie @ 2017-02-25  2:28 UTC (permalink / raw)
  To: Naoya Horiguchi, Jan Stancek; +Cc: linux-mm@kvack.org, ltp@lists.linux.it

hi Naoya,

On 2017/2/23 11:23, Naoya Horiguchi wrote:
> On Mon, Feb 20, 2017 at 05:00:17AM +0000, Horiguchi Naoya(堀口 直也) wrote:
>> On Tue, Feb 14, 2017 at 04:41:29PM +0100, Jan Stancek wrote:
>>> Hi,
>>>
>>> code below (and LTP madvise07 [1]) doesn't produce SIGBUS,
>>> unless I touch/prefault page before call to madvise().
>>>
>>> Is this expected behavior?
>>
>> Thank you for reporting.
>>
>> madvise(MADV_HWPOISON) triggers page fault when called on the address
>> over which no page is faulted-in, so I think that SIGBUS should be
>> called in such case.
>>
>> But it seems that memory error handler considers such a page as "reserved
>> kernel page" and recovery action fails (see below.)
>>
>>   [  383.371372] Injecting memory failure for page 0x1f10 at 0x7efcdc569000
>>   [  383.375678] Memory failure: 0x1f10: reserved kernel page still referenced by 1 users
>>   [  383.377570] Memory failure: 0x1f10: recovery action for reserved kernel page: Failed
>>
>> I'm not sure how/when this behavior was introduced, so I try to understand.
> 
> I found that this is a zero page, which is not recoverable for memory
> error now.
> 
>> IMO, the test code below looks valid to me, so no need to change.
> 
> I think that what the testcase effectively does is to test whether memory
> handling on zero pages works or not.
> And the testcase's failure seems acceptable, because it's simply not-implemented yet.
> Maybe recovering from error on zero page is possible (because there's no data
> loss for memory error,) but I'm not sure that code might be simple enough and/or
> it's worth doing ...
I have a question about this: if a memory error happens on the zero page, it
will cause all data read from the zero page to be wrong, I mean non-zero, right?
And can we just re-initialize it with zero data, maybe by memset?

Thanks
Yisheng Xie.



* Re: Is MADV_HWPOISON supposed to work only on faulted-in pages?
  2017-02-25  2:28     ` Yisheng Xie
@ 2017-02-27  1:20       ` Naoya Horiguchi
  2017-02-27  4:27         ` Zi Yan
  0 siblings, 1 reply; 13+ messages in thread
From: Naoya Horiguchi @ 2017-02-27  1:20 UTC (permalink / raw)
  To: Yisheng Xie; +Cc: Jan Stancek, linux-mm@kvack.org, ltp@lists.linux.it

On Sat, Feb 25, 2017 at 10:28:15AM +0800, Yisheng Xie wrote:
> hi Naoya,
> 
> On 2017/2/23 11:23, Naoya Horiguchi wrote:
> > On Mon, Feb 20, 2017 at 05:00:17AM +0000, Horiguchi Naoya(堀口 直也) wrote:
> >> On Tue, Feb 14, 2017 at 04:41:29PM +0100, Jan Stancek wrote:
> >>> Hi,
> >>>
> >>> code below (and LTP madvise07 [1]) doesn't produce SIGBUS,
> >>> unless I touch/prefault page before call to madvise().
> >>>
> >>> Is this expected behavior?
> >>
> >> Thank you for reporting.
> >>
> >> madvise(MADV_HWPOISON) triggers page fault when called on the address
> >> over which no page is faulted-in, so I think that SIGBUS should be
> >> called in such case.
> >>
> >> But it seems that memory error handler considers such a page as "reserved
> >> kernel page" and recovery action fails (see below.)
> >>
> >>   [  383.371372] Injecting memory failure for page 0x1f10 at 0x7efcdc569000
> >>   [  383.375678] Memory failure: 0x1f10: reserved kernel page still referenced by 1 users
> >>   [  383.377570] Memory failure: 0x1f10: recovery action for reserved kernel page: Failed
> >>
> >> I'm not sure how/when this behavior was introduced, so I try to understand.
> > 
> > I found that this is a zero page, which is not recoverable for memory
> > error now.
> > 
> >> IMO, the test code below looks valid to me, so no need to change.
> > 
> > I think that what the testcase effectively does is to test whether memory
> > handling on zero pages works or not.
> > And the testcase's failure seems acceptable, because it's simply not-implemented yet.
> > Maybe recovering from error on zero page is possible (because there's no data
> > loss for memory error,) but I'm not sure that code might be simple enough and/or
> > it's worth doing ...
> I question about it,  if a memory error happened on zero page, it will
> cause all of data read from zero page is error, I mean no-zero, right?

Hi Yisheng,

Yes, the impact is serious (it could affect many processes), but its probability
is very low because only one page in the system is used as the zero page.
There are many other pages which are not recoverable from memory errors, like
slab pages, so I'm not sure how to prioritize it (maybe it's neither a
top-priority thing nor low-hanging fruit).

> And can we just use re-initial it with zero data maybe by memset ?

Maybe it's not enough. Under a real hwpoison, we should isolate the error
page to prevent access to the broken data.
But the zero page is statically defined as a global array, so
it's not trivial to replace it with a new zero page at runtime.

Anyway, it's on my todo list, so hopefully it will be revisited in the future.

Thanks,
Naoya Horiguchi

* Re: Is MADV_HWPOISON supposed to work only on faulted-in pages?
  2017-02-27  1:20       ` Naoya Horiguchi
@ 2017-02-27  4:27         ` Zi Yan
  2017-02-27  6:33           ` Naoya Horiguchi
  0 siblings, 1 reply; 13+ messages in thread
From: Zi Yan @ 2017-02-27  4:27 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Yisheng Xie, Jan Stancek, linux-mm@kvack.org, ltp@lists.linux.it

On 26 Feb 2017, at 19:20, Naoya Horiguchi wrote:

> On Sat, Feb 25, 2017 at 10:28:15AM +0800, Yisheng Xie wrote:
>> hi Naoya,
>>
>> On 2017/2/23 11:23, Naoya Horiguchi wrote:
>>> On Mon, Feb 20, 2017 at 05:00:17AM +0000, Horiguchi Naoya(堀口 直也) wrote:
>>>> On Tue, Feb 14, 2017 at 04:41:29PM +0100, Jan Stancek wrote:
>>>>> Hi,
>>>>>
>>>>> code below (and LTP madvise07 [1]) doesn't produce SIGBUS,
>>>>> unless I touch/prefault page before call to madvise().
>>>>>
>>>>> Is this expected behavior?
>>>>
>>>> Thank you for reporting.
>>>>
>>>> madvise(MADV_HWPOISON) triggers page fault when called on the address
>>>> over which no page is faulted-in, so I think that SIGBUS should be
>>>> called in such case.
>>>>
>>>> But it seems that memory error handler considers such a page as "reserved
>>>> kernel page" and recovery action fails (see below.)
>>>>
>>>>   [  383.371372] Injecting memory failure for page 0x1f10 at 0x7efcdc569000
>>>>   [  383.375678] Memory failure: 0x1f10: reserved kernel page still referenced by 1 users
>>>>   [  383.377570] Memory failure: 0x1f10: recovery action for reserved kernel page: Failed
>>>>
>>>> I'm not sure how/when this behavior was introduced, so I try to understand.
>>>
>>> I found that this is a zero page, which is not recoverable for memory
>>> error now.
>>>
>>>> IMO, the test code below looks valid to me, so no need to change.
>>>
>>> I think that what the testcase effectively does is to test whether memory
>>> handling on zero pages works or not.
>>> And the testcase's failure seems acceptable, because it's simply not-implemented yet.
>>> Maybe recovering from error on zero page is possible (because there's no data
>>> loss for memory error,) but I'm not sure that code might be simple enough and/or
>>> it's worth doing ...
>> I question about it,  if a memory error happened on zero page, it will
>> cause all of data read from zero page is error, I mean no-zero, right?
>
> Hi Yisheng,
>
> Yes, the impact is serious (could affect many processes,) but it's possibility
> is very low because there's only one page in a system that is used for zero page.
> There are many other pages which are not recoverable for memory error like
> slab pages, so I'm not sure how I prioritize it (maybe it's not a
> top-priority thing, nor low-hanging fruit.)
>
>> And can we just use re-initial it with zero data maybe by memset ?
>
> Maybe it's not enoguh. Under a real hwpoison, we should isolate the error
> page to prevent the access on the broken data.
> But zero page is statically defined as an array of global variable, so
> it's not trival to replace it with a new zero page at runtime.
>
> Anyway, it's in my todo list, so hopefully revisited in the future.
>

Hi Naoya,

The test case tries to HWPOISON a range of virtual addresses that do not
map to any physical pages.

I expected that either madvise should fail because HWPOISON does not work on
non-existing physical pages, or madvise_hwpoison() should populate
some physical pages for that virtual address range and poison them.

When I tested it on kernel v4.10, the test application exited at
madvise, because madvise returned -1 and the error message was
"Device or resource busy". I think this is proper behavior.

There might be some confusion in madvise's man page on MADV_HWPOISON.
If you add some text saying that madvise fails if any page in
the given address range is not mapped, that can eliminate the confusion.
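
"Device or resource busy" is the message for EBUSY. A test that wants to
tell this case apart from the other ways the call can fail could use a small
lookup like the sketch below. The function name hwpoison_errno_hint() is
made up for illustration. The EBUSY text reflects what this thread
observed, and the EPERM and EINVAL texts follow the madvise(2) man page.

```c
#include <errno.h>

/* Map an errno from madvise(MADV_HWPOISON) to a short explanation. */
const char *hwpoison_errno_hint(int err)
{
	switch (err) {
	case 0:
		return "error injected";
	case EBUSY:
		/* seen in this thread when the target was the zero page */
		return "memory failure recovery failed";
	case EPERM:
		return "caller lacks CAP_SYS_ADMIN";
	case EINVAL:
		return "advice not supported by this kernel, or bad range";
	default:
		return "unexpected error";
	}
}
```

After a failed call, hwpoison_errno_hint(errno) gives the test a reason it
can log, so an EBUSY result is not confused with a missing capability or a
missing CONFIG_MEMORY_FAILURE.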


--
Best Regards
Yan Zi


* Re: Is MADV_HWPOISON supposed to work only on faulted-in pages?
  2017-02-27  4:27         ` Zi Yan
@ 2017-02-27  6:33           ` Naoya Horiguchi
  2017-02-27 16:10             ` Zi Yan
                               ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Naoya Horiguchi @ 2017-02-27  6:33 UTC (permalink / raw)
  To: Zi Yan; +Cc: Yisheng Xie, Jan Stancek, linux-mm@kvack.org, ltp@lists.linux.it

On Sun, Feb 26, 2017 at 10:27:02PM -0600, Zi Yan wrote:
> On 26 Feb 2017, at 19:20, Naoya Horiguchi wrote:
> 
> > On Sat, Feb 25, 2017 at 10:28:15AM +0800, Yisheng Xie wrote:
> >> hi Naoya,
> >>
> >> On 2017/2/23 11:23, Naoya Horiguchi wrote:
> >>> On Mon, Feb 20, 2017 at 05:00:17AM +0000, Horiguchi Naoya(堀口 直也) wrote:
> >>>> On Tue, Feb 14, 2017 at 04:41:29PM +0100, Jan Stancek wrote:
> >>>>> Hi,
> >>>>>
> >>>>> code below (and LTP madvise07 [1]) doesn't produce SIGBUS,
> >>>>> unless I touch/prefault page before call to madvise().
> >>>>>
> >>>>> Is this expected behavior?
> >>>>
> >>>> Thank you for reporting.
> >>>>
> >>>> madvise(MADV_HWPOISON) triggers page fault when called on the address
> >>>> over which no page is faulted-in, so I think that SIGBUS should be
> >>>> called in such case.
> >>>>
> >>>> But it seems that memory error handler considers such a page as "reserved
> >>>> kernel page" and recovery action fails (see below.)
> >>>>
> >>>>   [  383.371372] Injecting memory failure for page 0x1f10 at 0x7efcdc569000
> >>>>   [  383.375678] Memory failure: 0x1f10: reserved kernel page still referenced by 1 users
> >>>>   [  383.377570] Memory failure: 0x1f10: recovery action for reserved kernel page: Failed
> >>>>
> >>>> I'm not sure how/when this behavior was introduced, so I try to understand.
> >>>
> >>> I found that this is a zero page, which is not recoverable for memory
> >>> error now.
> >>>
> >>>> IMO, the test code below looks valid to me, so no need to change.
> >>>
> >>> I think that what the testcase effectively does is to test whether memory
> >>> handling on zero pages works or not.
> >>> And the testcase's failure seems acceptable, because it's simply not-implemented yet.
> >>> Maybe recovering from error on zero page is possible (because there's no data
> >>> loss for memory error,) but I'm not sure that code might be simple enough and/or
> >>> it's worth doing ...
> >> I question about it,  if a memory error happened on zero page, it will
> >> cause all of data read from zero page is error, I mean no-zero, right?
> >
> > Hi Yisheng,
> >
> > Yes, the impact is serious (could affect many processes,) but it's possibility
> > is very low because there's only one page in a system that is used for zero page.
> > There are many other pages which are not recoverable for memory error like
> > slab pages, so I'm not sure how I prioritize it (maybe it's not a
> > top-priority thing, nor low-hanging fruit.)
> >
> >> And can we just use re-initial it with zero data maybe by memset ?
> >
> > Maybe it's not enoguh. Under a real hwpoison, we should isolate the error
> > page to prevent the access on the broken data.
> > But zero page is statically defined as an array of global variable, so
> > it's not trival to replace it with a new zero page at runtime.
> >
> > Anyway, it's in my todo list, so hopefully revisited in the future.
> >
> 
> Hi Naoya,
> 
> The test case tries to HWPOISON a range of virtual addresses that do not
> map to any physical pages.
> 

Hi Yan,

> I expected either madvise should fail because HWPOISON does not work on
> non-existing physical pages or madvise_hwpoison() should populate
> some physical pages for that virtual address range and poison them.

The latter is the current behavior. It comes from get_user_pages_fast(),
which not only finds the page and takes a refcount, but also touches the page.

madvise(MADV_HWPOISON) is a test feature, and calling it on an address backed
by no page doesn't simulate anything real. IOW, the behavior is undefined.
So I don't have a strong opinion about how it should behave.

> 
> As I tested it on kernel v4.10, the test application exited at
> madvise, because madvise returns -1 and error message is
> "Device or resource busy". I think this is a proper behavior.

Yes, maybe we are seeing the same thing; you can see the "recovery action
for reserved kernel page: Failed" message in dmesg.

> 
> There might be some confusion in madvise's man page on MADV_HWPOISON.
> If you add some text saying madvise fails if any page is not mapped in
> the given address range, that can eliminate the confusion.

Writing it down in the man page makes readers think this behavior is part of
the specification, which might not be good now because the failure in error
handling of the zero page is not the final behavior.
I mean that if hwpoison on the zero page is handled properly in the future,
madvise will succeed without any confusion.
So I feel that we don't have to update the man page for this issue.

Thanks,
Naoya Horiguchi

* Re: Is MADV_HWPOISON supposed to work only on faulted-in pages?
  2017-02-27  6:33           ` Naoya Horiguchi
@ 2017-02-27 16:10             ` Zi Yan
  2017-03-14 13:20             ` [LTP] " Cyril Hrubis
  2017-03-27 12:08             ` Richard Palethorpe
  2 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2017-02-27 16:10 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Yisheng Xie, Jan Stancek, linux-mm@kvack.org, ltp@lists.linux.it

On 27 Feb 2017, at 0:33, Naoya Horiguchi wrote:

> On Sun, Feb 26, 2017 at 10:27:02PM -0600, Zi Yan wrote:
>> On 26 Feb 2017, at 19:20, Naoya Horiguchi wrote:
>>
>>> On Sat, Feb 25, 2017 at 10:28:15AM +0800, Yisheng Xie wrote:
>>>> hi Naoya,
>>>>
>>>> On 2017/2/23 11:23, Naoya Horiguchi wrote:
>>>>> On Mon, Feb 20, 2017 at 05:00:17AM +0000, Horiguchi Naoya(堀口 直也) wrote:
>>>>>> On Tue, Feb 14, 2017 at 04:41:29PM +0100, Jan Stancek wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> code below (and LTP madvise07 [1]) doesn't produce SIGBUS,
>>>>>>> unless I touch/prefault page before call to madvise().
>>>>>>>
>>>>>>> Is this expected behavior?
>>>>>>
>>>>>> Thank you for reporting.
>>>>>>
>>>>>> madvise(MADV_HWPOISON) triggers page fault when called on the address
>>>>>> over which no page is faulted-in, so I think that SIGBUS should be
>>>>>> called in such case.
>>>>>>
>>>>>> But it seems that memory error handler considers such a page as "reserved
>>>>>> kernel page" and recovery action fails (see below.)
>>>>>>
>>>>>>   [  383.371372] Injecting memory failure for page 0x1f10 at 0x7efcdc569000
>>>>>>   [  383.375678] Memory failure: 0x1f10: reserved kernel page still referenced by 1 users
>>>>>>   [  383.377570] Memory failure: 0x1f10: recovery action for reserved kernel page: Failed
>>>>>>
>>>>>> I'm not sure how/when this behavior was introduced, so I try to understand.
>>>>>
>>>>> I found that this is a zero page, which is not recoverable for memory
>>>>> error now.
>>>>>
>>>>>> IMO, the test code below looks valid to me, so no need to change.
>>>>>
>>>>> I think that what the testcase effectively does is to test whether memory
>>>>> handling on zero pages works or not.
>>>>> And the testcase's failure seems acceptable, because it's simply not-implemented yet.
>>>>> Maybe recovering from error on zero page is possible (because there's no data
>>>>> loss for memory error,) but I'm not sure that code might be simple enough and/or
>>>>> it's worth doing ...
>>>> I question about it,  if a memory error happened on zero page, it will
>>>> cause all of data read from zero page is error, I mean no-zero, right?
>>>
>>> Hi Yisheng,
>>>
>>> Yes, the impact is serious (could affect many processes,) but it's possibility
>>> is very low because there's only one page in a system that is used for zero page.
>>> There are many other pages which are not recoverable for memory error like
>>> slab pages, so I'm not sure how I prioritize it (maybe it's not a
>>> top-priority thing, nor low-hanging fruit.)
>>>
>>>> And can we just use re-initial it with zero data maybe by memset ?
>>>
>>> Maybe it's not enoguh. Under a real hwpoison, we should isolate the error
>>> page to prevent the access on the broken data.
>>> But zero page is statically defined as an array of global variable, so
>>> it's not trival to replace it with a new zero page at runtime.
>>>
>>> Anyway, it's in my todo list, so hopefully revisited in the future.
>>>
>>
>> Hi Naoya,
>>
>> The test case tries to HWPOISON a range of virtual addresses that do not
>> map to any physical pages.
>>
>
> Hi Yan,
>
>> I expected either madvise should fail because HWPOISON does not work on
>> non-existing physical pages or madvise_hwpoison() should populate
>> some physical pages for that virtual address range and poison them.
>
> The latter is the current behavior. It just comes from get_user_pages_fast()
> which not only finds the page and takes refcount, but also touch the page.
>
> madvise(MADV_HWPOISON) is a test feature, and calling it for address backed
> by no page doesn't simulate anything real. IOW, the behavior is undefined.
> So I don't have a strong opinion about how it should behave.
>
>>
>> As I tested it on kernel v4.10, the test application exited at
>> madvise, because madvise returns -1 and error message is
>> "Device or resource busy". I think this is a proper behavior.
>
> yes, maybe we see the same thing, you can see in dmesg "recovery action
> for reserved kernel page: Failed" message.
>
>>
>> There might be some confusion in madvise's man page on MADV_HWPOISON.
>> If you add some text saying madvise fails if any page is not mapped in
>> the given address range, that can eliminate the confusion.
>
> Writing it down to man page makes readers think this behavior is a part of
> specification, that might not be good now because the failure in error
> handling of zero page is not the eventually fixed behavior.
> I mean that if zero page handles hwpoison properly in the future, madvise
> will succeed without any confusion.
> So I feel that we don't have to update man page for this issue.

You are right, I missed the part that get_user_pages_fast() will actually fault
in the madvised pages with zero_page.

Thanks for clarifying this.

--
Best Regards
Yan Zi


* Re: [LTP] Is MADV_HWPOISON supposed to work only on faulted-in pages?
  2017-02-27  6:33           ` Naoya Horiguchi
  2017-02-27 16:10             ` Zi Yan
@ 2017-03-14 13:20             ` Cyril Hrubis
  2017-03-27 12:08             ` Richard Palethorpe
  2 siblings, 0 replies; 13+ messages in thread
From: Cyril Hrubis @ 2017-03-14 13:20 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Zi Yan, Yisheng Xie, linux-mm@kvack.org, ltp@lists.linux.it,
	linux-man, mtk.manpages

Hi!
> > >>>>> code below (and LTP madvise07 [1]) doesn't produce SIGBUS,
> > >>>>> unless I touch/prefault page before call to madvise().
> > >>>>>
> > >>>>> Is this expected behavior?
> > >>>>
> > >>>> Thank you for reporting.
> > >>>>
> > >>>> madvise(MADV_HWPOISON) triggers page fault when called on the address
> > >>>> over which no page is faulted-in, so I think that SIGBUS should be
> > >>>> called in such case.
> > >>>>
> > >>>> But it seems that memory error handler considers such a page as "reserved
> > >>>> kernel page" and recovery action fails (see below.)
> > >>>>
> > >>>>   [  383.371372] Injecting memory failure for page 0x1f10 at 0x7efcdc569000
> > >>>>   [  383.375678] Memory failure: 0x1f10: reserved kernel page still referenced by 1 users
> > >>>>   [  383.377570] Memory failure: 0x1f10: recovery action for reserved kernel page: Failed
> > >>>>
> > >>>> I'm not sure how/when this behavior was introduced, so I try to understand.
> > >>>
> > >>> I found that this is a zero page, which is not recoverable for memory
> > >>> error now.
> > >>>
> > >>>> IMO, the test code below looks valid to me, so no need to change.
> > >>>
> > >>> I think that what the testcase effectively does is to test whether memory
> > >>> handling on zero pages works or not.
> > >>> And the testcase's failure seems acceptable, because it's simply not-implemented yet.
> > >>> Maybe recovering from error on zero page is possible (because there's no data
> > >>> loss for memory error,) but I'm not sure that code might be simple enough and/or
> > >>> it's worth doing ...
> > >> I question about it,  if a memory error happened on zero page, it will
> > >> cause all of data read from zero page is error, I mean no-zero, right?
> > >
> > > Hi Yisheng,
> > >
> > > Yes, the impact is serious (could affect many processes,) but it's possibility
> > > is very low because there's only one page in a system that is used for zero page.
> > > There are many other pages which are not recoverable for memory error like
> > > slab pages, so I'm not sure how I prioritize it (maybe it's not a
> > > top-priority thing, nor low-hanging fruit.)
> > >
> > >> And can we just use re-initial it with zero data maybe by memset ?
> > >
> > > Maybe it's not enoguh. Under a real hwpoison, we should isolate the error
> > > page to prevent the access on the broken data.
> > > But zero page is statically defined as an array of global variable, so
> > > it's not trival to replace it with a new zero page at runtime.
> > >
> > > Anyway, it's in my todo list, so hopefully revisited in the future.
> > >
> > 
> > Hi Naoya,
> > 
> > The test case tries to HWPOISON a range of virtual addresses that do not
> > map to any physical pages.
> > 
> 
> Hi Yan,
> 
> > I expected either madvise should fail because HWPOISON does not work on
> > non-existing physical pages or madvise_hwpoison() should populate
> > some physical pages for that virtual address range and poison them.
> 
> The latter is the current behavior. It just comes from get_user_pages_fast()
> which not only finds the page and takes refcount, but also touch the page.
> 
> madvise(MADV_HWPOISON) is a test feature, and calling it for address backed
> by no page doesn't simulate anything real. IOW, the behavior is undefined.
> So I don't have a strong opinion about how it should behave.
> 
> > 
> > As I tested it on kernel v4.10, the test application exited at
> > madvise, because madvise returns -1 and error message is
> > "Device or resource busy". I think this is a proper behavior.
> 
> yes, maybe we see the same thing, you can see in dmesg "recovery action
> for reserved kernel page: Failed" message.
> 
> > 
> > There might be some confusion in madvise's man page on MADV_HWPOISON.
> > If you add some text saying madvise fails if any page is not mapped in
> > the given address range, that can eliminate the confusion.
> 
> Writing it down to man page makes readers think this behavior is a part of
> specification, that might not be good now because the failure in error
> handling of zero page is not the eventually fixed behavior.
> I mean that if zero page handles hwpoison properly in the future, madvise
> will succeed without any confusion.
> So I feel that we don't have to update man page for this issue.

I still think that this is worth documenting in the manual page,
since the call failed silently before 4.10, right? I guess that we may as
well add a BUGS section and document at least that.

-- 
Cyril Hrubis
chrubis@suse.cz


* Re: [LTP] Is MADV_HWPOISON supposed to work only on faulted-in pages?
  2017-02-27  6:33           ` Naoya Horiguchi
  2017-02-27 16:10             ` Zi Yan
  2017-03-14 13:20             ` [LTP] " Cyril Hrubis
@ 2017-03-27 12:08             ` Richard Palethorpe
  2 siblings, 0 replies; 13+ messages in thread
From: Richard Palethorpe @ 2017-03-27 12:08 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Zi Yan, Yisheng Xie, linux-mm@kvack.org, ltp@lists.linux.it

Hi Naoya,

On Mon, 27 Feb 2017 06:33:09 +0000
"Naoya Horiguchi" <n-horiguchi@ah.jp.nec.com> wrote:

> 
> > I expected either madvise should fail because HWPOISON does not work on
> > non-existing physical pages or madvise_hwpoison() should populate
> > some physical pages for that virtual address range and poison them.  
> 
> The latter is the current behavior. It just comes from get_user_pages_fast()
> which not only finds the page and takes refcount, but also touch the page.

To clarify, the current behaviour seems to be the following:

1st madvise_hwpoison() -> EBUSY,
2nd madvise_hwpoison() -> SUCCESS, but no SIGBUS when the memory is accessed.

So it touches the zero page and madvise succeeds on the second attempt because
it is now mapped, but still the memory is not poisoned.

This means that when I modify the LTP test to accept EBUSY, it still fails if
a user runs it twice. This is OK, but I will need to document it in the test.

Thank you,
Richard.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Is MADV_HWPOISON supposed to work only on faulted-in pages?
  2017-02-23  3:23   ` Naoya Horiguchi
  2017-02-25  2:28     ` Yisheng Xie
@ 2017-03-27 23:54     ` Andi Kleen
  2017-03-28  8:25       ` [LTP] " Cyril Hrubis
  1 sibling, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2017-03-27 23:54 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Jan Stancek, linux-mm@kvack.org, ltp@lists.linux.it

Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> writes:
>
> I think that what the testcase effectively does is to test whether memory
> handling on zero pages works or not.
> And the testcase's failure seems acceptable, because it's simply not-implemented yet.
> Maybe recovering from error on zero page is possible (because there's no data
> loss for memory error,) but I'm not sure that code might be simple enough and/or
> it's worth doing ...

I doubt it's worth doing, it's just too unlikely that a specific page
is hit. Memory error handling is all about probabilities.

The test is just broken and should be fixed.

mce-test had similar problems at some point, but they were all fixed.

-Andi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [LTP] Is MADV_HWPOISON supposed to work only on faulted-in pages?
  2017-03-27 23:54     ` Andi Kleen
@ 2017-03-28  8:25       ` Cyril Hrubis
  2017-03-28 20:26         ` Andi Kleen
  0 siblings, 1 reply; 13+ messages in thread
From: Cyril Hrubis @ 2017-03-28  8:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Naoya Horiguchi, linux-mm@kvack.org, ltp@lists.linux.it

Hi!
> > I think that what the testcase effectively does is to test whether memory
> > handling on zero pages works or not.
> > And the testcase's failure seems acceptable, because it's simply not-implemented yet.
> > Maybe recovering from error on zero page is possible (because there's no data
> > loss for memory error,) but I'm not sure that code might be simple enough and/or
> > it's worth doing ...
> 
> I doubt it's worth doing, it's just too unlikely that a specific page
> is hit. Memory error handling is all about probabilities.
> 
> The test is just broken and should be fixed.
> 
> mce-test had similar problems at some point, but they were all fixed.

Well, I disagree. The reason why the test fails is that MADV_HWPOISON on
not-faulted private mappings fails silently, which is a bug, albeit a
minor one. If something is not implemented, it should report a failure;
the usual error return would be EINVAL in this case.

It appears that it fails with EBUSY on first try on newer kernels, but
still fails silently when we try for a second time.

Why can't we simply check if the page is faulted or not and return error
in the latter case?

-- 
Cyril Hrubis
chrubis@suse.cz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [LTP] Is MADV_HWPOISON supposed to work only on faulted-in pages?
  2017-03-28  8:25       ` [LTP] " Cyril Hrubis
@ 2017-03-28 20:26         ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2017-03-28 20:26 UTC (permalink / raw)
  To: Cyril Hrubis
  Cc: Andi Kleen, Naoya Horiguchi, linux-mm@kvack.org,
	ltp@lists.linux.it

> Well I disagree, the reason why the test fails is that MADV_HWPOISON on
> not-faulted private mappings fails silently, which is a bug, albeit
> minor one. If something is not implemented, it should report a failure,
> the usual error return would be EINVAL in this case.
> 
> It appears that it fails with EBUSY on first try on newer kernels, but
> still fails silently when we try for a second time.
> 
> Why can't we simply check if the page is faulted or not and return error
> in the latter case?

It's a debug interface. You're supposed to know what you're doing.

-Andi

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-03-28 20:26 UTC | newest]

Thread overview: 13+ messages
2017-02-14 15:41 Is MADV_HWPOISON supposed to work only on faulted-in pages? Jan Stancek
2017-02-20  5:00 ` Naoya Horiguchi
2017-02-23  3:23   ` Naoya Horiguchi
2017-02-25  2:28     ` Yisheng Xie
2017-02-27  1:20       ` Naoya Horiguchi
2017-02-27  4:27         ` Zi Yan
2017-02-27  6:33           ` Naoya Horiguchi
2017-02-27 16:10             ` Zi Yan
2017-03-14 13:20             ` [LTP] " Cyril Hrubis
2017-03-27 12:08             ` Richard Palethorpe
2017-03-27 23:54     ` Andi Kleen
2017-03-28  8:25       ` [LTP] " Cyril Hrubis
2017-03-28 20:26         ` Andi Kleen
