* [PATCH V2] MCE: fix an error of mce_bad_pages statistics @ 2012-12-07 8:48 Xishi Qiu 2012-12-07 14:33 ` Borislav Petkov 2012-12-07 22:11 ` Andrew Morton 0 siblings, 2 replies; 38+ messages in thread From: Xishi Qiu @ 2012-12-07 8:48 UTC (permalink / raw) To: WuJianguo, Xishi Qiu, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, akpm, linux-mm, linux-kernel On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a free page twice, the value of mce_bad_pages will be added twice. So this is an error, since the page was already marked HWPoison, we should skip the page and don't add the value of mce_bad_pages. $ cat /proc/meminfo | grep HardwareCorrupted soft_offline_page() get_any_page() atomic_long_add(1, &mce_bad_pages) Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> i>>?Signed-off-by: Jiang Liu <jiang.liu@huawei.com> --- mm/memory-failure.c | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 8b20278..de760ca 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) return ret; done: - atomic_long_add(1, &mce_bad_pages); - SetPageHWPoison(page); /* keep elevated page count for bad page */ + if (!PageHWPoison(page)) { + atomic_long_add(1, &mce_bad_pages); + SetPageHWPoison(page); + } + return ret; } -- 1.7.6.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-07 8:48 [PATCH V2] MCE: fix an error of mce_bad_pages statistics Xishi Qiu @ 2012-12-07 14:33 ` Borislav Petkov 2012-12-07 22:11 ` Andrew Morton 1 sibling, 0 replies; 38+ messages in thread From: Borislav Petkov @ 2012-12-07 14:33 UTC (permalink / raw) To: Xishi Qiu Cc: WuJianguo, Liujiang, Vyacheslav.Dubeyko, andi, akpm, linux-mm, linux-kernel On Fri, Dec 07, 2012 at 04:48:45PM +0800, Xishi Qiu wrote: > On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a > free page twice, the value of mce_bad_pages will be added twice. So this is an error, > since the page was already marked HWPoison, we should skip the page and don't add the > value of mce_bad_pages. > > $ cat /proc/meminfo | grep HardwareCorrupted > > soft_offline_page() > get_any_page() > atomic_long_add(1, &mce_bad_pages) > > Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> > i>>?Signed-off-by: Jiang Liu <jiang.liu@huawei.com> > --- > mm/memory-failure.c | 7 +++++-- > 1 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 8b20278..de760ca 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) > return ret; > > done: > - atomic_long_add(1, &mce_bad_pages); > - SetPageHWPoison(page); > /* keep elevated page count for bad page */ > + if (!PageHWPoison(page)) { > + atomic_long_add(1, &mce_bad_pages); > + SetPageHWPoison(page); > + } Ok, I don't know the memory-failure code all that well but IMHO why should we wade through the whole soft_offline_page function for a page which has been marked as poisoned already? IOW, why not do what you started with previously and exit this function ASAP: --- diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 8b20278be6a6..a83baeca0644 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1494,6 +1494,12 @@ int soft_offline_page(struct page *page, int flags) if (ret == 0) goto done; + if (PageHWPoison(page)) { + put_page(page); + pr_info("soft offline: %#lx page already poisoned\n", pfn); + return -EBUSY; + } + /* * Page cache page we can handle? */ --- -- Regards/Gruss, Boris. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-07 8:48 [PATCH V2] MCE: fix an error of mce_bad_pages statistics Xishi Qiu 2012-12-07 14:33 ` Borislav Petkov @ 2012-12-07 22:11 ` Andrew Morton 2012-12-07 22:41 ` Borislav Petkov ` (3 more replies) 1 sibling, 4 replies; 38+ messages in thread From: Andrew Morton @ 2012-12-07 22:11 UTC (permalink / raw) To: Xishi Qiu Cc: WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel On Fri, 7 Dec 2012 16:48:45 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote: > On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a > free page twice, the value of mce_bad_pages will be added twice. So this is an error, > since the page was already marked HWPoison, we should skip the page and don't add the > value of mce_bad_pages. > > $ cat /proc/meminfo | grep HardwareCorrupted > > soft_offline_page() > get_any_page() > atomic_long_add(1, &mce_bad_pages) > > ... > > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) > return ret; > > done: > - atomic_long_add(1, &mce_bad_pages); > - SetPageHWPoison(page); > /* keep elevated page count for bad page */ > + if (!PageHWPoison(page)) { > + atomic_long_add(1, &mce_bad_pages); > + SetPageHWPoison(page); > + } > + > return ret; > } A few things: - soft_offline_page() already checks for this case: if (PageHWPoison(page)) { unlock_page(page); put_page(page); pr_info("soft offline: %#lx page already poisoned\n", pfn); return -EBUSY; } so why didn't this check work for you? Presumably because one of the earlier "goto done" branches was taken. Which one, any why? This function is an utter mess. It contains six return points randomly intermingled with three "goto done" return points. This mess is probably the cause of the bug you have observed. Can we please fix it up somehow? It *seems* that the design (lol) of this function is "for errors, return immediately. For success, goto done". In which case "done" should have been called "success". But if you just look at the function you'll see that this approach didn't work. I suggest it be converted to have two return points - one for the success path, one for the failure path. Or something. - soft_offline_huge_page() is a miniature copy of soft_offline_page() and might suffer the same bug. - A cleaner, shorter and possibly faster implementation is if (!TestSetPageHWPoison(page)) atomic_long_add(1, &mce_bad_pages); - We have atomic_long_inc(). Use it? - Why do we have a variable called "mce_bad_pages"? MCE is an x86 concept, and this code is in mm/. Lights are flashing, bells are ringing and a loudspeaker is blaring "layering violation" at us! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-07 22:11 ` Andrew Morton @ 2012-12-07 22:41 ` Borislav Petkov 2012-12-10 4:33 ` Xishi Qiu ` (2 subsequent siblings) 3 siblings, 0 replies; 38+ messages in thread From: Borislav Petkov @ 2012-12-07 22:41 UTC (permalink / raw) To: Andrew Morton Cc: Xishi Qiu, WuJianguo, Liujiang, Vyacheslav.Dubeyko, andi, linux-mm, linux-kernel On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: > A few things: > > - soft_offline_page() already checks for this case: > > if (PageHWPoison(page)) { > unlock_page(page); > put_page(page); > pr_info("soft offline: %#lx page already poisoned\n", pfn); > return -EBUSY; > } Oh, so we do this check after all. But later in the function. Why? Why not at the beginning so that when a page is marked poisoned already we can exit early? Strange. -- Regards/Gruss, Boris. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-07 22:11 ` Andrew Morton 2012-12-07 22:41 ` Borislav Petkov @ 2012-12-10 4:33 ` Xishi Qiu 2012-12-10 8:33 ` Wanpeng Li 2012-12-10 8:33 ` Wanpeng Li 3 siblings, 0 replies; 38+ messages in thread From: Xishi Qiu @ 2012-12-10 4:33 UTC (permalink / raw) To: Andrew Morton Cc: WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel On 2012/12/8 6:11, Andrew Morton wrote: > On Fri, 7 Dec 2012 16:48:45 +0800 > Xishi Qiu <qiuxishi@huawei.com> wrote: > >> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a >> free page twice, the value of mce_bad_pages will be added twice. So this is an error, >> since the page was already marked HWPoison, we should skip the page and don't add the >> value of mce_bad_pages. >> >> $ cat /proc/meminfo | grep HardwareCorrupted >> >> soft_offline_page() >> get_any_page() >> atomic_long_add(1, &mce_bad_pages) >> >> ... >> >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) >> return ret; >> >> done: >> - atomic_long_add(1, &mce_bad_pages); >> - SetPageHWPoison(page); >> /* keep elevated page count for bad page */ >> + if (!PageHWPoison(page)) { >> + atomic_long_add(1, &mce_bad_pages); >> + SetPageHWPoison(page); >> + } >> + >> return ret; >> } > > A few things: > > - soft_offline_page() already checks for this case: > > if (PageHWPoison(page)) { > unlock_page(page); > put_page(page); > pr_info("soft offline: %#lx page already poisoned\n", pfn); > return -EBUSY; > } > > so why didn't this check work for you? > > Presumably because one of the earlier "goto done" branches was > taken. Which one, any why? > > This function is an utter mess. It contains six return points > randomly intermingled with three "goto done" return points. > > This mess is probably the cause of the bug you have observed. Can > we please fix it up somehow? It *seems* that the design (lol) of > this function is "for errors, return immediately. For success, goto > done". In which case "done" should have been called "success". But > if you just look at the function you'll see that this approach didn't > work. I suggest it be converted to have two return points - one for > the success path, one for the failure path. Or something. > > - soft_offline_huge_page() is a miniature copy of soft_offline_page() > and might suffer the same bug. > > - A cleaner, shorter and possibly faster implementation is > > if (!TestSetPageHWPoison(page)) > atomic_long_add(1, &mce_bad_pages); > > - We have atomic_long_inc(). Use it? > > - Why do we have a variable called "mce_bad_pages"? MCE is an x86 > concept, and this code is in mm/. Lights are flashing, bells are > ringing and a loudspeaker is blaring "layering violation" at us! > Hi Andrew, thank you for your advice, I will send V3 soon. Thanks Xishi Qiu > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-07 22:11 ` Andrew Morton 2012-12-07 22:41 ` Borislav Petkov 2012-12-10 4:33 ` Xishi Qiu @ 2012-12-10 8:33 ` Wanpeng Li 2012-12-10 9:06 ` Xishi Qiu 2012-12-10 8:33 ` Wanpeng Li 3 siblings, 1 reply; 38+ messages in thread From: Wanpeng Li @ 2012-12-10 8:33 UTC (permalink / raw) To: Andrew Morton, Xishi Qiu Cc: WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: >On Fri, 7 Dec 2012 16:48:45 +0800 >Xishi Qiu <qiuxishi@huawei.com> wrote: > >> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a >> free page twice, the value of mce_bad_pages will be added twice. So this is an error, >> since the page was already marked HWPoison, we should skip the page and don't add the >> value of mce_bad_pages. >> >> $ cat /proc/meminfo | grep HardwareCorrupted >> >> soft_offline_page() >> get_any_page() >> atomic_long_add(1, &mce_bad_pages) >> >> ... >> >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) >> return ret; >> >> done: >> - atomic_long_add(1, &mce_bad_pages); >> - SetPageHWPoison(page); >> /* keep elevated page count for bad page */ >> + if (!PageHWPoison(page)) { >> + atomic_long_add(1, &mce_bad_pages); >> + SetPageHWPoison(page); >> + } >> + >> return ret; >> } > >A few things: > >- soft_offline_page() already checks for this case: > > if (PageHWPoison(page)) { > unlock_page(page); > put_page(page); > pr_info("soft offline: %#lx page already poisoned\n", pfn); > return -EBUSY; > } > > so why didn't this check work for you? > > Presumably because one of the earlier "goto done" branches was > taken. Which one, any why? > > This function is an utter mess. It contains six return points > randomly intermingled with three "goto done" return points. > > This mess is probably the cause of the bug you have observed. Can > we please fix it up somehow? It *seems* that the design (lol) of > this function is "for errors, return immediately. For success, goto > done". In which case "done" should have been called "success". But > if you just look at the function you'll see that this approach didn't > work. I suggest it be converted to have two return points - one for > the success path, one for the failure path. Or something. > >- soft_offline_huge_page() is a miniature copy of soft_offline_page() > and might suffer the same bug. > >- A cleaner, shorter and possibly faster implementation is > > if (!TestSetPageHWPoison(page)) > atomic_long_add(1, &mce_bad_pages); > Hi Andrew, Since hwpoison bit for free buddy page has already be set in get_any_page, !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for the first time. Regards, Wanpeng Li >- We have atomic_long_inc(). Use it? > >- Why do we have a variable called "mce_bad_pages"? MCE is an x86 > concept, and this code is in mm/. Lights are flashing, bells are > ringing and a loudspeaker is blaring "layering violation" at us! > >-- >To unsubscribe, send a message with 'unsubscribe linux-mm' in >the body to majordomo@kvack.org. For more info on Linux MM, >see: http://www.linux-mm.org/ . >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 8:33 ` Wanpeng Li @ 2012-12-10 9:06 ` Xishi Qiu 2012-12-10 10:47 ` Simon Jeons 0 siblings, 1 reply; 38+ messages in thread From: Xishi Qiu @ 2012-12-10 9:06 UTC (permalink / raw) To: Wanpeng Li Cc: Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency On 2012/12/10 16:33, Wanpeng Li wrote: > On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: >> On Fri, 7 Dec 2012 16:48:45 +0800 >> Xishi Qiu <qiuxishi@huawei.com> wrote: >> >>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a >>> free page twice, the value of mce_bad_pages will be added twice. So this is an error, >>> since the page was already marked HWPoison, we should skip the page and don't add the >>> value of mce_bad_pages. >>> >>> $ cat /proc/meminfo | grep HardwareCorrupted >>> >>> soft_offline_page() >>> get_any_page() >>> atomic_long_add(1, &mce_bad_pages) >>> >>> ... >>> >>> --- a/mm/memory-failure.c >>> +++ b/mm/memory-failure.c >>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) >>> return ret; >>> >>> done: >>> - atomic_long_add(1, &mce_bad_pages); >>> - SetPageHWPoison(page); >>> /* keep elevated page count for bad page */ >>> + if (!PageHWPoison(page)) { >>> + atomic_long_add(1, &mce_bad_pages); >>> + SetPageHWPoison(page); >>> + } >>> + >>> return ret; >>> } >> >> A few things: >> >> - soft_offline_page() already checks for this case: >> >> if (PageHWPoison(page)) { >> unlock_page(page); >> put_page(page); >> pr_info("soft offline: %#lx page already poisoned\n", pfn); >> return -EBUSY; >> } >> >> so why didn't this check work for you? >> >> Presumably because one of the earlier "goto done" branches was >> taken. Which one, any why? >> >> This function is an utter mess. It contains six return points >> randomly intermingled with three "goto done" return points. >> >> This mess is probably the cause of the bug you have observed. Can >> we please fix it up somehow? It *seems* that the design (lol) of >> this function is "for errors, return immediately. For success, goto >> done". In which case "done" should have been called "success". But >> if you just look at the function you'll see that this approach didn't >> work. I suggest it be converted to have two return points - one for >> the success path, one for the failure path. Or something. >> >> - soft_offline_huge_page() is a miniature copy of soft_offline_page() >> and might suffer the same bug. >> >> - A cleaner, shorter and possibly faster implementation is >> >> if (!TestSetPageHWPoison(page)) >> atomic_long_add(1, &mce_bad_pages); >> > > Hi Andrew, > > Since hwpoison bit for free buddy page has already be set in get_any_page, > !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for > the first time. > > Regards, > Wanpeng Li > The poisoned page is isolated in bad_page(), I wonder whether it could be isolated immediately in soft_offline_page() and memory_failure()? buffered_rmqueue() prep_new_page() check_new_page() bad_page() Thanks Xishi Qiu >> - We have atomic_long_inc(). Use it? >> >> - Why do we have a variable called "mce_bad_pages"? MCE is an x86 >> concept, and this code is in mm/. Lights are flashing, bells are >> ringing and a loudspeaker is blaring "layering violation" at us! >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > > > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 9:06 ` Xishi Qiu @ 2012-12-10 10:47 ` Simon Jeons 2012-12-10 11:16 ` Xishi Qiu 0 siblings, 1 reply; 38+ messages in thread From: Simon Jeons @ 2012-12-10 10:47 UTC (permalink / raw) To: Xishi Qiu Cc: Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote: > On 2012/12/10 16:33, Wanpeng Li wrote: > > > On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: > >> On Fri, 7 Dec 2012 16:48:45 +0800 > >> Xishi Qiu <qiuxishi@huawei.com> wrote: > >> > >>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a > >>> free page twice, the value of mce_bad_pages will be added twice. So this is an error, > >>> since the page was already marked HWPoison, we should skip the page and don't add the > >>> value of mce_bad_pages. > >>> > >>> $ cat /proc/meminfo | grep HardwareCorrupted > >>> > >>> soft_offline_page() > >>> get_any_page() > >>> atomic_long_add(1, &mce_bad_pages) > >>> > >>> ... > >>> > >>> --- a/mm/memory-failure.c > >>> +++ b/mm/memory-failure.c > >>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) > >>> return ret; > >>> > >>> done: > >>> - atomic_long_add(1, &mce_bad_pages); > >>> - SetPageHWPoison(page); > >>> /* keep elevated page count for bad page */ > >>> + if (!PageHWPoison(page)) { > >>> + atomic_long_add(1, &mce_bad_pages); > >>> + SetPageHWPoison(page); > >>> + } > >>> + > >>> return ret; > >>> } > >> > >> A few things: > >> > >> - soft_offline_page() already checks for this case: > >> > >> if (PageHWPoison(page)) { > >> unlock_page(page); > >> put_page(page); > >> pr_info("soft offline: %#lx page already poisoned\n", pfn); > >> return -EBUSY; > >> } > >> > >> so why didn't this check work for you? > >> > >> Presumably because one of the earlier "goto done" branches was > >> taken. Which one, any why? > >> > >> This function is an utter mess. It contains six return points > >> randomly intermingled with three "goto done" return points. > >> > >> This mess is probably the cause of the bug you have observed. Can > >> we please fix it up somehow? It *seems* that the design (lol) of > >> this function is "for errors, return immediately. For success, goto > >> done". In which case "done" should have been called "success". But > >> if you just look at the function you'll see that this approach didn't > >> work. I suggest it be converted to have two return points - one for > >> the success path, one for the failure path. Or something. > >> > >> - soft_offline_huge_page() is a miniature copy of soft_offline_page() > >> and might suffer the same bug. > >> > >> - A cleaner, shorter and possibly faster implementation is > >> > >> if (!TestSetPageHWPoison(page)) > >> atomic_long_add(1, &mce_bad_pages); > >> > > > > Hi Andrew, > > > > Since hwpoison bit for free buddy page has already be set in get_any_page, > > !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for > > the first time. > > > > Regards, > > Wanpeng Li > > > > The poisoned page is isolated in bad_page(), I wonder whether it could be isolated > immediately in soft_offline_page() and memory_failure()? > > buffered_rmqueue() > prep_new_page() > check_new_page() > bad_page() Do you mean else if(is_free_buddy_page(p)) branch is redundancy? > > Thanks > Xishi Qiu > > >> - We have atomic_long_inc(). Use it? > >> > >> - Why do we have a variable called "mce_bad_pages"? MCE is an x86 > >> concept, and this code is in mm/. Lights are flashing, bells are > >> ringing and a loudspeaker is blaring "layering violation" at us! > >> > >> -- > >> To unsubscribe, send a message with 'unsubscribe linux-mm' in > >> the body to majordomo@kvack.org. For more info on Linux MM, > >> see: http://www.linux-mm.org/ . > >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > > > > > > . > > > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 10:47 ` Simon Jeons @ 2012-12-10 11:16 ` Xishi Qiu 2012-12-10 11:39 ` Wanpeng Li ` (4 more replies) 0 siblings, 5 replies; 38+ messages in thread From: Xishi Qiu @ 2012-12-10 11:16 UTC (permalink / raw) To: Simon Jeons Cc: Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency On 2012/12/10 18:47, Simon Jeons wrote: > On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote: >> On 2012/12/10 16:33, Wanpeng Li wrote: >> >>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: >>>> On Fri, 7 Dec 2012 16:48:45 +0800 >>>> Xishi Qiu <qiuxishi@huawei.com> wrote: >>>> >>>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a >>>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error, >>>>> since the page was already marked HWPoison, we should skip the page and don't add the >>>>> value of mce_bad_pages. >>>>> >>>>> $ cat /proc/meminfo | grep HardwareCorrupted >>>>> >>>>> soft_offline_page() >>>>> get_any_page() >>>>> atomic_long_add(1, &mce_bad_pages) >>>>> >>>>> ... >>>>> >>>>> --- a/mm/memory-failure.c >>>>> +++ b/mm/memory-failure.c >>>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) >>>>> return ret; >>>>> >>>>> done: >>>>> - atomic_long_add(1, &mce_bad_pages); >>>>> - SetPageHWPoison(page); >>>>> /* keep elevated page count for bad page */ >>>>> + if (!PageHWPoison(page)) { >>>>> + atomic_long_add(1, &mce_bad_pages); >>>>> + SetPageHWPoison(page); >>>>> + } >>>>> + >>>>> return ret; >>>>> } >>>> >>>> A few things: >>>> >>>> - soft_offline_page() already checks for this case: >>>> >>>> if (PageHWPoison(page)) { >>>> unlock_page(page); >>>> put_page(page); >>>> pr_info("soft offline: %#lx page already poisoned\n", pfn); >>>> return -EBUSY; >>>> } >>>> >>>> so why didn't this check work for you? >>>> >>>> Presumably because one of the earlier "goto done" branches was >>>> taken. Which one, any why? >>>> >>>> This function is an utter mess. It contains six return points >>>> randomly intermingled with three "goto done" return points. >>>> >>>> This mess is probably the cause of the bug you have observed. Can >>>> we please fix it up somehow? It *seems* that the design (lol) of >>>> this function is "for errors, return immediately. For success, goto >>>> done". In which case "done" should have been called "success". But >>>> if you just look at the function you'll see that this approach didn't >>>> work. I suggest it be converted to have two return points - one for >>>> the success path, one for the failure path. Or something. >>>> >>>> - soft_offline_huge_page() is a miniature copy of soft_offline_page() >>>> and might suffer the same bug. >>>> >>>> - A cleaner, shorter and possibly faster implementation is >>>> >>>> if (!TestSetPageHWPoison(page)) >>>> atomic_long_add(1, &mce_bad_pages); >>>> >>> >>> Hi Andrew, >>> >>> Since hwpoison bit for free buddy page has already be set in get_any_page, >>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for >>> the first time. >>> >>> Regards, >>> Wanpeng Li >>> >> >> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated >> immediately in soft_offline_page() and memory_failure()? >> >> buffered_rmqueue() >> prep_new_page() >> check_new_page() >> bad_page() > > Do you mean else if(is_free_buddy_page(p)) branch is redundancy? > Hi Simon, get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy. It is another topic, I mean since the page is poisoned, so why not isolate it from page buddy alocator in soft_offline_page() rather than in check_new_page(). I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned page is still managed by page buddy alocator. >> >> Thanks >> Xishi Qiu >> >>>> - We have atomic_long_inc(). Use it? >>>> >>>> - Why do we have a variable called "mce_bad_pages"? MCE is an x86 >>>> concept, and this code is in mm/. Lights are flashing, bells are >>>> ringing and a loudspeaker is blaring "layering violation" at us! >>>> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 11:16 ` Xishi Qiu @ 2012-12-10 11:39 ` Wanpeng Li 2012-12-10 11:54 ` Xishi Qiu 2012-12-10 15:39 ` Andi Kleen 2012-12-10 11:39 ` Wanpeng Li ` (3 subsequent siblings) 4 siblings, 2 replies; 38+ messages in thread From: Wanpeng Li @ 2012-12-10 11:39 UTC (permalink / raw) To: Xishi Qiu Cc: Simon Jeons, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency On Mon, Dec 10, 2012 at 07:16:50PM +0800, Xishi Qiu wrote: >On 2012/12/10 18:47, Simon Jeons wrote: > >> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote: >>> On 2012/12/10 16:33, Wanpeng Li wrote: >>> >>>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: >>>>> On Fri, 7 Dec 2012 16:48:45 +0800 >>>>> Xishi Qiu <qiuxishi@huawei.com> wrote: >>>>> >>>>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a >>>>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error, >>>>>> since the page was already marked HWPoison, we should skip the page and don't add the >>>>>> value of mce_bad_pages. >>>>>> >>>>>> $ cat /proc/meminfo | grep HardwareCorrupted >>>>>> >>>>>> soft_offline_page() >>>>>> get_any_page() >>>>>> atomic_long_add(1, &mce_bad_pages) >>>>>> >>>>>> ... >>>>>> >>>>>> --- a/mm/memory-failure.c >>>>>> +++ b/mm/memory-failure.c >>>>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) >>>>>> return ret; >>>>>> >>>>>> done: >>>>>> - atomic_long_add(1, &mce_bad_pages); >>>>>> - SetPageHWPoison(page); >>>>>> /* keep elevated page count for bad page */ >>>>>> + if (!PageHWPoison(page)) { >>>>>> + atomic_long_add(1, &mce_bad_pages); >>>>>> + SetPageHWPoison(page); >>>>>> + } >>>>>> + >>>>>> return ret; >>>>>> } >>>>> >>>>> A few things: >>>>> >>>>> - soft_offline_page() already checks for this case: >>>>> >>>>> if (PageHWPoison(page)) { >>>>> unlock_page(page); >>>>> put_page(page); >>>>> pr_info("soft offline: %#lx page already poisoned\n", pfn); >>>>> return -EBUSY; >>>>> } >>>>> >>>>> so why didn't this check work for you? >>>>> >>>>> Presumably because one of the earlier "goto done" branches was >>>>> taken. Which one, any why? >>>>> >>>>> This function is an utter mess. It contains six return points >>>>> randomly intermingled with three "goto done" return points. >>>>> >>>>> This mess is probably the cause of the bug you have observed. Can >>>>> we please fix it up somehow? It *seems* that the design (lol) of >>>>> this function is "for errors, return immediately. For success, goto >>>>> done". In which case "done" should have been called "success". But >>>>> if you just look at the function you'll see that this approach didn't >>>>> work. I suggest it be converted to have two return points - one for >>>>> the success path, one for the failure path. Or something. >>>>> >>>>> - soft_offline_huge_page() is a miniature copy of soft_offline_page() >>>>> and might suffer the same bug. >>>>> >>>>> - A cleaner, shorter and possibly faster implementation is >>>>> >>>>> if (!TestSetPageHWPoison(page)) >>>>> atomic_long_add(1, &mce_bad_pages); >>>>> >>>> >>>> Hi Andrew, >>>> >>>> Since hwpoison bit for free buddy page has already be set in get_any_page, >>>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for >>>> the first time. >>>> >>>> Regards, >>>> Wanpeng Li >>>> >>> >>> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated >>> immediately in soft_offline_page() and memory_failure()? >>> >>> buffered_rmqueue() >>> prep_new_page() >>> check_new_page() >>> bad_page() >> >> Do you mean else if(is_free_buddy_page(p)) branch is redundancy? >> > >Hi Simon, > >get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy. > >It is another topic, I mean since the page is poisoned, so why not isolate it >from page buddy alocator in soft_offline_page() rather than in check_new_page(). > >I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned >page is still managed by page buddy alocator. > Hi Xishi, HWPoison delays any action on buddy allocator pages, handling can be safely postponed until a later time when the page might be referenced. By delaying, some transient errors may not reoccur or may be irrelevant. Regards, Wanpeng Li >>> >>> Thanks >>> Xishi Qiu >>> >>>>> - We have atomic_long_inc(). Use it? >>>>> >>>>> - Why do we have a variable called "mce_bad_pages"? MCE is an x86 >>>>> concept, and this code is in mm/. Lights are flashing, bells are >>>>> ringing and a loudspeaker is blaring "layering violation" at us! >>>>> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 11:39 ` Wanpeng Li @ 2012-12-10 11:54 ` Xishi Qiu 2012-12-10 12:11 ` Borislav Petkov 2012-12-10 15:39 ` Andi Kleen 1 sibling, 1 reply; 38+ messages in thread From: Xishi Qiu @ 2012-12-10 11:54 UTC (permalink / raw) To: Wanpeng Li Cc: Simon Jeons, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency On 2012/12/10 19:39, Wanpeng Li wrote: > On Mon, Dec 10, 2012 at 07:16:50PM +0800, Xishi Qiu wrote: >> On 2012/12/10 18:47, Simon Jeons wrote: >> >>> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote: >>>> On 2012/12/10 16:33, Wanpeng Li wrote: >>>> >>>>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: >>>>>> On Fri, 7 Dec 2012 16:48:45 +0800 >>>>>> Xishi Qiu <qiuxishi@huawei.com> wrote: >>>>>> >>>>>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a >>>>>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error, >>>>>>> since the page was already marked HWPoison, we should skip the page and don't add the >>>>>>> value of mce_bad_pages. >>>>>>> >>>>>>> $ cat /proc/meminfo | grep HardwareCorrupted >>>>>>> >>>>>>> soft_offline_page() >>>>>>> get_any_page() >>>>>>> atomic_long_add(1, &mce_bad_pages) >>>>>>> >>>>>>> ... >>>>>>> >>>>>>> --- a/mm/memory-failure.c >>>>>>> +++ b/mm/memory-failure.c >>>>>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) >>>>>>> return ret; >>>>>>> >>>>>>> done: >>>>>>> - atomic_long_add(1, &mce_bad_pages); >>>>>>> - SetPageHWPoison(page); >>>>>>> /* keep elevated page count for bad page */ >>>>>>> + if (!PageHWPoison(page)) { >>>>>>> + atomic_long_add(1, &mce_bad_pages); >>>>>>> + SetPageHWPoison(page); >>>>>>> + } >>>>>>> + >>>>>>> return ret; >>>>>>> } >>>>>> >>>>>> A few things: >>>>>> >>>>>> - soft_offline_page() already checks for this case: >>>>>> >>>>>> if (PageHWPoison(page)) { >>>>>> unlock_page(page); >>>>>> put_page(page); >>>>>> pr_info("soft offline: %#lx page already poisoned\n", pfn); >>>>>> return -EBUSY; >>>>>> } >>>>>> >>>>>> so why didn't this check work for you? >>>>>> >>>>>> Presumably because one of the earlier "goto done" branches was >>>>>> taken. Which one, any why? >>>>>> >>>>>> This function is an utter mess. It contains six return points >>>>>> randomly intermingled with three "goto done" return points. >>>>>> >>>>>> This mess is probably the cause of the bug you have observed. Can >>>>>> we please fix it up somehow? It *seems* that the design (lol) of >>>>>> this function is "for errors, return immediately. For success, goto >>>>>> done". In which case "done" should have been called "success". But >>>>>> if you just look at the function you'll see that this approach didn't >>>>>> work. I suggest it be converted to have two return points - one for >>>>>> the success path, one for the failure path. Or something. >>>>>> >>>>>> - soft_offline_huge_page() is a miniature copy of soft_offline_page() >>>>>> and might suffer the same bug. >>>>>> >>>>>> - A cleaner, shorter and possibly faster implementation is >>>>>> >>>>>> if (!TestSetPageHWPoison(page)) >>>>>> atomic_long_add(1, &mce_bad_pages); >>>>>> >>>>> >>>>> Hi Andrew, >>>>> >>>>> Since hwpoison bit for free buddy page has already be set in get_any_page, >>>>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for >>>>> the first time. >>>>> >>>>> Regards, >>>>> Wanpeng Li >>>>> >>>> >>>> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated >>>> immediately in soft_offline_page() and memory_failure()? >>>> >>>> buffered_rmqueue() >>>> prep_new_page() >>>> check_new_page() >>>> bad_page() >>> >>> Do you mean else if(is_free_buddy_page(p)) branch is redundancy? >>> >> >> Hi Simon, >> >> get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy. >> >> It is another topic, I mean since the page is poisoned, so why not isolate it >>from page buddy alocator in soft_offline_page() rather than in check_new_page(). >> >> I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned >> page is still managed by page buddy alocator. >> > > Hi Xishi, > > HWPoison delays any action on buddy allocator pages, handling can be safely postponed > until a later time when the page might be referenced. By delaying, some transient errors > may not reoccur or may be irrelevant. > > Regards, > Wanpeng Li > Hi Wanpeng, thanks for your explanation. One more question, can we add a list_head to manager the poisoned pages? I find ia64 has the array which named "static struct page *page_isolate[MAX_PAGE_ISOLATE]". Andrew, what do you think? Thanks Xishi Qiu >>>> >>>> Thanks >>>> Xishi Qiu >>>> >>>>>> - We have atomic_long_inc(). Use it? >>>>>> >>>>>> - Why do we have a variable called "mce_bad_pages"? MCE is an x86 >>>>>> concept, and this code is in mm/. Lights are flashing, bells are >>>>>> ringing and a loudspeaker is blaring "layering violation" at us! >>>>>> >> > > > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 11:54 ` Xishi Qiu @ 2012-12-10 12:11 ` Borislav Petkov 0 siblings, 0 replies; 38+ messages in thread From: Borislav Petkov @ 2012-12-10 12:11 UTC (permalink / raw) To: Xishi Qiu Cc: Wanpeng Li, Simon Jeons, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, andi, linux-mm, linux-kernel, wency On Mon, Dec 10, 2012 at 07:54:53PM +0800, Xishi Qiu wrote: > One more question, can we add a list_head to manager the poisoned pages? What would you need that list for? Also, a list is not the most optimal data structure for when you need to traverse it often. Thanks. -- Regards/Gruss, Boris. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 11:39 ` Wanpeng Li 2012-12-10 11:54 ` Xishi Qiu @ 2012-12-10 15:39 ` Andi Kleen 1 sibling, 0 replies; 38+ messages in thread From: Andi Kleen @ 2012-12-10 15:39 UTC (permalink / raw) To: Wanpeng Li Cc: Xishi Qiu, Simon Jeons, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency > HWPoison delays any action on buddy allocator pages, handling can be safely postponed > until a later time when the page might be referenced. By delaying, some transient errors > may not reoccur or may be irrelevant. That's not true for soft offlining, only for hard. -Andi -- ak@linux.intel.com -- Speaking for myself only. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 11:16 ` Xishi Qiu 2012-12-10 11:39 ` Wanpeng Li @ 2012-12-10 11:39 ` Wanpeng Li 2012-12-10 11:58 ` Simon Jeons ` (2 subsequent siblings) 4 siblings, 0 replies; 38+ messages in thread From: Wanpeng Li @ 2012-12-10 11:39 UTC (permalink / raw) To: Xishi Qiu Cc: Simon Jeons, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency On Mon, Dec 10, 2012 at 07:16:50PM +0800, Xishi Qiu wrote: >On 2012/12/10 18:47, Simon Jeons wrote: > >> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote: >>> On 2012/12/10 16:33, Wanpeng Li wrote: >>> >>>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: >>>>> On Fri, 7 Dec 2012 16:48:45 +0800 >>>>> Xishi Qiu <qiuxishi@huawei.com> wrote: >>>>> >>>>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a >>>>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error, >>>>>> since the page was already marked HWPoison, we should skip the page and don't add the >>>>>> value of mce_bad_pages. >>>>>> >>>>>> $ cat /proc/meminfo | grep HardwareCorrupted >>>>>> >>>>>> soft_offline_page() >>>>>> get_any_page() >>>>>> atomic_long_add(1, &mce_bad_pages) >>>>>> >>>>>> ... >>>>>> >>>>>> --- a/mm/memory-failure.c >>>>>> +++ b/mm/memory-failure.c >>>>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) >>>>>> return ret; >>>>>> >>>>>> done: >>>>>> - atomic_long_add(1, &mce_bad_pages); >>>>>> - SetPageHWPoison(page); >>>>>> /* keep elevated page count for bad page */ >>>>>> + if (!PageHWPoison(page)) { >>>>>> + atomic_long_add(1, &mce_bad_pages); >>>>>> + SetPageHWPoison(page); >>>>>> + } >>>>>> + >>>>>> return ret; >>>>>> } >>>>> >>>>> A few things: >>>>> >>>>> - soft_offline_page() already checks for this case: >>>>> >>>>> if (PageHWPoison(page)) { >>>>> unlock_page(page); >>>>> put_page(page); >>>>> pr_info("soft offline: %#lx page already poisoned\n", pfn); >>>>> return -EBUSY; >>>>> } >>>>> >>>>> so why didn't this check work for you? >>>>> >>>>> Presumably because one of the earlier "goto done" branches was >>>>> taken. Which one, any why? >>>>> >>>>> This function is an utter mess. It contains six return points >>>>> randomly intermingled with three "goto done" return points. >>>>> >>>>> This mess is probably the cause of the bug you have observed. Can >>>>> we please fix it up somehow? It *seems* that the design (lol) of >>>>> this function is "for errors, return immediately. For success, goto >>>>> done". In which case "done" should have been called "success". But >>>>> if you just look at the function you'll see that this approach didn't >>>>> work. I suggest it be converted to have two return points - one for >>>>> the success path, one for the failure path. Or something. >>>>> >>>>> - soft_offline_huge_page() is a miniature copy of soft_offline_page() >>>>> and might suffer the same bug. >>>>> >>>>> - A cleaner, shorter and possibly faster implementation is >>>>> >>>>> if (!TestSetPageHWPoison(page)) >>>>> atomic_long_add(1, &mce_bad_pages); >>>>> >>>> >>>> Hi Andrew, >>>> >>>> Since hwpoison bit for free buddy page has already be set in get_any_page, >>>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for >>>> the first time. >>>> >>>> Regards, >>>> Wanpeng Li >>>> >>> >>> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated >>> immediately in soft_offline_page() and memory_failure()? >>> >>> buffered_rmqueue() >>> prep_new_page() >>> check_new_page() >>> bad_page() >> >> Do you mean else if(is_free_buddy_page(p)) branch is redundancy? >> > >Hi Simon, > >get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy. > >It is another topic, I mean since the page is poisoned, so why not isolate it >from page buddy alocator in soft_offline_page() rather than in check_new_page(). > >I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned >page is still managed by page buddy alocator. > Hi Xishi, HWPoison delays any action on buddy allocator pages, handling can be safely postponed until a later time when the page might be referenced. By delaying, some transient errors may not reoccur or may be irrelevant. Regards, Wanpeng Li >>> >>> Thanks >>> Xishi Qiu >>> >>>>> - We have atomic_long_inc(). Use it? >>>>> >>>>> - Why do we have a variable called "mce_bad_pages"? MCE is an x86 >>>>> concept, and this code is in mm/. Lights are flashing, bells are >>>>> ringing and a loudspeaker is blaring "layering violation" at us! >>>>> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 11:16 ` Xishi Qiu 2012-12-10 11:39 ` Wanpeng Li 2012-12-10 11:39 ` Wanpeng Li @ 2012-12-10 11:58 ` Simon Jeons [not found] ` <1355140561.1821.5.camel@kernel.cn.ibm.com> 2012-12-10 15:38 ` Andi Kleen 4 siblings, 0 replies; 38+ messages in thread From: Simon Jeons @ 2012-12-10 11:58 UTC (permalink / raw) To: Xishi Qiu Cc: Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency On Mon, 2012-12-10 at 19:16 +0800, Xishi Qiu wrote: > On 2012/12/10 18:47, Simon Jeons wrote: > > > On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote: > >> On 2012/12/10 16:33, Wanpeng Li wrote: > >> > >>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: > >>>> On Fri, 7 Dec 2012 16:48:45 +0800 > >>>> Xishi Qiu <qiuxishi@huawei.com> wrote: > >>>> > >>>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a > >>>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error, > >>>>> since the page was already marked HWPoison, we should skip the page and don't add the > >>>>> value of mce_bad_pages. > >>>>> > >>>>> $ cat /proc/meminfo | grep HardwareCorrupted > >>>>> > >>>>> soft_offline_page() > >>>>> get_any_page() > >>>>> atomic_long_add(1, &mce_bad_pages) > >>>>> > >>>>> ... > >>>>> > >>>>> --- a/mm/memory-failure.c > >>>>> +++ b/mm/memory-failure.c > >>>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) > >>>>> return ret; > >>>>> > >>>>> done: > >>>>> - atomic_long_add(1, &mce_bad_pages); > >>>>> - SetPageHWPoison(page); > >>>>> /* keep elevated page count for bad page */ > >>>>> + if (!PageHWPoison(page)) { > >>>>> + atomic_long_add(1, &mce_bad_pages); > >>>>> + SetPageHWPoison(page); > >>>>> + } > >>>>> + > >>>>> return ret; > >>>>> } > >>>> > >>>> A few things: > >>>> > >>>> - soft_offline_page() already checks for this case: > >>>> > >>>> if (PageHWPoison(page)) { > >>>> unlock_page(page); > >>>> put_page(page); > >>>> pr_info("soft offline: %#lx page already poisoned\n", pfn); > >>>> return -EBUSY; > >>>> } > >>>> > >>>> so why didn't this check work for you? > >>>> > >>>> Presumably because one of the earlier "goto done" branches was > >>>> taken. Which one, any why? > >>>> > >>>> This function is an utter mess. It contains six return points > >>>> randomly intermingled with three "goto done" return points. > >>>> > >>>> This mess is probably the cause of the bug you have observed. Can > >>>> we please fix it up somehow? It *seems* that the design (lol) of > >>>> this function is "for errors, return immediately. For success, goto > >>>> done". In which case "done" should have been called "success". But > >>>> if you just look at the function you'll see that this approach didn't > >>>> work. I suggest it be converted to have two return points - one for > >>>> the success path, one for the failure path. Or something. > >>>> > >>>> - soft_offline_huge_page() is a miniature copy of soft_offline_page() > >>>> and might suffer the same bug. > >>>> > >>>> - A cleaner, shorter and possibly faster implementation is > >>>> > >>>> if (!TestSetPageHWPoison(page)) > >>>> atomic_long_add(1, &mce_bad_pages); > >>>> > >>> > >>> Hi Andrew, > >>> > >>> Since hwpoison bit for free buddy page has already be set in get_any_page, > >>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for > >>> the first time. > >>> > >>> Regards, > >>> Wanpeng Li > >>> > >> > >> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated > >> immediately in soft_offline_page() and memory_failure()? > >> > >> buffered_rmqueue() > >> prep_new_page() > >> check_new_page() > >> bad_page() > > > > Do you mean else if(is_free_buddy_page(p)) branch is redundancy? > > > > Hi Simon, > > get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy. > > It is another topic, I mean since the page is poisoned, so why not isolate it What topic? I still can't figure out when this branch can be executed since hwpoison inject path can't poison free buddy pages. > from page buddy alocator in soft_offline_page() rather than in check_new_page(). > > I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned > page is still managed by page buddy alocator. > > >> > >> Thanks > >> Xishi Qiu > >> > >>>> - We have atomic_long_inc(). Use it? > >>>> > >>>> - Why do we have a variable called "mce_bad_pages"? MCE is an x86 > >>>> concept, and this code is in mm/. Lights are flashing, bells are > >>>> ringing and a loudspeaker is blaring "layering violation" at us! > >>>> > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
[parent not found: <1355140561.1821.5.camel@kernel.cn.ibm.com>]
[parent not found: <50C5D844.8050707@huawei.com>]
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics [not found] ` <50C5D844.8050707@huawei.com> @ 2012-12-10 12:47 ` Simon Jeons 2012-12-11 1:16 ` Wanpeng Li 2012-12-11 1:16 ` Wanpeng Li 0 siblings, 2 replies; 38+ messages in thread From: Simon Jeons @ 2012-12-10 12:47 UTC (permalink / raw) To: Xishi Qiu Cc: Wanpeng Li, Simon Jeons, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency Cc other guys. On Mon, 2012-12-10 at 20:40 +0800, Xishi Qiu wrote: > On 2012/12/10 19:56, Simon Jeons wrote: > > > On Mon, 2012-12-10 at 19:16 +0800, Xishi Qiu wrote: > >> On 2012/12/10 18:47, Simon Jeons wrote: > >> > >>> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote: > >>>> On 2012/12/10 16:33, Wanpeng Li wrote: > >>>> > >>>>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: > >>>>>> On Fri, 7 Dec 2012 16:48:45 +0800 > >>>>>> Xishi Qiu <qiuxishi@huawei.com> wrote: > >>>>>> > >>>>>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a > >>>>>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error, > >>>>>>> since the page was already marked HWPoison, we should skip the page and don't add the > >>>>>>> value of mce_bad_pages. > >>>>>>> > >>>>>>> $ cat /proc/meminfo | grep HardwareCorrupted > >>>>>>> > >>>>>>> soft_offline_page() > >>>>>>> get_any_page() > >>>>>>> atomic_long_add(1, &mce_bad_pages) > >>>>>>> > >>>>>>> ... > >>>>>>> > >>>>>>> --- a/mm/memory-failure.c > >>>>>>> +++ b/mm/memory-failure.c > >>>>>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) > >>>>>>> return ret; > >>>>>>> > >>>>>>> done: > >>>>>>> - atomic_long_add(1, &mce_bad_pages); > >>>>>>> - SetPageHWPoison(page); > >>>>>>> /* keep elevated page count for bad page */ > >>>>>>> + if (!PageHWPoison(page)) { > >>>>>>> + atomic_long_add(1, &mce_bad_pages); > >>>>>>> + SetPageHWPoison(page); > >>>>>>> + } > >>>>>>> + > >>>>>>> return ret; > >>>>>>> } > >>>>>> > >>>>>> A few things: > >>>>>> > >>>>>> - soft_offline_page() already checks for this case: > >>>>>> > >>>>>> if (PageHWPoison(page)) { > >>>>>> unlock_page(page); > >>>>>> put_page(page); > >>>>>> pr_info("soft offline: %#lx page already poisoned\n", pfn); > >>>>>> return -EBUSY; > >>>>>> } > >>>>>> > >>>>>> so why didn't this check work for you? > >>>>>> > >>>>>> Presumably because one of the earlier "goto done" branches was > >>>>>> taken. Which one, any why? > >>>>>> > >>>>>> This function is an utter mess. It contains six return points > >>>>>> randomly intermingled with three "goto done" return points. > >>>>>> > >>>>>> This mess is probably the cause of the bug you have observed. Can > >>>>>> we please fix it up somehow? It *seems* that the design (lol) of > >>>>>> this function is "for errors, return immediately. For success, goto > >>>>>> done". In which case "done" should have been called "success". But > >>>>>> if you just look at the function you'll see that this approach didn't > >>>>>> work. I suggest it be converted to have two return points - one for > >>>>>> the success path, one for the failure path. Or something. > >>>>>> > >>>>>> - soft_offline_huge_page() is a miniature copy of soft_offline_page() > >>>>>> and might suffer the same bug. > >>>>>> > >>>>>> - A cleaner, shorter and possibly faster implementation is > >>>>>> > >>>>>> if (!TestSetPageHWPoison(page)) > >>>>>> atomic_long_add(1, &mce_bad_pages); > >>>>>> > >>>>> > >>>>> Hi Andrew, > >>>>> > >>>>> Since hwpoison bit for free buddy page has already be set in get_any_page, > >>>>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for > >>>>> the first time. > >>>>> > >>>>> Regards, > >>>>> Wanpeng Li > >>>>> > >>>> > >>>> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated > >>>> immediately in soft_offline_page() and memory_failure()? > >>>> > >>>> buffered_rmqueue() > >>>> prep_new_page() > >>>> check_new_page() > >>>> bad_page() > >>> > >>> Do you mean else if(is_free_buddy_page(p)) branch is redundancy? > >>> > >> > >> Hi Simon, > >> > >> get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy. > >> > >> It is another topic, I mean since the page is poisoned, so why not isolate it > > > > What topic? I still can't figure out when this branch can be executed > > since hwpoison inject path can't poison free buddy pages. > > > > Hi Simon, > > If we use "/sys/devices/system/memory/soft_offline_page" to offline a > free page, the value of mce_bad_pages will be added. Then the page is marked > HWPoison, but it is still managed by page buddy alocator. > > So if we offline it again, the value of mce_bad_pages will be added again. > Assume the page is not allocated during this short time. > > soft_offline_page() > get_any_page() > "else if (is_free_buddy_page(p))" branch return 0 > "goto done"; > "atomic_long_add(1, &mce_bad_pages);" > > I think it would be better to move "if(PageHWPoison(page))" at the beginning of > soft_offline_page(). However I don't know what do these words mean, > "Synchronized using the page lock with memory_failure()" > > >> from page buddy alocator in soft_offline_page() rather than in check_new_page(). > >> > >> I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned > >> page is still managed by page buddy alocator. > >> > >>>> > >>>> Thanks > >>>> Xishi Qiu > >>>> > >>>>>> - We have atomic_long_inc(). Use it? > >>>>>> > >>>>>> - Why do we have a variable called "mce_bad_pages"? MCE is an x86 > >>>>>> concept, and this code is in mm/. Lights are flashing, bells are > >>>>>> ringing and a loudspeaker is blaring "layering violation" at us! > >>>>>> > >> > >> > > > > > > > > . > > > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 12:47 ` Simon Jeons @ 2012-12-11 1:16 ` Wanpeng Li 2012-12-11 1:16 ` Wanpeng Li 1 sibling, 0 replies; 38+ messages in thread From: Wanpeng Li @ 2012-12-11 1:16 UTC (permalink / raw) To: Xishi Qiu Cc: Simon Jeons, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency On Mon, Dec 10, 2012 at 06:47:44AM -0600, Simon Jeons wrote: >Cc other guys. > >On Mon, 2012-12-10 at 20:40 +0800, Xishi Qiu wrote: >> On 2012/12/10 19:56, Simon Jeons wrote: >> >> > On Mon, 2012-12-10 at 19:16 +0800, Xishi Qiu wrote: >> >> On 2012/12/10 18:47, Simon Jeons wrote: >> >> >> >>> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote: >> >>>> On 2012/12/10 16:33, Wanpeng Li wrote: >> >>>> >> >>>>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: >> >>>>>> On Fri, 7 Dec 2012 16:48:45 +0800 >> >>>>>> Xishi Qiu <qiuxishi@huawei.com> wrote: >> >>>>>> >> >>>>>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a >> >>>>>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error, >> >>>>>>> since the page was already marked HWPoison, we should skip the page and don't add the >> >>>>>>> value of mce_bad_pages. >> >>>>>>> >> >>>>>>> $ cat /proc/meminfo | grep HardwareCorrupted >> >>>>>>> >> >>>>>>> soft_offline_page() >> >>>>>>> get_any_page() >> >>>>>>> atomic_long_add(1, &mce_bad_pages) >> >>>>>>> >> >>>>>>> ... >> >>>>>>> >> >>>>>>> --- a/mm/memory-failure.c >> >>>>>>> +++ b/mm/memory-failure.c >> >>>>>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) >> >>>>>>> return ret; >> >>>>>>> >> >>>>>>> done: >> >>>>>>> - atomic_long_add(1, &mce_bad_pages); >> >>>>>>> - SetPageHWPoison(page); >> >>>>>>> /* keep elevated page count for bad page */ >> >>>>>>> + if (!PageHWPoison(page)) { >> >>>>>>> + atomic_long_add(1, &mce_bad_pages); >> >>>>>>> + SetPageHWPoison(page); >> >>>>>>> + } >> >>>>>>> + >> >>>>>>> return ret; >> >>>>>>> } >> >>>>>> >> >>>>>> A few things: >> >>>>>> >> >>>>>> - soft_offline_page() already checks for this case: >> >>>>>> >> >>>>>> if (PageHWPoison(page)) { >> >>>>>> unlock_page(page); >> >>>>>> put_page(page); >> >>>>>> pr_info("soft offline: %#lx page already poisoned\n", pfn); >> >>>>>> return -EBUSY; >> >>>>>> } >> >>>>>> >> >>>>>> so why didn't this check work for you? >> >>>>>> >> >>>>>> Presumably because one of the earlier "goto done" branches was >> >>>>>> taken. Which one, any why? >> >>>>>> >> >>>>>> This function is an utter mess. It contains six return points >> >>>>>> randomly intermingled with three "goto done" return points. >> >>>>>> >> >>>>>> This mess is probably the cause of the bug you have observed. Can >> >>>>>> we please fix it up somehow? It *seems* that the design (lol) of >> >>>>>> this function is "for errors, return immediately. For success, goto >> >>>>>> done". In which case "done" should have been called "success". But >> >>>>>> if you just look at the function you'll see that this approach didn't >> >>>>>> work. I suggest it be converted to have two return points - one for >> >>>>>> the success path, one for the failure path. Or something. >> >>>>>> >> >>>>>> - soft_offline_huge_page() is a miniature copy of soft_offline_page() >> >>>>>> and might suffer the same bug. >> >>>>>> >> >>>>>> - A cleaner, shorter and possibly faster implementation is >> >>>>>> >> >>>>>> if (!TestSetPageHWPoison(page)) >> >>>>>> atomic_long_add(1, &mce_bad_pages); >> >>>>>> >> >>>>> >> >>>>> Hi Andrew, >> >>>>> >> >>>>> Since hwpoison bit for free buddy page has already be set in get_any_page, >> >>>>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for >> >>>>> the first time. >> >>>>> >> >>>>> Regards, >> >>>>> Wanpeng Li >> >>>>> >> >>>> >> >>>> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated >> >>>> immediately in soft_offline_page() and memory_failure()? >> >>>> >> >>>> buffered_rmqueue() >> >>>> prep_new_page() >> >>>> check_new_page() >> >>>> bad_page() >> >>> >> >>> Do you mean else if(is_free_buddy_page(p)) branch is redundancy? >> >>> >> >> >> >> Hi Simon, >> >> >> >> get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy. >> >> >> >> It is another topic, I mean since the page is poisoned, so why not isolate it >> > >> > What topic? I still can't figure out when this branch can be executed >> > since hwpoison inject path can't poison free buddy pages. >> > >> >> Hi Simon, >> >> If we use "/sys/devices/system/memory/soft_offline_page" to offline a >> free page, the value of mce_bad_pages will be added. Then the page is marked >> HWPoison, but it is still managed by page buddy alocator. >> >> So if we offline it again, the value of mce_bad_pages will be added again. >> Assume the page is not allocated during this short time. >> >> soft_offline_page() >> get_any_page() >> "else if (is_free_buddy_page(p))" branch return 0 >> "goto done"; >> "atomic_long_add(1, &mce_bad_pages);" >> >> I think it would be better to move "if(PageHWPoison(page))" at the beginning of >> soft_offline_page(). However I don't know what do these words mean, >> "Synchronized using the page lock with memory_failure()" Hi Xishi, Unpoison will clear PG_hwpoison flag after hold page lock, memory_failure() and soft_offline_page() take the lock to avoid unpoison clear the flag behind them. Regards, Wanpeng Li >> >> >> from page buddy alocator in soft_offline_page() rather than in check_new_page(). >> >> >> >> I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned >> >> page is still managed by page buddy alocator. >> >> >> >>>> >> >>>> Thanks >> >>>> Xishi Qiu >> >>>> >> >>>>>> - We have atomic_long_inc(). Use it? >> >>>>>> >> >>>>>> - Why do we have a variable called "mce_bad_pages"? MCE is an x86 >> >>>>>> concept, and this code is in mm/. Lights are flashing, bells are >> >>>>>> ringing and a loudspeaker is blaring "layering violation" at us! >> >>>>>> >> >> >> >> >> > >> > >> > >> > . >> > >> >> >> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 12:47 ` Simon Jeons 2012-12-11 1:16 ` Wanpeng Li @ 2012-12-11 1:16 ` Wanpeng Li 2012-12-11 6:49 ` Xishi Qiu 1 sibling, 1 reply; 38+ messages in thread From: Wanpeng Li @ 2012-12-11 1:16 UTC (permalink / raw) To: Xishi Qiu Cc: Simon Jeons, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency On Mon, Dec 10, 2012 at 06:47:44AM -0600, Simon Jeons wrote: >Cc other guys. > >On Mon, 2012-12-10 at 20:40 +0800, Xishi Qiu wrote: >> On 2012/12/10 19:56, Simon Jeons wrote: >> >> > On Mon, 2012-12-10 at 19:16 +0800, Xishi Qiu wrote: >> >> On 2012/12/10 18:47, Simon Jeons wrote: >> >> >> >>> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote: >> >>>> On 2012/12/10 16:33, Wanpeng Li wrote: >> >>>> >> >>>>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: >> >>>>>> On Fri, 7 Dec 2012 16:48:45 +0800 >> >>>>>> Xishi Qiu <qiuxishi@huawei.com> wrote: >> >>>>>> >> >>>>>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a >> >>>>>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error, >> >>>>>>> since the page was already marked HWPoison, we should skip the page and don't add the >> >>>>>>> value of mce_bad_pages. >> >>>>>>> >> >>>>>>> $ cat /proc/meminfo | grep HardwareCorrupted >> >>>>>>> >> >>>>>>> soft_offline_page() >> >>>>>>> get_any_page() >> >>>>>>> atomic_long_add(1, &mce_bad_pages) >> >>>>>>> >> >>>>>>> ... >> >>>>>>> >> >>>>>>> --- a/mm/memory-failure.c >> >>>>>>> +++ b/mm/memory-failure.c >> >>>>>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) >> >>>>>>> return ret; >> >>>>>>> >> >>>>>>> done: >> >>>>>>> - atomic_long_add(1, &mce_bad_pages); >> >>>>>>> - SetPageHWPoison(page); >> >>>>>>> /* keep elevated page count for bad page */ >> >>>>>>> + if (!PageHWPoison(page)) { >> >>>>>>> + atomic_long_add(1, &mce_bad_pages); >> >>>>>>> + SetPageHWPoison(page); >> >>>>>>> + } >> >>>>>>> + >> >>>>>>> return ret; >> >>>>>>> } >> >>>>>> >> >>>>>> A few things: >> >>>>>> >> >>>>>> - soft_offline_page() already checks for this case: >> >>>>>> >> >>>>>> if (PageHWPoison(page)) { >> >>>>>> unlock_page(page); >> >>>>>> put_page(page); >> >>>>>> pr_info("soft offline: %#lx page already poisoned\n", pfn); >> >>>>>> return -EBUSY; >> >>>>>> } >> >>>>>> >> >>>>>> so why didn't this check work for you? >> >>>>>> >> >>>>>> Presumably because one of the earlier "goto done" branches was >> >>>>>> taken. Which one, any why? >> >>>>>> >> >>>>>> This function is an utter mess. It contains six return points >> >>>>>> randomly intermingled with three "goto done" return points. >> >>>>>> >> >>>>>> This mess is probably the cause of the bug you have observed. Can >> >>>>>> we please fix it up somehow? It *seems* that the design (lol) of >> >>>>>> this function is "for errors, return immediately. For success, goto >> >>>>>> done". In which case "done" should have been called "success". But >> >>>>>> if you just look at the function you'll see that this approach didn't >> >>>>>> work. I suggest it be converted to have two return points - one for >> >>>>>> the success path, one for the failure path. Or something. >> >>>>>> >> >>>>>> - soft_offline_huge_page() is a miniature copy of soft_offline_page() >> >>>>>> and might suffer the same bug. >> >>>>>> >> >>>>>> - A cleaner, shorter and possibly faster implementation is >> >>>>>> >> >>>>>> if (!TestSetPageHWPoison(page)) >> >>>>>> atomic_long_add(1, &mce_bad_pages); >> >>>>>> >> >>>>> >> >>>>> Hi Andrew, >> >>>>> >> >>>>> Since hwpoison bit for free buddy page has already be set in get_any_page, >> >>>>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for >> >>>>> the first time. >> >>>>> >> >>>>> Regards, >> >>>>> Wanpeng Li >> >>>>> >> >>>> >> >>>> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated >> >>>> immediately in soft_offline_page() and memory_failure()? >> >>>> >> >>>> buffered_rmqueue() >> >>>> prep_new_page() >> >>>> check_new_page() >> >>>> bad_page() >> >>> >> >>> Do you mean else if(is_free_buddy_page(p)) branch is redundancy? >> >>> >> >> >> >> Hi Simon, >> >> >> >> get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy. >> >> >> >> It is another topic, I mean since the page is poisoned, so why not isolate it >> > >> > What topic? I still can't figure out when this branch can be executed >> > since hwpoison inject path can't poison free buddy pages. >> > >> >> Hi Simon, >> >> If we use "/sys/devices/system/memory/soft_offline_page" to offline a >> free page, the value of mce_bad_pages will be added. Then the page is marked >> HWPoison, but it is still managed by page buddy alocator. >> >> So if we offline it again, the value of mce_bad_pages will be added again. >> Assume the page is not allocated during this short time. >> >> soft_offline_page() >> get_any_page() >> "else if (is_free_buddy_page(p))" branch return 0 >> "goto done"; >> "atomic_long_add(1, &mce_bad_pages);" >> >> I think it would be better to move "if(PageHWPoison(page))" at the beginning of >> soft_offline_page(). However I don't know what do these words mean, >> "Synchronized using the page lock with memory_failure()" Hi Xishi, Unpoison will clear PG_hwpoison flag after hold page lock, memory_failure() and soft_offline_page() take the lock to avoid unpoison clear the flag behind them. Regards, Wanpeng Li >> >> >> from page buddy alocator in soft_offline_page() rather than in check_new_page(). >> >> >> >> I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned >> >> page is still managed by page buddy alocator. >> >> >> >>>> >> >>>> Thanks >> >>>> Xishi Qiu >> >>>> >> >>>>>> - We have atomic_long_inc(). Use it? >> >>>>>> >> >>>>>> - Why do we have a variable called "mce_bad_pages"? MCE is an x86 >> >>>>>> concept, and this code is in mm/. Lights are flashing, bells are >> >>>>>> ringing and a loudspeaker is blaring "layering violation" at us! >> >>>>>> >> >> >> >> >> > >> > >> > >> > . >> > >> >> >> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 1:16 ` Wanpeng Li @ 2012-12-11 6:49 ` Xishi Qiu 2012-12-11 8:02 ` Wanpeng Li 2012-12-11 8:02 ` Wanpeng Li 0 siblings, 2 replies; 38+ messages in thread From: Xishi Qiu @ 2012-12-11 6:49 UTC (permalink / raw) To: Wanpeng Li Cc: Simon Jeons, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency >>> Hi Simon, >>> >>> If we use "/sys/devices/system/memory/soft_offline_page" to offline a >>> free page, the value of mce_bad_pages will be added. Then the page is marked >>> HWPoison, but it is still managed by page buddy alocator. >>> >>> So if we offline it again, the value of mce_bad_pages will be added again. >>> Assume the page is not allocated during this short time. >>> >>> soft_offline_page() >>> get_any_page() >>> "else if (is_free_buddy_page(p))" branch return 0 >>> "goto done"; >>> "atomic_long_add(1, &mce_bad_pages);" >>> >>> I think it would be better to move "if(PageHWPoison(page))" at the beginning of >>> soft_offline_page(). However I don't know what do these words mean, >>> "Synchronized using the page lock with memory_failure()" > > Hi Xishi, > > Unpoison will clear PG_hwpoison flag after hold page lock, memory_failure() and > soft_offline_page() take the lock to avoid unpoison clear the flag behind them. > > Regards, > Wanpeng Li > Hi Wanpeng, As you mean, it is the necessary to get the page lock first when we check the HWPoison flag every time, this is in order to avoid conflict, right? So why not use a globe lock here? For example lock_memory_hotplug() is used in online_pages() and offline_pages()? Thanks, Xishi Qiu -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 6:49 ` Xishi Qiu @ 2012-12-11 8:02 ` Wanpeng Li 2012-12-11 8:02 ` Wanpeng Li 1 sibling, 0 replies; 38+ messages in thread From: Wanpeng Li @ 2012-12-11 8:02 UTC (permalink / raw) To: Xishi Qiu Cc: Simon Jeons, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency On Tue, Dec 11, 2012 at 02:49:06PM +0800, Xishi Qiu wrote: >>>> Hi Simon, > >>>> >>>> If we use "/sys/devices/system/memory/soft_offline_page" to offline a >>>> free page, the value of mce_bad_pages will be added. Then the page is marked >>>> HWPoison, but it is still managed by page buddy alocator. >>>> >>>> So if we offline it again, the value of mce_bad_pages will be added again. >>>> Assume the page is not allocated during this short time. >>>> >>>> soft_offline_page() >>>> get_any_page() >>>> "else if (is_free_buddy_page(p))" branch return 0 >>>> "goto done"; >>>> "atomic_long_add(1, &mce_bad_pages);" >>>> >>>> I think it would be better to move "if(PageHWPoison(page))" at the beginning of >>>> soft_offline_page(). However I don't know what do these words mean, >>>> "Synchronized using the page lock with memory_failure()" >> >> Hi Xishi, >> >> Unpoison will clear PG_hwpoison flag after hold page lock, memory_failure() and >> soft_offline_page() take the lock to avoid unpoison clear the flag behind them. >> >> Regards, >> Wanpeng Li >> > >Hi Wanpeng, > >As you mean, it is the necessary to get the page lock first when we check the >HWPoison flag every time, this is in order to avoid conflict, right? > Hi Xishi, Avoid race. >So why not use a globe lock here? For example lock_memory_hotplug() is used in >online_pages() and offline_pages()? Just for a single page, a global lock maybe more contend. Regards, Wanpeng Li > >Thanks, >Xishi Qiu -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 6:49 ` Xishi Qiu 2012-12-11 8:02 ` Wanpeng Li @ 2012-12-11 8:02 ` Wanpeng Li 1 sibling, 0 replies; 38+ messages in thread From: Wanpeng Li @ 2012-12-11 8:02 UTC (permalink / raw) To: Xishi Qiu Cc: Simon Jeons, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency On Tue, Dec 11, 2012 at 02:49:06PM +0800, Xishi Qiu wrote: >>>> Hi Simon, > >>>> >>>> If we use "/sys/devices/system/memory/soft_offline_page" to offline a >>>> free page, the value of mce_bad_pages will be added. Then the page is marked >>>> HWPoison, but it is still managed by page buddy alocator. >>>> >>>> So if we offline it again, the value of mce_bad_pages will be added again. >>>> Assume the page is not allocated during this short time. >>>> >>>> soft_offline_page() >>>> get_any_page() >>>> "else if (is_free_buddy_page(p))" branch return 0 >>>> "goto done"; >>>> "atomic_long_add(1, &mce_bad_pages);" >>>> >>>> I think it would be better to move "if(PageHWPoison(page))" at the beginning of >>>> soft_offline_page(). However I don't know what do these words mean, >>>> "Synchronized using the page lock with memory_failure()" >> >> Hi Xishi, >> >> Unpoison will clear PG_hwpoison flag after hold page lock, memory_failure() and >> soft_offline_page() take the lock to avoid unpoison clear the flag behind them. >> >> Regards, >> Wanpeng Li >> > >Hi Wanpeng, > >As you mean, it is the necessary to get the page lock first when we check the >HWPoison flag every time, this is in order to avoid conflict, right? > Hi Xishi, Avoid race. >So why not use a globe lock here? For example lock_memory_hotplug() is used in >online_pages() and offline_pages()? Just for a single page, a global lock maybe more contend. Regards, Wanpeng Li > >Thanks, >Xishi Qiu -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 11:16 ` Xishi Qiu ` (3 preceding siblings ...) [not found] ` <1355140561.1821.5.camel@kernel.cn.ibm.com> @ 2012-12-10 15:38 ` Andi Kleen 2012-12-11 1:49 ` Simon Jeons 2012-12-11 2:25 ` Xishi Qiu 4 siblings, 2 replies; 38+ messages in thread From: Andi Kleen @ 2012-12-10 15:38 UTC (permalink / raw) To: Xishi Qiu Cc: Simon Jeons, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm, linux-kernel, wency > It is another topic, I mean since the page is poisoned, so why not isolate it > from page buddy alocator in soft_offline_page() rather than in check_new_page(). > I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned > page is still managed by page buddy alocator. Doing it in check_new_page is the only way if the page is currently allocated by someone. Since that's not uncommon it's simplest to always do it this way. -Andi -- ak@linux.intel.com -- Speaking for myself only. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 15:38 ` Andi Kleen @ 2012-12-11 1:49 ` Simon Jeons 2012-12-11 2:03 ` Andi Kleen 2012-12-11 2:25 ` Xishi Qiu 1 sibling, 1 reply; 38+ messages in thread From: Simon Jeons @ 2012-12-11 1:49 UTC (permalink / raw) To: Andi Kleen Cc: Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency On Mon, 2012-12-10 at 16:38 +0100, Andi Kleen wrote: > > It is another topic, I mean since the page is poisoned, so why not isolate it > > from page buddy alocator in soft_offline_page() rather than in check_new_page(). > > I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned > > page is still managed by page buddy alocator. > > Doing it in check_new_page is the only way if the page is currently > allocated by someone. Since that's not uncommon it's simplest to always > do it this way. Hi Andi, IIUC, soft offlining will isolate and migrate hwpoisoned page, and this page will not be accessed by memory management subsystem until unpoison, correct? -Simon > > -Andi > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 1:49 ` Simon Jeons @ 2012-12-11 2:03 ` Andi Kleen 2012-12-11 2:14 ` Simon Jeons 0 siblings, 1 reply; 38+ messages in thread From: Andi Kleen @ 2012-12-11 2:03 UTC (permalink / raw) To: Simon Jeons Cc: Andi Kleen, Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency > IIUC, soft offlining will isolate and migrate hwpoisoned page, and this > page will not be accessed by memory management subsystem until unpoison, > correct? No, soft offlining can still allow accesses for some time. It'll never kill anything. Hard tries much harder and will kill. In some cases (unshrinkable kernel allocation) they end up doing the same because there isn't any other alternative though. However these are expected to only apply to a small percentage of pages in a typical system. -Andi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 2:03 ` Andi Kleen @ 2012-12-11 2:14 ` Simon Jeons 2012-12-11 3:01 ` Andi Kleen 0 siblings, 1 reply; 38+ messages in thread From: Simon Jeons @ 2012-12-11 2:14 UTC (permalink / raw) To: Andi Kleen Cc: Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency On Tue, 2012-12-11 at 03:03 +0100, Andi Kleen wrote: > > IIUC, soft offlining will isolate and migrate hwpoisoned page, and this > > page will not be accessed by memory management subsystem until unpoison, > > correct? > > No, soft offlining can still allow accesses for some time. It'll never kill > anything. Oh, it will be putback to lru list during migration. So does your "some time" mean before call check_new_page? -Simon > > Hard tries much harder and will kill. > > In some cases (unshrinkable kernel allocation) they end up doing the same > because there isn't any other alternative though. However these are > expected to only apply to a small percentage of pages in a typical > system. > > -Andi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 2:14 ` Simon Jeons @ 2012-12-11 3:01 ` Andi Kleen 2012-12-11 3:13 ` Simon Jeons 0 siblings, 1 reply; 38+ messages in thread From: Andi Kleen @ 2012-12-11 3:01 UTC (permalink / raw) To: Simon Jeons Cc: Andi Kleen, Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency > Oh, it will be putback to lru list during migration. So does your "some > time" mean before call check_new_page? Yes until the next check_new_page() whenever that is. If the migration works it will be earlier, otherwise later. -andi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 3:01 ` Andi Kleen @ 2012-12-11 3:13 ` Simon Jeons 2012-12-11 3:19 ` Andi Kleen 0 siblings, 1 reply; 38+ messages in thread From: Simon Jeons @ 2012-12-11 3:13 UTC (permalink / raw) To: Andi Kleen Cc: Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote: > > Oh, it will be putback to lru list during migration. So does your "some > > time" mean before call check_new_page? > > Yes until the next check_new_page() whenever that is. If the migration > works it will be earlier, otherwise later. But I can't figure out any page reclaim path check if the page is set PG_hwpoison, can poisoned pages be rclaimed? -Simon > > -andi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 3:13 ` Simon Jeons @ 2012-12-11 3:19 ` Andi Kleen 2012-12-11 3:48 ` Simon Jeons 0 siblings, 1 reply; 38+ messages in thread From: Andi Kleen @ 2012-12-11 3:19 UTC (permalink / raw) To: Simon Jeons Cc: Andi Kleen, Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote: > On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote: > > > Oh, it will be putback to lru list during migration. So does your "some > > > time" mean before call check_new_page? > > > > Yes until the next check_new_page() whenever that is. If the migration > > works it will be earlier, otherwise later. > > But I can't figure out any page reclaim path check if the page is set > PG_hwpoison, can poisoned pages be rclaimed? The only way to reclaim a page is to free and reallocate it. -Andi -- ak@linux.intel.com -- Speaking for myself only. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 3:19 ` Andi Kleen @ 2012-12-11 3:48 ` Simon Jeons 2012-12-11 5:55 ` Xishi Qiu 0 siblings, 1 reply; 38+ messages in thread From: Simon Jeons @ 2012-12-11 3:48 UTC (permalink / raw) To: Andi Kleen Cc: Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency On Tue, 2012-12-11 at 04:19 +0100, Andi Kleen wrote: > On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote: > > On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote: > > > > Oh, it will be putback to lru list during migration. So does your "some > > > > time" mean before call check_new_page? > > > > > > Yes until the next check_new_page() whenever that is. If the migration > > > works it will be earlier, otherwise later. > > > > But I can't figure out any page reclaim path check if the page is set > > PG_hwpoison, can poisoned pages be rclaimed? > > The only way to reclaim a page is to free and reallocate it. Then why there doesn't have check in reclaim path to avoid relcaim poisoned page? -Simon > > -Andi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 3:48 ` Simon Jeons @ 2012-12-11 5:55 ` Xishi Qiu 2012-12-11 6:34 ` Wanpeng Li 2012-12-11 6:34 ` Wanpeng Li 0 siblings, 2 replies; 38+ messages in thread From: Xishi Qiu @ 2012-12-11 5:55 UTC (permalink / raw) To: Simon Jeons Cc: Andi Kleen, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency On 2012/12/11 11:48, Simon Jeons wrote: > On Tue, 2012-12-11 at 04:19 +0100, Andi Kleen wrote: >> On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote: >>> On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote: >>>>> Oh, it will be putback to lru list during migration. So does your "some >>>>> time" mean before call check_new_page? >>>> >>>> Yes until the next check_new_page() whenever that is. If the migration >>>> works it will be earlier, otherwise later. >>> >>> But I can't figure out any page reclaim path check if the page is set >>> PG_hwpoison, can poisoned pages be rclaimed? >> >> The only way to reclaim a page is to free and reallocate it. > > Then why there doesn't have check in reclaim path to avoid relcaim > poisoned page? > > -Simon Hi Simon, If the page is free, it will be set PG_hwpoison, and soft_offline_page() is done. When the page is alocated later, check_new_page() will find the poisoned page and isolate the whole buddy block(just drop the block). If the page is not free, soft_offline_page() try to free it first, if this is failed, it will migrate the page, but the page is still in LRU list after migration, migrate_pages() unmap_and_move() if (rc != -EAGAIN) { ... putback_lru_page(page); } We can use lru_add_drain_all() to drain lru pagevec, at last free_hot_cold_page() will be called, and free_pages_prepare() check the poisoned pages. free_pages_prepare() free_pages_check() bad_page() Is this right, Andi? Thanks Xishi Qiu >> >> -Andi > > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 5:55 ` Xishi Qiu @ 2012-12-11 6:34 ` Wanpeng Li 2012-12-11 6:34 ` Wanpeng Li 1 sibling, 0 replies; 38+ messages in thread From: Wanpeng Li @ 2012-12-11 6:34 UTC (permalink / raw) To: Xishi Qiu Cc: Simon Jeons, Andi Kleen, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency On Tue, Dec 11, 2012 at 01:55:15PM +0800, Xishi Qiu wrote: >On 2012/12/11 11:48, Simon Jeons wrote: > >> On Tue, 2012-12-11 at 04:19 +0100, Andi Kleen wrote: >>> On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote: >>>> On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote: >>>>>> Oh, it will be putback to lru list during migration. So does your "some >>>>>> time" mean before call check_new_page? >>>>> >>>>> Yes until the next check_new_page() whenever that is. If the migration >>>>> works it will be earlier, otherwise later. >>>> >>>> But I can't figure out any page reclaim path check if the page is set >>>> PG_hwpoison, can poisoned pages be rclaimed? >>> >>> The only way to reclaim a page is to free and reallocate it. >> >> Then why there doesn't have check in reclaim path to avoid relcaim >> poisoned page? >> >> -Simon > >Hi Simon, > >If the page is free, it will be set PG_hwpoison, and soft_offline_page() is done. >When the page is alocated later, check_new_page() will find the poisoned page and >isolate the whole buddy block(just drop the block). > >If the page is not free, soft_offline_page() try to free it first, if this is >failed, it will migrate the page, but the page is still in LRU list after migration, >migrate_pages() > unmap_and_move() > if (rc != -EAGAIN) { > ... > putback_lru_page(page); > } >We can use lru_add_drain_all() to drain lru pagevec, at last free_hot_cold_page() Hi Xishi, I don't understand why you need drain lru pagevec here, if the page has been migrated has all references removed and then it will be freed. The putback_lru_page mentioned above will call put_page free to it. putback_lru_page ->put_page ->__put_single_page ->free_hot_cold_page ->free_page_check ->free_pages_prepare ->free_pages_check ->bad_page Regards, Wanpeng Li >will be called, and free_pages_prepare() check the poisoned pages. >free_pages_prepare() > free_pages_check() > bad_page() > >Is this right, Andi? > >Thanks >Xishi Qiu > >>> >>> -Andi >> >> >> >> > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 5:55 ` Xishi Qiu 2012-12-11 6:34 ` Wanpeng Li @ 2012-12-11 6:34 ` Wanpeng Li 1 sibling, 0 replies; 38+ messages in thread From: Wanpeng Li @ 2012-12-11 6:34 UTC (permalink / raw) To: Xishi Qiu Cc: Simon Jeons, Andi Kleen, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency On Tue, Dec 11, 2012 at 01:55:15PM +0800, Xishi Qiu wrote: >On 2012/12/11 11:48, Simon Jeons wrote: > >> On Tue, 2012-12-11 at 04:19 +0100, Andi Kleen wrote: >>> On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote: >>>> On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote: >>>>>> Oh, it will be putback to lru list during migration. So does your "some >>>>>> time" mean before call check_new_page? >>>>> >>>>> Yes until the next check_new_page() whenever that is. If the migration >>>>> works it will be earlier, otherwise later. >>>> >>>> But I can't figure out any page reclaim path check if the page is set >>>> PG_hwpoison, can poisoned pages be rclaimed? >>> >>> The only way to reclaim a page is to free and reallocate it. >> >> Then why there doesn't have check in reclaim path to avoid relcaim >> poisoned page? >> >> -Simon > >Hi Simon, > >If the page is free, it will be set PG_hwpoison, and soft_offline_page() is done. >When the page is alocated later, check_new_page() will find the poisoned page and >isolate the whole buddy block(just drop the block). > >If the page is not free, soft_offline_page() try to free it first, if this is >failed, it will migrate the page, but the page is still in LRU list after migration, >migrate_pages() > unmap_and_move() > if (rc != -EAGAIN) { > ... > putback_lru_page(page); > } >We can use lru_add_drain_all() to drain lru pagevec, at last free_hot_cold_page() Hi Xishi, I don't understand why you need drain lru pagevec here, if the page has been migrated has all references removed and then it will be freed. The putback_lru_page mentioned above will call put_page free to it. putback_lru_page ->put_page ->__put_single_page ->free_hot_cold_page ->free_page_check ->free_pages_prepare ->free_pages_check ->bad_page Regards, Wanpeng Li >will be called, and free_pages_prepare() check the poisoned pages. >free_pages_prepare() > free_pages_check() > bad_page() > >Is this right, Andi? > >Thanks >Xishi Qiu > >>> >>> -Andi >> >> >> >> > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-10 15:38 ` Andi Kleen 2012-12-11 1:49 ` Simon Jeons @ 2012-12-11 2:25 ` Xishi Qiu 2012-12-11 2:45 ` Fengguang Wu 1 sibling, 1 reply; 38+ messages in thread From: Xishi Qiu @ 2012-12-11 2:25 UTC (permalink / raw) To: Andi Kleen Cc: Simon Jeons, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency, fengguang.wu, Hanjun Guo On 2012/12/10 23:38, Andi Kleen wrote: >> It is another topic, I mean since the page is poisoned, so why not isolate it >> from page buddy alocator in soft_offline_page() rather than in check_new_page(). >> I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned >> page is still managed by page buddy alocator. > > Doing it in check_new_page is the only way if the page is currently > allocated by someone. Since that's not uncommon it's simplest to always > do it this way. > > -Andi > Hi Andi, The poisoned page is isolated in check_new_page, however the whole buddy block will be dropped, it seems to be a waste of memory. Can we separate the poisoned page from the buddy block, then *only* drop the poisoned page? Thanks Xishi Qiu -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 2:25 ` Xishi Qiu @ 2012-12-11 2:45 ` Fengguang Wu 2012-12-11 2:58 ` Andi Kleen 0 siblings, 1 reply; 38+ messages in thread From: Fengguang Wu @ 2012-12-11 2:45 UTC (permalink / raw) To: Xishi Qiu Cc: Andi Kleen, Simon Jeons, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency, Hanjun Guo On Tue, Dec 11, 2012 at 10:25:00AM +0800, Xishi Qiu wrote: > On 2012/12/10 23:38, Andi Kleen wrote: > > >> It is another topic, I mean since the page is poisoned, so why not isolate it > >> from page buddy alocator in soft_offline_page() rather than in check_new_page(). > >> I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned > >> page is still managed by page buddy alocator. > > > > Doing it in check_new_page is the only way if the page is currently > > allocated by someone. Since that's not uncommon it's simplest to always > > do it this way. > > > > -Andi > > > > Hi Andi, > > The poisoned page is isolated in check_new_page, however the whole buddy block will > be dropped, it seems to be a waste of memory. > > Can we separate the poisoned page from the buddy block, then *only* drop the poisoned > page? That sounds like overkill. There are not so many free pages in a typical server system. Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 2:45 ` Fengguang Wu @ 2012-12-11 2:58 ` Andi Kleen 2012-12-11 3:25 ` Xishi Qiu 0 siblings, 1 reply; 38+ messages in thread From: Andi Kleen @ 2012-12-11 2:58 UTC (permalink / raw) To: Fengguang Wu Cc: Xishi Qiu, Andi Kleen, Simon Jeons, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency, Hanjun Guo > That sounds like overkill. There are not so many free pages in a > typical server system. As Fengguang said -- memory error handling is tricky. Lots of things could be done in theory, but they all have a cost in testing and maintenance. In general they are only worth doing if the situation is common and represents a significant percentage of the total pages of a relevant server workload. -Andi -- ak@linux.intel.com -- Speaking for myself only. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 2:58 ` Andi Kleen @ 2012-12-11 3:25 ` Xishi Qiu 2012-12-11 3:36 ` Andi Kleen 0 siblings, 1 reply; 38+ messages in thread From: Xishi Qiu @ 2012-12-11 3:25 UTC (permalink / raw) To: Andi Kleen Cc: Fengguang Wu, Simon Jeons, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency, Hanjun Guo On 2012/12/11 10:58, Andi Kleen wrote: >> That sounds like overkill. There are not so many free pages in a >> typical server system. > > As Fengguang said -- memory error handling is tricky. Lots of things > could be done in theory, but they all have a cost in testing and > maintenance. > > In general they are only worth doing if the situation is common and > represents a significant percentage of the total pages of a relevant server > workload. > > -Andi > Hi Andi and Fengguang, "There are not so many free pages in a typical server system", sorry I don't quite understand it. buffered_rmqueue() prep_new_page() check_new_page() bad_page() If we alloc 2^10 pages and one of them is a poisoned page, then the whole 4M memory will be dropped. Thanks, Xishi Qiu -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-11 3:25 ` Xishi Qiu @ 2012-12-11 3:36 ` Andi Kleen 0 siblings, 0 replies; 38+ messages in thread From: Andi Kleen @ 2012-12-11 3:36 UTC (permalink / raw) To: Xishi Qiu Cc: Andi Kleen, Fengguang Wu, Simon Jeons, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel, wency, Hanjun Guo > "There are not so many free pages in a typical server system", sorry I don't > quite understand it. Linux tries to keep most memory in caches. As Linus says "free memory is bad memory" > > buffered_rmqueue() > prep_new_page() > check_new_page() > bad_page() > > If we alloc 2^10 pages and one of them is a poisoned page, then the whole 4M > memory will be dropped. prep_new_page() is only called on whatever is allocated. MAX_ORDER is much smaller than 2^10 If you allocate a large order page then yes the complete page is dropped. This is today generally true in hwpoison. It would be one possible area of improvement (probably mostly if 1GB pages become more common than they are today) It's usually not a problem because usually most allocations are small order and systems have generally very few memory errors, and even the largest MAX_ORDER pages are a small fraction of the total memory. If you lose larger amounts of memory usually you quickly hit something that HWPoison cannot handle. -Andi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics 2012-12-07 22:11 ` Andrew Morton ` (2 preceding siblings ...) 2012-12-10 8:33 ` Wanpeng Li @ 2012-12-10 8:33 ` Wanpeng Li 3 siblings, 0 replies; 38+ messages in thread From: Wanpeng Li @ 2012-12-10 8:33 UTC (permalink / raw) Cc: WuJianguo, Xishi Qiu, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi, akpm, linux-mm, linux-kernel On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote: >On Fri, 7 Dec 2012 16:48:45 +0800 >Xishi Qiu <qiuxishi@huawei.com> wrote: > >> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a >> free page twice, the value of mce_bad_pages will be added twice. So this is an error, >> since the page was already marked HWPoison, we should skip the page and don't add the >> value of mce_bad_pages. >> >> $ cat /proc/meminfo | grep HardwareCorrupted >> >> soft_offline_page() >> get_any_page() >> atomic_long_add(1, &mce_bad_pages) >> >> ... >> >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags) >> return ret; >> >> done: >> - atomic_long_add(1, &mce_bad_pages); >> - SetPageHWPoison(page); >> /* keep elevated page count for bad page */ >> + if (!PageHWPoison(page)) { >> + atomic_long_add(1, &mce_bad_pages); >> + SetPageHWPoison(page); >> + } >> + >> return ret; >> } > >A few things: > >- soft_offline_page() already checks for this case: > > if (PageHWPoison(page)) { > unlock_page(page); > put_page(page); > pr_info("soft offline: %#lx page already poisoned\n", pfn); > return -EBUSY; > } > > so why didn't this check work for you? > > Presumably because one of the earlier "goto done" branches was > taken. Which one, any why? > > This function is an utter mess. It contains six return points > randomly intermingled with three "goto done" return points. > > This mess is probably the cause of the bug you have observed. Can > we please fix it up somehow? It *seems* that the design (lol) of > this function is "for errors, return immediately. For success, goto > done". In which case "done" should have been called "success". But > if you just look at the function you'll see that this approach didn't > work. I suggest it be converted to have two return points - one for > the success path, one for the failure path. Or something. > >- soft_offline_huge_page() is a miniature copy of soft_offline_page() > and might suffer the same bug. > >- A cleaner, shorter and possibly faster implementation is > > if (!TestSetPageHWPoison(page)) > atomic_long_add(1, &mce_bad_pages); > Hi Andrew, Since hwpoison bit for free buddy page has already be set in get_any_page, !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for the first time. Regards, Wanpeng Li >- We have atomic_long_inc(). Use it? > >- Why do we have a variable called "mce_bad_pages"? MCE is an x86 > concept, and this code is in mm/. Lights are flashing, bells are > ringing and a loudspeaker is blaring "layering violation" at us! > >-- >To unsubscribe, send a message with 'unsubscribe linux-mm' in >the body to majordomo@kvack.org. For more info on Linux MM, >see: http://www.linux-mm.org/ . >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2012-12-11 8:03 UTC | newest] Thread overview: 38+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-12-07 8:48 [PATCH V2] MCE: fix an error of mce_bad_pages statistics Xishi Qiu 2012-12-07 14:33 ` Borislav Petkov 2012-12-07 22:11 ` Andrew Morton 2012-12-07 22:41 ` Borislav Petkov 2012-12-10 4:33 ` Xishi Qiu 2012-12-10 8:33 ` Wanpeng Li 2012-12-10 9:06 ` Xishi Qiu 2012-12-10 10:47 ` Simon Jeons 2012-12-10 11:16 ` Xishi Qiu 2012-12-10 11:39 ` Wanpeng Li 2012-12-10 11:54 ` Xishi Qiu 2012-12-10 12:11 ` Borislav Petkov 2012-12-10 15:39 ` Andi Kleen 2012-12-10 11:39 ` Wanpeng Li 2012-12-10 11:58 ` Simon Jeons [not found] ` <1355140561.1821.5.camel@kernel.cn.ibm.com> [not found] ` <50C5D844.8050707@huawei.com> 2012-12-10 12:47 ` Simon Jeons 2012-12-11 1:16 ` Wanpeng Li 2012-12-11 1:16 ` Wanpeng Li 2012-12-11 6:49 ` Xishi Qiu 2012-12-11 8:02 ` Wanpeng Li 2012-12-11 8:02 ` Wanpeng Li 2012-12-10 15:38 ` Andi Kleen 2012-12-11 1:49 ` Simon Jeons 2012-12-11 2:03 ` Andi Kleen 2012-12-11 2:14 ` Simon Jeons 2012-12-11 3:01 ` Andi Kleen 2012-12-11 3:13 ` Simon Jeons 2012-12-11 3:19 ` Andi Kleen 2012-12-11 3:48 ` Simon Jeons 2012-12-11 5:55 ` Xishi Qiu 2012-12-11 6:34 ` Wanpeng Li 2012-12-11 6:34 ` Wanpeng Li 2012-12-11 2:25 ` Xishi Qiu 2012-12-11 2:45 ` Fengguang Wu 2012-12-11 2:58 ` Andi Kleen 2012-12-11 3:25 ` Xishi Qiu 2012-12-11 3:36 ` Andi Kleen 2012-12-10 8:33 ` Wanpeng Li
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).