All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
To: ltp@lists.linux.it
Subject: [LTP] [PATCH RFC] move_pages12: handle errno EBUSY for madvise(..., MADV_SOFT_OFFLINE)
Date: Thu, 27 Jun 2019 10:50:30 +0800	[thread overview]
Message-ID: <5D142EF6.6030402@cn.fujitsu.com> (raw)
In-Reply-To: <CAEemH2c+CWAOmAH=1WT+GR-iZ=5RoDcCmD=-zBpc63PHg6xXyQ@mail.gmail.com>

on 2019/06/24 10:43, Li Wang wrote:

> Hi Xu Yang,
>
> On Fri, Jun 21, 2019 at 1:58 PM Yang Xu <xuyang2018.jy@cn.fujitsu.com 
> <mailto:xuyang2018.jy@cn.fujitsu.com>> wrote:
>
>
>     > Hi Li Wang,
>     >
>     > Thank you for maintaining the testcase.
>     >
>     > Recently (since 4.19) we have a semantics change on the return
>     value of
>     > madvise(MADV_SOFT_OFFLINE), and we see -EBUSY when hugepage
>     migration
>     > succeeded and error containment failed:
>     >
>     >   commit 6bc9b56433b76e40d11099338d27fbc5cd2935ca
>     >   Author: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com
>     <mailto:n-horiguchi@ah.jp.nec.com>>
>     >   Date:   Thu Aug 23 17:00:38 2018 -0700
>     >
>     >       mm: fix race on soft-offlining free huge pages
>     >
>     > , so we don't have to consider this EBUSY as error, but a good
>     report
>     > for application. Your change meets the change.
>     >
>     > Feel free to add my ack:
>     >
>     > Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com
>     <mailto:n-horiguchi@ah.jp.nec.com>>
>     >
>     > Thanks,
>     > - Naoya
>     >
>     > On Fri, Jun 07, 2019 at 05:52:13PM +0800, Li Wang wrote:
>     >> The test#2 is going to simulate the race condition, where
>     move_pages()
>     >> and soft offline are called on a single hugetlb page
>     concurrently. But,
>     >> it return EBUSY and report FAIL in soft-offline a moving
>     hugepage as a
>     >> result sometimes.
>     >>
>     >> The root cause seems a call to page_huge_active return false,
>     then the
>     >> soft offline action will failed to isolate hugepage with EBUSY
>     return as
>     >> below call trace:
>     >>
>     >> In Parent:
>     >>   madvise(..., MADV_SOFT_OFFLINE)
>     >>   ...
>     >>     soft_offline_page
>     >>       soft_offline_in_use_page
>     >>         soft_offline_huge_page
>     >>           isolate_huge_page
>     >>             page_huge_active  --> return false at here
>     >>
>     >> In Child:
>     >>   move_pages()
>     >>   ...
>     >>     do_move_pages
>     >>       do_move_pages_to_node
>     >>         add_page_for_migration
>     >>           isolate_huge_page   --> it has already isolated the
>     hugepage
>     >>
>     >> In this patch, I simply regard the returned EBUSY as a normal
>     situation and
>     >> mask it in error handler. Because move_pages is calling
>     add_page_for_migration
>     >> to isolate hugepage before do migration, so that's very
>     possible to hit the
>     >> collision and return EBUSY on the same page.
>     >>
>     >> Error log:
>     >> ----------
>     >> move_pages12.c:235: INFO: Free RAM 8386256 kB
>     >> move_pages12.c:253: INFO: Increasing 2048kB hugepages pool on
>     node 0 to 4
>     >> move_pages12.c:263: INFO: Increasing 2048kB hugepages pool on
>     node 1 to 6
>     >> move_pages12.c:179: INFO: Allocating and freeing 4 hugepages on
>     node 0
>     >> move_pages12.c:179: INFO: Allocating and freeing 4 hugepages on
>     node 1
>     >> move_pages12.c:169: PASS: Bug not reproduced
>     >> move_pages12.c:81: FAIL: madvise failed: SUCCESS
>     >> move_pages12.c:81: FAIL: madvise failed: SUCCESS
>     >> move_pages12.c:143: BROK: mmap((nil),4194304,3,262178,-1,0)
>     failed: ENOMEM
>     >> move_pages12.c:114: FAIL: move_pages failed: EINVAL
>     >>
>     >> Dmesg:
>     >> ------
>     >> [165435.492170] soft offline: 0x61c00 hugepage failed to isolate
>     >> [165435.590252] soft offline: 0x61c00 hugepage failed to isolate
>     >> [165435.725493] soft offline: 0x61400 hugepage failed to isolate
>     >>
>     >> Other two fixes in this patch:
>     >>  * use TERRNO(but not TTERRNO) to catch madvise(...,
>     MADV_SOFT_OFFLINE) errno
>     >>  * go out test when hugepage allocating failed with ENOMEM
>     Hi Li
>
>     Your patch can handle EBUSY errno correctly for soft offline.
>     But move page  may be killed by SIGBUS because of  MCE  when we
>     soft offline concurrently.
>     That leads to move_page failed with ESRCH.   Also, move page may
>     fails with ENOMEM .
>     Do you notice it ?
>
>
> I didn't get this failure, it seems not related to this patch. Two 
> questions:
>
> 1. which kernel version do you test?
> 2. can you reproduce this without my patch?
Hi Li

I test it on 3.10.0-957.el7.x86_64  kvm(my machine was not support numa 
and i enable it on kvm. as below:
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>Penryn</model>
<feature policy='require' name='x2apic'/>
<feature policy='require' name='hypervisor'/>
<numa>
<cell id='0' cpus='0' memory='1048576' unit='KiB'/>
<cell id='1' cpus='1' memory='1048576' unit='KiB'/>
</numa>
</cpu>

Does it only exist on kvm and doesn't  exist on physical machine?  I 
don't have physical machine that supports numa.

And the fix patch has been merged since  3.10.0-957.el7.x86_64 .
Yes,   I can reproduce this without your patch because MCE kills child 
process and move_page gets ESRCH error.


>
>
>     I think ESRCH error can represent the soft offline bug not
>     reproduce because it don't trigger a crash.
>     What do you think about it?
>
>
> Maybe, but it needs to check details on your kernel.
>
>
>     err_log:
>     tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
>     move_pages12.c:236: INFO: Free RAM 119568 kB
>     move_pages12.c:254: INFO: Increasing 2048kB hugepages pool on node
>     0 to 83
>     move_pages12.c:264: INFO: Increasing 2048kB hugepages pool on node
>     1 to 94
>     move_pages12.c:180: INFO: Allocating and freeing 4 hugepages on node 0
>     move_pages12.c:180: INFO: Allocating and freeing 4 hugepages on node 1
>     move_pages12.c:170: PASS: Bug not reproduced
>     tst_test.c:1141: BROK: Test killed by SIGBUS!
>
>     Summary:
>     passed   1
>     failed   0
>     skipped  0
>     warnings 0
>
>     move_pages12.c:114: FAIL: move_pages failed: ESRCH
>
>     dmesg
>     [ 9868.180669] MCE: Killing move_pages12:29616 due to hardware
>     memory corruption fault at 2aaaaac00018
>     [ 9990.049875] Soft offlining page 50e00 at 2aaaaac00000
>     [ 9990.052218] Soft offlining page 50c00 at 2aaaaae00000
>     [ 9990.060395] Soft offlining page 51000 at 2aaaaac00000
>
>     Kind Regards,
>     Yang Xu
>
>     >> Signed-off-by: Li Wang <liwang@redhat.com
>     <mailto:liwang@redhat.com>>
>     >> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com
>     <mailto:n-horiguchi@ah.jp.nec.com>>
>     >> Cc: Xiao Yang <yangx.jy@cn.fujitsu.com
>     <mailto:yangx.jy@cn.fujitsu.com>>
>     >> Cc: Yang Xu <xuyang2018.jy@cn.fujitsu.com
>     <mailto:xuyang2018.jy@cn.fujitsu.com>>
>     >> ---
>     >>  .../kernel/syscalls/move_pages/move_pages12.c | 33
>     ++++++++++++++-----
>     >>  1 file changed, 24 insertions(+), 9 deletions(-)
>     >>
>     >> diff --git
>     a/testcases/kernel/syscalls/move_pages/move_pages12.c
>     b/testcases/kernel/syscalls/move_pages/move_pages12.c
>     >> index 964b712fb..c446396dc 100644
>     >> --- a/testcases/kernel/syscalls/move_pages/move_pages12.c
>     >> +++ b/testcases/kernel/syscalls/move_pages/move_pages12.c
>     >> @@ -77,8 +77,8 @@ static void *addr;
>     >>  static int do_soft_offline(int tpgs)
>     >>  {
>     >>      if (madvise(addr, tpgs * hpsz, MADV_SOFT_OFFLINE) == -1) {
>     >> -            if (errno != EINVAL)
>     >> -                    tst_res(TFAIL | TTERRNO, "madvise failed");
>     >> +            if (errno != EINVAL && errno != EBUSY)
>     >> +                    tst_res(TFAIL | TERRNO, "madvise failed");
>     >>              return errno;
>     >>      }
>     >>      return 0;
>     >> @@ -121,7 +121,8 @@ static void do_child(int tpgs)
>     >>
>     >>  static void do_test(unsigned int n)
>     >>  {
>     >> -    int i;
>     >> +    int i, ret;
>     >> +    void *ptr;
>     >>      pid_t cpid = -1;
>     >>      int status;
>     >>      unsigned int twenty_percent = (tst_timeout_remaining() / 5);
>     >> @@ -136,24 +137,37 @@ static void do_test(unsigned int n)
>     >>              do_child(tcases[n].tpages);
>     >>
>     >>      for (i = 0; i < LOOPS; i++) {
>     >> -            void *ptr;
>     >> +            ptr = mmap(NULL, tcases[n].tpages * hpsz,
>     >> +                            PROT_READ | PROT_WRITE,
>     >> +                            MAP_PRIVATE | MAP_ANONYMOUS |
>     MAP_HUGETLB, -1, 0);
>     >> +            if (ptr == MAP_FAILED) {
>     >> +                    if (errno == ENOMEM) {
>     >> +                            tst_res(TCONF,
>     >> +                                    "Cannot allocate hugepage,
>     memory too fragmented?");
>     >> +                            goto out;
>     >> +                    }
>     >> +
>     >> +                    tst_brk(TBROK | TERRNO, "Cannot allocate
>     hugepage");
>     >> +            }
>     >>
>     >> -            ptr = SAFE_MMAP(NULL, tcases[n].tpages * hpsz,
>     >> -                    PROT_READ | PROT_WRITE,
>     >> -                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
>     -1, 0);
>     >>              if (ptr != addr)
>     >>                      tst_brk(TBROK, "Failed to mmap at desired
>     addr");
>     >>
>     >>              memset(addr, 0, tcases[n].tpages * hpsz);
>     >>
>     >>              if (tcases[n].offline) {
>     >> -                    if (do_soft_offline(tcases[n].tpages) ==
>     EINVAL) {
>     >> +                    ret = do_soft_offline(tcases[n].tpages);
>     >> +
>     >> +                    if (ret == EINVAL) {
>     >>                              SAFE_KILL(cpid, SIGKILL);
>     >>                              SAFE_WAITPID(cpid, &status, 0);
>     >>                              SAFE_MUNMAP(addr, tcases[n].tpages
>     * hpsz);
>     >>                              tst_res(TCONF,
>     >>                                      "madvise() didn't support
>     MADV_SOFT_OFFLINE");
>     >>                              return;
>     >> +                    } else if (ret == EBUSY) {
>     >> +                            SAFE_MUNMAP(addr, tcases[n].tpages
>     * hpsz);
>     >> +                            goto out;
>     >>                      }
>     >>              }
>     >>
>     >> @@ -163,9 +177,10 @@ static void do_test(unsigned int n)
>     >>                      break;
>     >>      }
>     >>
>     >> +out:
>     >>      SAFE_KILL(cpid, SIGKILL);
>     >>      SAFE_WAITPID(cpid, &status, 0);
>     >> -    if (!WIFEXITED(status))
>     >> +    if (!WIFEXITED(status) && ptr != MAP_FAILED)
>     >>              tst_res(TPASS, "Bug not reproduced");
>     >>  }
>     >>
>     >> --
>     >> 2.20.1
>     >>
>     >>
>     >
>     > .
>     >
>
>
>
>
>
> -- 
> Regards,
> Li Wang



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linux.it/pipermail/ltp/attachments/20190627/b8863f73/attachment-0001.htm>

  reply	other threads:[~2019-06-27  2:50 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-07  9:52 [LTP] [PATCH RFC] move_pages12: handle errno EBUSY for madvise(..., MADV_SOFT_OFFLINE) Li Wang
2019-06-10  3:27 ` Naoya Horiguchi
2019-06-21  5:58   ` Yang Xu
2019-06-24  2:43     ` Li Wang
2019-06-27  2:50       ` Yang Xu [this message]
2019-07-04  3:29         ` Li Wang
2019-07-03 13:10 ` Cyril Hrubis
2019-07-04  5:48   ` Li Wang
2019-07-04  6:23     ` Li Wang
2019-07-10 12:59       ` Cyril Hrubis
2019-07-15  9:33         ` Li Wang
2019-07-24  9:33           ` Li Wang
2019-07-24 10:46             ` Cyril Hrubis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5D142EF6.6030402@cn.fujitsu.com \
    --to=xuyang2018.jy@cn.fujitsu.com \
    --cc=ltp@lists.linux.it \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.