From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yang Shi Subject: Re: mbind() breaks its API definition since v5.2 by commit d883544515aa (mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified) Date: Tue, 29 Oct 2019 21:32:54 -0700 Message-ID: References: <2019103010274679257634@gmail.com> <2019103011122763779044@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Return-path: In-Reply-To: <2019103011122763779044@gmail.com> Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org To: Li Xinhai , "linux-mm@kvack.org" , akpm , torvalds Cc: Vlastimil Babka , Linux API , Michal Hocko , Hugh Dickins , "linux-kernel@vger.kernel.org" , lixinhai_lxh List-Id: linux-api@vger.kernel.org On 10/29/19 8:12 PM, Li Xinhai wrote: > On 2019-10-30 at 10:50 Yang Shi wrote: >> >> On 10/29/19 7:27 PM, Li Xinhai wrote: >>> One change in do_mbind() of this commit has suspicious usage of return value of >>> queue_pages_range(), excerpt as below: >>> >>> --- >>> @@ -1243,10 +1265,15 @@ static long do_mbind(unsigned long start, unsigned long len, >>>     if (err) >>>     goto mpol_out; >>> >>> - err = queue_pages_range(mm, start, end, nmask, >>> + ret = queue_pages_range(mm, start, end, nmask, >>>      flags | MPOL_MF_INVERT, &pagelist); >>> - if (!err) >>> - err = mbind_range(mm, start, end, new); >>> + >>> + if (ret < 0) {      /////// convert to all possible 'ret' to '-EIO' <<<< >>> + err = -EIO; >>> + goto up_out; >>> + } >>> + >>> + err = mbind_range(mm, start, end, new); >>> >>>     if (!err) { >>>     int nr_failed = 0; >>> --- >>> >>> Note that inside queue_pages_range(), the call to walk_page_range() may return >>> errors from 'test_walk' of 'struct mm_walk_ops', e.g. -EFAULT. Now, those error >>> codes are no longer reported to user space application. >>> >>>   From user space, the mbind() call need to reported error, with EFAULT, as example: >>> EFAULT >>> Part or all of the memory range specified by nodemask and maxnode points >>> outside your accessible address space. Or, there was an unmapped hole in the >>> specified memory range specified by addr and len. >> Thanks for catching this. That commit was aimed to correct the return >> values for some corner cases in mbind(), but it should not alter the >> errno for other failure cases, i.e. -EFAULT. >> >> Could you please try the below patch (build test only)? >> >> diff --git a/mm/mempolicy.c b/mm/mempolicy.c >> index 4ae967b..99df43a 100644 >> --- a/mm/mempolicy.c >> +++ b/mm/mempolicy.c >> @@ -1286,7 +1286,7 @@ static long do_mbind(unsigned long start, unsigned >> long len, >>                           flags | MPOL_MF_INVERT, &pagelist); >> >>         if (ret < 0) { >> -               err = -EIO; >> +               err = ret; >>                 goto up_out; >>         } >> >> > This seems do not work, because the 'pagelist' would have some pages queued > into it, need to put back those pages instead of return quickly. > > So, we need to remove this page leak as well. <<<<<< > > In my understanding, revert the changes as I quoted above may solve it, but not sure > the details about changes at end of do_mbind(), should keep them at there without > further change? Thanks for pointing this out. We don't have to revert this commit to handle the non-empty pagelist correctly. The simplest way is to just put those pages back and I'm supposed this is also the preferred way since mbind_range() is not called to really apply the policy so those pages should not be migrated. The below patch should solve this: diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 4ae967b..d80025c 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1286,7 +1286,10 @@ static long do_mbind(unsigned long start, unsigned long len,                           flags | MPOL_MF_INVERT, &pagelist);         if (ret < 0) { -               err = -EIO; +               if (!list_empty(&pagelist)) +                       putback_movable_pages(&pagelist); + +               err = ret;                 goto up_out;         } > > - Xinhai > >>> Please correct me if this is the intended change(and will have updated API >>> definition), or something was misunderstood. >>> >>> -Xinhai > >