From: Minchan Kim <minchan@kernel.org>
To: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
surenb@google.com, vbabka@suse.cz, rientjes@google.com,
sfr@canb.auug.org.au, edgararriaga@google.com,
nadav.amit@gmail.com, mhocko@suse.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
"# 5.10+" <stable@vger.kernel.org>
Subject: Re: [PATCH V2,2/2] mm: madvise: skip unmapped vma holes passed to process_madvise
Date: Thu, 17 Mar 2022 09:24:01 -0700
Message-ID: <YjNgoeg1yOocsjWC@google.com>
In-Reply-To: <5428f192-1537-fa03-8e9c-4a8322772546@quicinc.com>
On Wed, Mar 16, 2022 at 07:49:38PM +0530, Charan Teja Kalla wrote:
> Thanks Andrew and Minchan.
>
> On 3/16/2022 7:13 AM, Minchan Kim wrote:
> > On Tue, Mar 15, 2022 at 04:48:07PM -0700, Andrew Morton wrote:
> >> On Tue, 15 Mar 2022 15:58:28 -0700 Minchan Kim <minchan@kernel.org> wrote:
> >>
> >>> On Fri, Mar 11, 2022 at 08:59:06PM +0530, Charan Teja Kalla wrote:
> >>>> The process_madvise() system call is expected to skip holes in vma
> >>>> passed through 'struct iovec' vector list. But do_madvise, which
> >>>> process_madvise() calls for each vma, returns ENOMEM in case of unmapped
> >>>> holes, despite the VMA is processed.
> >>>> Thus process_madvise() should treat ENOMEM as expected and consider the
> >>>> VMA passed to as processed and continue processing other vma's in the
> >>>> vector list. Returning -ENOMEM to user, despite the VMA is processed,
> >>>> will be unable to figure out where to start the next madvise.
> >>>> Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
> >>>> Cc: <stable@vger.kernel.org> # 5.10+
> >>>
> >>> Hmm, not sure whether it's stable material since it changes semantic of
> >>> API. It would be better to change the semantic from 5.19 with man page
> >>> update to specify the change.
> >>
> >> It's a very desirable change and it makes the code match the manpage
> >> and it's cc:stable. I think we should just absorb any transitory
> >> damage which this causes people. I doubt if there will be much - if
> >> anyone was affected by this they would have already told us that it's
> >> broken?
> >
> >
> > process_madvise fails to return exact processed bytes at several cases
> > if it encounters the error, such as, -EINVAL, -EINTR, -ENOMEM in the
> > middle of processing vmas. And now we are trying to make exception for
> > change for only hole?
> I think EINTR will never return in the middle of processing VMA's for
> the behaviours supported by process_madvise().
>
> It can return EINTR when:
> -------------------------
> 1) PTRACE_MODE_READ is being checked in mm_access() where it is waiting
> on task->signal->exec_update_lock. EINTR returned from here guarantees
> that process_madvise() didn't even start processing.
> https://elixir.bootlin.com/linux/v5.16.14/source/mm/madvise.c#L1264 -->
> https://elixir.bootlin.com/linux/v5.16.14/source/kernel/fork.c#L1318
>
> 2) The process_madvise() started processing VMA's but the required
> behavior on a VMA needs mmap_write_lock_killable(), from where EINTR is
> returned. The current behaviours supported by process_madvise(),
> MADV_COLD, PAGEOUT, WILLNEED, just need read lock here.
> https://elixir.bootlin.com/linux/v5.16.14/source/mm/madvise.c#L1164
> **Thus I think there is no way EINTR can be returned by process_madvise()
> in the middle of processing.** No?
>
> for EINVAL:
> -----------
> The only case, I can think of, where EINVAL can be returned in the
> middle of processing is in examples like, given range contains VMA's
> with a hole in between and one of the VMA contains the pages that fails
> can_madv_lru_vma() condition.
> So, it's a limitation that this returns -EINVAL though some bytes are
> processed.
> OR
> Since there exists still some invalid bytes processed it is valid to
> return -EINVAL here and user has to check the address range sent?
>
> for ENOMEM:
> ----------
> Though the complete range is processed, it still returns ENOMEM. IMO, this
> shouldn't be treated as an error, which is what the patch targets. Then
> there is the limitation case that you mentioned below, where it returns a
> positive processed-bytes count even though it didn't process anything if it
> couldn't find any vma for the first iteration in madvise_walk_vmas.
>
> I think the above limitations with EINVAL and ENOMEM are arising because
> we are relying on do_madvise() functionality which madvise() call uses
> to process a single VMA. When 'struct iovec' vector processing interface
> is given in a system call, it is the expectation by the caller that this
> system call should return the correct bytes processed to help the user
> to take the correct decisions. Please correct me If i am wrong here.
>
> So, should we add the new function say do_process_madvise(), which take
> cares of above limitations? or any alternative suggestions here please?
What I am thinking now is that process_madvise needs its own iterator (i.e.,
do_process_madvise), and it should report the exact bytes it addressed, with
exact ranges, like process_vm_readv/writev. Providing valid ranges is the
user's responsibility.
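
To make the idea concrete, here is a rough user-space model of such an
iterator. It is only a sketch, not kernel code: struct range,
mapped_prefix() and model_process_madvise() are hypothetical names, the
mapped areas are assumed to be a sorted, non-overlapping array, and the
"stop at the first hole, fail only if nothing was advised" rule is my
assumption of what process_vm_readv-like semantics would mean here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical model type: a half-open address range [start, end). */
struct range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Return how many bytes starting at addr are contiguously covered by the
 * mapped ranges, capped at len. maps[] must be sorted and non-overlapping.
 */
size_t mapped_prefix(const struct range *maps, size_t nmaps,
		     uintptr_t addr, size_t len)
{
	uintptr_t cur = addr, end = addr + len;
	size_t i;

	for (i = 0; i < nmaps && cur < end; i++) {
		if (maps[i].end <= cur)
			continue;	/* mapping lies entirely below cur */
		if (maps[i].start > cur)
			break;		/* hole at cur: stop here */
		cur = maps[i].end < end ? maps[i].end : end;
	}
	return cur - addr;
}

/*
 * Walk the vector, stop at the first hole, and report exactly the bytes
 * advised before it. Return -ENOMEM only when nothing at all was advised.
 */
ssize_t model_process_madvise(const struct range *maps, size_t nmaps,
			      const struct range *vec, size_t vlen)
{
	size_t total = 0, i;

	for (i = 0; i < vlen; i++) {
		size_t want = vec[i].end - vec[i].start;
		size_t done = mapped_prefix(maps, nmaps, vec[i].start, want);

		total += done;
		if (done < want)
			return total ? (ssize_t)total : -ENOMEM;
	}
	return (ssize_t)total;
}
```

With a return value like this, the caller knows the first byte that was not
addressed (start of the vector plus the returned count) and can decide
whether to fix the range or resume past the hole.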
>
> > IMO, it's worth to note in man page.
> >
>
> Or the current patch for just ENOMEM is sufficient here and we just have
> to update the man page?
>
> > In addition, this change returns positive processes bytes even though
> > it didn't process anything if it couldn't find any vma for the first
> > iteration in madvise_walk_vmas.
>
> Thanks,
> Charan
>
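
For reference, the concern quoted above (a positive byte count although
nothing was processed) can be seen in a small model of the per-entry loop.
This is a sketch under assumptions, not the actual mm/madvise.c code:
model_vector_walk() is a hypothetical name, rets[i] stands in for whatever
do_madvise() would return for entry i, and skip_enomem models the proposed
"treat ENOMEM as processed and continue" behaviour.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical model of the per-entry loop. rets[i] is the result a
 * do_madvise()-like helper gave for entry i (0 or a negative errno);
 * lens[i] is that entry's length in bytes.
 */
ssize_t model_vector_walk(const size_t *lens, const int *rets, size_t n,
			  int skip_enomem)
{
	size_t done = 0, i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = rets[i];
		if (ret == -ENOMEM && skip_enomem)
			ret = 0;	/* hole: count the entry as processed */
		if (ret < 0)
			break;
		done += lens[i];
	}
	/* Report bytes consumed if any, otherwise the last result. */
	return done ? (ssize_t)done : ret;
}
```

In this model a single 4096-byte entry that lies entirely in a hole yields
-ENOMEM with skip_enomem == 0 but 4096 with skip_enomem == 1, which is the
case I was pointing at.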