linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Suren Baghdasaryan <surenb@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Dan Carpenter <dan.carpenter@linaro.org>,
	linux-mm@kvack.org,  Andrew Morton <akpm@linux-foundation.org>,
	"Liam R. Howlett" <liam.howlett@oracle.com>,
	 Laurent Dufour <ldufour@linux.ibm.com>,
	Michel Lespinasse <michel@lespinasse.org>,
	 Jerome Glisse <jglisse@google.com>,
	Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Peter Xu <peterx@redhat.com>,
	 Dimitri Sivanich <dimitri.sivanich@hpe.com>,
	Mike Travis <mike.travis@hpe.com>,
	 Steve Wahl <steve.wahl@hpe.com>
Subject: Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
Date: Wed, 12 Jul 2023 08:01:18 -0700	[thread overview]
Message-ID: <CAJuCfpHSTCiCT_AWkFxSSjgu0r94ogRDcFotXKjTw0gLDjOsCg@mail.gmail.com> (raw)
In-Reply-To: <b2fa01d9-4a7c-a349-10c3-4461ad18e7e7@redhat.com>

On Wed, Jul 12, 2023 at 12:35 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 12.07.23 01:45, Suren Baghdasaryan wrote:
> > On Tue, Jul 11, 2023 at 3:21 PM Matthew Wilcox <willy@infradead.org> wrote:
> >>
> >> On Tue, Jul 11, 2023 at 02:55:20PM -0700, Suren Baghdasaryan wrote:
> >>> On Tue, Jul 11, 2023 at 12:21 AM Dan Carpenter <dan.carpenter@linaro.org> wrote:
> >>>>
> >>>> Hello Suren Baghdasaryan,
> >>>>
> >>>> The patch 1c71222e5f23: "mm: replace vma->vm_flags direct
> >>>> modifications with modifier calls" from Jan 26, 2023, leads to the
> >>>> following Smatch static checker warning:
> >>>>
> >>>>          ./include/linux/mm.h:729 vma_start_write()
> >>>>          warn: sleeping in atomic context
> >>>>
> >>>> include/linux/mm.h
> >>>>      722 static inline void vma_start_write(struct vm_area_struct *vma)
> >>>>      723 {
> >>>>      724         int mm_lock_seq;
> >>>>      725
> >>>>      726         if (__is_vma_write_locked(vma, &mm_lock_seq))
> >>>>      727                 return;
> >>>>      728
> >>>> --> 729         down_write(&vma->vm_lock->lock);
> >>>>      730         vma->vm_lock_seq = mm_lock_seq;
> >>>>      731         up_write(&vma->vm_lock->lock);
> >>>>      732 }
> >>>>
> >>>> The call tree is:
> >>>>
> >>>> gru_fault() <- disables preempt
> >>>> -> remap_pfn_range()
> >>>>     -> track_pfn_remap()
> >>>>     -> remap_pfn_range_notrack()
> >>>>        -> vm_flags_set()
> >>>>           -> vma_start_write()
> >>>>
> >>>> Before track_pfn_remap() and remap_pfn_range_notrack() would just do |=
> >>>> to set the flags but now they use vm_flags_set() so there is a potential
> >>>> they could sleep.
> >>>
> >>> Hi Dan,
> >>> Thanks for reporting! Looks like the page fault handler is modifying
> >>> the VMA flags, which has to be done under write-locked mmap_lock and I
> >>> don't see that being done here... I wonder if that should be allowed.
> >>> I'm CC'ing some MM folks to check if this is a valid VMA modification
> >>> and should be allowed. Matthew, this might be especially interesting
> >>> for you since gru_fault() handles file-backed page faults AFAIKT.
> >>
> >> I don't run the ->fault handler under RCU, only the ->map_pages()
> >> method.  I don't intend to change that.
> >>
> >>> Back to the issue at hand. If such modification should be indeed
> >>> allowed then the simplest fix I think would be to add new
> >>> remap_pfn_range_locked() function to be called from gru_fault() which
> >>> would use __vm_flags_mod() instead of vm_flags_set(). __vm_flags_mod()
> >>> does not lock the VMA, so would not have this issue. If the conclusion
> >>> is that this is a valid scenario then I can post a fix I described.
> >>
> >> I'm not certain, but calling remap_pfn_range() in the fault handler
> >> is definitely weird.  It's normally called _instead_ of having a fault
> >> handler.  The fault handler usually calls set_pte_at() directly.
> >
> > Hmm. Is it weird enough to be considered invalid or weird but still ok?
> > Also, is it ok to modify VMA flags here without write-locking the
> > mmap_lock (and without write-locking the VMA)? The fault handler is
> > done under read-locked mmap_lock but I thought VMA modifications
> > require stronger locking...
> >
>
> The "easy" fix would be to have something like "remap_pfn_range_prepare"
> that the remap_pfn_range() caller calls during mmap().

Are you suggesting to break remap_pfn_range() into two stages
(remap_pfn_range_prepare() then remap_pfn_range())?
If so, there are many places remap_pfn_range() is called and IIUC all
of them would need to use that 2-stage approach (lots of code churn).
In addition, this is an exported function, so many more drivers might
expect the current behavior.
My suggestion was to have another function called smth like
remap_pfn_range_nolock() and internally use the same code with an
additional flag that would tell us whether we should lock the vma or
not. Such change should be quite simple and small.

>
> We can let that one set these flags, and then we can later let
> remap_pfn_range() fail if the relevant flags are not set.
>
> It's certainly one of these "always done like that and suboptimal but
> somehow it worked" thingies.
>
> I suspect, because only the first pagefault->remap_pfn_range() will
> actually modify the VMA flags, that this is ok. Otherwise, there usually
> wouldn't be any pagetables and nothing mapped, so who really cares about
> these VMA flags (e.g., GUP cannot pin anything if nothing is mapped).

Is my understanding correct that the reasoning remap_pfn_range() is
allowed here without write-locking the VMA (or write-locking
mmap_lock) is because it's done only if there are no pre-existing
patables?
Thanks,
Suren.

>
> --
> Cheers,
>
> David / dhildenb
>


  reply	other threads:[~2023-07-12 15:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-11  7:21 [bug report] mm: replace vma->vm_flags direct modifications with modifier calls Dan Carpenter
2023-07-11 21:55 ` Suren Baghdasaryan
2023-07-11 22:21   ` Matthew Wilcox
2023-07-11 23:45     ` Suren Baghdasaryan
2023-07-12  7:35       ` David Hildenbrand
2023-07-12 15:01         ` Suren Baghdasaryan [this message]
2023-07-12 15:52           ` Matthew Wilcox
2023-07-12 15:55             ` David Hildenbrand
2023-07-12 16:03               ` Suren Baghdasaryan
2023-07-12 18:49               ` Matthew Wilcox
2023-07-12 18:52                 ` David Hildenbrand
2023-07-12 19:48                   ` Suren Baghdasaryan
2023-07-17  6:13                     ` Yan Zhao
2023-07-17 16:18                       ` Suren Baghdasaryan
2023-07-18  0:27                         ` Yan Zhao
2023-07-18 16:27                           ` Suren Baghdasaryan
2023-07-12 19:34                 ` Matthew Wilcox
2023-07-17 19:54                   ` Dimitri Sivanich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJuCfpHSTCiCT_AWkFxSSjgu0r94ogRDcFotXKjTw0gLDjOsCg@mail.gmail.com \
    --to=surenb@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.carpenter@linaro.org \
    --cc=david@redhat.com \
    --cc=dimitri.sivanich@hpe.com \
    --cc=hannes@cmpxchg.org \
    --cc=jglisse@google.com \
    --cc=ldufour@linux.ibm.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=michel@lespinasse.org \
    --cc=mike.travis@hpe.com \
    --cc=peterx@redhat.com \
    --cc=steve.wahl@hpe.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).