From: Oleg Nesterov <oleg@redhat.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Will Deacon <will.deacon@arm.com>,
hpa@zytor.com, luto@amacapital.net, dave.hansen@linux.intel.com,
mingo@elte.hu, minchan@kernel.org, tglx@linutronix.de,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: LTP regressions due to 6dc296e7df4c ("mm: make sure all file VMAs have ->vm_ops set")
Date: Tue, 15 Sep 2015 17:13:18 +0200 [thread overview]
Message-ID: <20150915151318.GA15866@redhat.com> (raw)
In-Reply-To: <20150915134216.GA16093@node.dhcp.inet.fi>
On 09/15, Kirill A. Shutemov wrote:
>
> On Tue, Sep 15, 2015 at 02:12:01PM +0200, Oleg Nesterov wrote:
> > On 09/14, Kirill A. Shutemov wrote:
> > >
> > > On Mon, Sep 14, 2015 at 07:05:47PM +0200, Oleg Nesterov wrote:
> > > > On 09/14, Kirill A. Shutemov wrote:
> > > > >
> > > > > Fix is below. I don't really like it, but I cannot find any better
> > > > > solution.
> > > >
> > > > Me too...
> > > >
> > > > But this change "documents" the nasty special "vm_file && !vm_ops" case, and
> > > > I am not sure how we can remove it later...
> > > >
> > > > So perhaps we should change vma_is_anonymous() back to check ->fault too,
> > > >
> > > > static inline bool vma_is_anonymous(struct vm_area_struct *vma)
> > > > {
> > > > - return !vma->vm_ops;
> > > > + return !vma->vm_ops || !vma->vm_ops->fault;
> > >
> > > No. This would give a lot false positives from drives which setup page
> > > tables upfront and don't use ->fault at all.
> >
> > And? I mean, I am not sure I understand what exactly do you dislike.
> >
> > Firstly, I still think that (in the long term) we should change them
> > to use .faul = no_fault() which just returns VM_FAULT_SIGBUS.
>
> I would rather like to see consolidated fault path between file and anon
> with ->vm_ops set for both. So vma_is_anonymous() will be trivial
> vma->vm_ops == anon_vm_ops.
I too thought about this. Perhaps but I guess this needs another
discussion.
In particular I am not sure we should just rely on vm_ops == anon_vm_ops.
Again, it is not that I think that the VM_MPX check in arch_vma_name() is
that bad. Still I think it would be better if mpx_mmap() could install
vma->vm_ops = mpx_vm_ops with ->name(). So perhaps ->anon_fault() makes
more sense. But lets not discuss this right now.
>
> > Until then I do not see why the change above can be really bad. The
> > VM_SHARED case is fine, do_anonymous_page() will return VM_FAULT_SIGBUS.
> >
> > So afaics the only problem is that after the change above the private
> > mapping can silently get an anonymous page after (say) MADV_DONTNEED
> > instead of the nice SIGBUS from do_fault(). I agree, this is not good,
> > but see above.
>
> So, what the point to introduce vma_is_anonymous() if it often produces
> false result? vma_is_anonymous_or_maybe_not()?
Heh.
Then what the point to demand that "All file mapping must have ->vm_ops set"
if mmap(MAP_PRIVATE, "/dev/zero") has ->vm_ops == NULL ? Because this is
not actually the file mapping, yes. And this is why we want vma_is_anonymous()
to return T in this case.
vma_is_anonymous() just says that a page fault will use do_anonymous_page().
I agree, it would be nice to ensure vma_is_anonymous() can only be true
if this vma can only have the anon pages. Let me repeat that I suggested
this change as a short-term fix (at least without other changes like we
discuss above). Because the mmap_zero() hack looks worse to me. Damn, even
the ugly hack below looks better to me.
> > Whether we need to keep the vm_ops/fault check in __vma_link_rb() and
> > mmap_region() is another issue. But if we keep them, then I think we
> > should at least turn the !vma->vm_ops check in mmap_region into
> > WARN_ON() as well.
>
> It would require first fix all known cases where ->f_op->mmap() returns
> vma->vm_ops == NULL. Not subject for 4.3, I think.
Kirill, I even sent you the private email to clarify that - of course! -
I only meant "in the longer term" ;)
Oleg.
--- x/include/linux/mm.h
+++ x/include/linux/mm.h
@@ -1289,9 +1289,11 @@ static inline int vma_growsdown(struct v
return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN);
}
+#define xxx_fault ((void*)1)
+
static inline bool vma_is_anonymous(struct vm_area_struct *vma)
{
- return !vma->vm_ops;
+ return !vma->vm_ops || vma->vm_ops->fault == xxx_fault;
}
static inline int stack_guard_page_start(struct vm_area_struct *vma,
--- x/drivers/char/mem.c
+++ x/drivers/char/mem.c
@@ -653,11 +653,17 @@ static ssize_t read_iter_zero(struct kio
static int mmap_zero(struct file *file, struct vm_area_struct *vma)
{
+ static const struct vm_operations_struct xxx_ops = {
+ .fault = xxx_fault,
+ };
+ }
#ifndef CONFIG_MMU
return -ENOSYS;
#endif
if (vma->vm_flags & VM_SHARED)
return shmem_zero_setup(vma);
+
+ vma->vm_ops = &xxx_ops;
return 0;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Oleg Nesterov <oleg@redhat.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Will Deacon <will.deacon@arm.com>,
hpa@zytor.com, luto@amacapital.net, dave.hansen@linux.intel.com,
mingo@elte.hu, minchan@kernel.org, tglx@linutronix.de,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: LTP regressions due to 6dc296e7df4c ("mm: make sure all file VMAs have ->vm_ops set")
Date: Tue, 15 Sep 2015 17:13:18 +0200 [thread overview]
Message-ID: <20150915151318.GA15866@redhat.com> (raw)
In-Reply-To: <20150915134216.GA16093@node.dhcp.inet.fi>
On 09/15, Kirill A. Shutemov wrote:
>
> On Tue, Sep 15, 2015 at 02:12:01PM +0200, Oleg Nesterov wrote:
> > On 09/14, Kirill A. Shutemov wrote:
> > >
> > > On Mon, Sep 14, 2015 at 07:05:47PM +0200, Oleg Nesterov wrote:
> > > > On 09/14, Kirill A. Shutemov wrote:
> > > > >
> > > > > Fix is below. I don't really like it, but I cannot find any better
> > > > > solution.
> > > >
> > > > Me too...
> > > >
> > > > But this change "documents" the nasty special "vm_file && !vm_ops" case, and
> > > > I am not sure how we can remove it later...
> > > >
> > > > So perhaps we should change vma_is_anonymous() back to check ->fault too,
> > > >
> > > > static inline bool vma_is_anonymous(struct vm_area_struct *vma)
> > > > {
> > > > - return !vma->vm_ops;
> > > > + return !vma->vm_ops || !vma->vm_ops->fault;
> > >
> > > No. This would give a lot false positives from drives which setup page
> > > tables upfront and don't use ->fault at all.
> >
> > And? I mean, I am not sure I understand what exactly do you dislike.
> >
> > Firstly, I still think that (in the long term) we should change them
> > to use .faul = no_fault() which just returns VM_FAULT_SIGBUS.
>
> I would rather like to see consolidated fault path between file and anon
> with ->vm_ops set for both. So vma_is_anonymous() will be trivial
> vma->vm_ops == anon_vm_ops.
I too thought about this. Perhaps but I guess this needs another
discussion.
In particular I am not sure we should just rely on vm_ops == anon_vm_ops.
Again, it is not that I think that the VM_MPX check in arch_vma_name() is
that bad. Still I think it would be better if mpx_mmap() could install
vma->vm_ops = mpx_vm_ops with ->name(). So perhaps ->anon_fault() makes
more sense. But lets not discuss this right now.
>
> > Until then I do not see why the change above can be really bad. The
> > VM_SHARED case is fine, do_anonymous_page() will return VM_FAULT_SIGBUS.
> >
> > So afaics the only problem is that after the change above the private
> > mapping can silently get an anonymous page after (say) MADV_DONTNEED
> > instead of the nice SIGBUS from do_fault(). I agree, this is not good,
> > but see above.
>
> So, what the point to introduce vma_is_anonymous() if it often produces
> false result? vma_is_anonymous_or_maybe_not()?
Heh.
Then what the point to demand that "All file mapping must have ->vm_ops set"
if mmap(MAP_PRIVATE, "/dev/zero") has ->vm_ops == NULL ? Because this is
not actually the file mapping, yes. And this is why we want vma_is_anonymous()
to return T in this case.
vma_is_anonymous() just says that a page fault will use do_anonymous_page().
I agree, it would be nice to ensure vma_is_anonymous() can only be true
if this vma can only have the anon pages. Let me repeat that I suggested
this change as a short-term fix (at least without other changes like we
discuss above). Because the mmap_zero() hack looks worse to me. Damn, even
the ugly hack below looks better to me.
> > Whether we need to keep the vm_ops/fault check in __vma_link_rb() and
> > mmap_region() is another issue. But if we keep them, then I think we
> > should at least turn the !vma->vm_ops check in mmap_region into
> > WARN_ON() as well.
>
> It would require first fix all known cases where ->f_op->mmap() returns
> vma->vm_ops == NULL. Not subject for 4.3, I think.
Kirill, I even sent you the private email to clarify that - of course! -
I only meant "in the longer term" ;)
Oleg.
--- x/include/linux/mm.h
+++ x/include/linux/mm.h
@@ -1289,9 +1289,11 @@ static inline int vma_growsdown(struct v
return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN);
}
+#define xxx_fault ((void*)1)
+
static inline bool vma_is_anonymous(struct vm_area_struct *vma)
{
- return !vma->vm_ops;
+ return !vma->vm_ops || vma->vm_ops->fault == xxx_fault;
}
static inline int stack_guard_page_start(struct vm_area_struct *vma,
--- x/drivers/char/mem.c
+++ x/drivers/char/mem.c
@@ -653,11 +653,17 @@ static ssize_t read_iter_zero(struct kio
static int mmap_zero(struct file *file, struct vm_area_struct *vma)
{
+ static const struct vm_operations_struct xxx_ops = {
+ .fault = xxx_fault,
+ };
+ }
#ifndef CONFIG_MMU
return -ENOSYS;
#endif
if (vma->vm_flags & VM_SHARED)
return shmem_zero_setup(vma);
+
+ vma->vm_ops = &xxx_ops;
return 0;
}
next prev parent reply other threads:[~2015-09-15 15:16 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-14 10:53 LTP regressions due to 6dc296e7df4c ("mm: make sure all file VMAs have ->vm_ops set") Will Deacon
2015-09-14 10:53 ` Will Deacon
2015-09-14 11:57 ` Kirill A. Shutemov
2015-09-14 11:57 ` Kirill A. Shutemov
2015-09-14 13:22 ` Will Deacon
2015-09-14 13:22 ` Will Deacon
2015-09-14 17:05 ` Oleg Nesterov
2015-09-14 17:05 ` Oleg Nesterov
2015-09-14 17:46 ` Oleg Nesterov
2015-09-14 17:46 ` Oleg Nesterov
2015-09-14 18:20 ` Kirill A. Shutemov
2015-09-14 18:20 ` Kirill A. Shutemov
2015-09-15 12:12 ` Oleg Nesterov
2015-09-15 12:12 ` Oleg Nesterov
2015-09-15 13:42 ` Kirill A. Shutemov
2015-09-15 13:42 ` Kirill A. Shutemov
2015-09-15 15:13 ` Oleg Nesterov [this message]
2015-09-15 15:13 ` Oleg Nesterov
2015-09-16 21:28 ` Andrew Morton
2015-09-16 21:28 ` Andrew Morton
2015-09-17 15:37 ` Kirill A. Shutemov
2015-09-17 15:37 ` Kirill A. Shutemov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150915151318.GA15866@redhat.com \
--to=oleg@redhat.com \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kirill@shutemov.name \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@amacapital.net \
--cc=minchan@kernel.org \
--cc=mingo@elte.hu \
--cc=tglx@linutronix.de \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.