Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Richard Weinberger <richard@nod.at>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, "Dave Jones" <davej@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Sasha Levin" <sasha.levin@oracle.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Toralf Förster" <toralf.foerster@gmx.de>
Subject: Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()
Date: Sun, 04 May 2014 10:34:18 +0200	[thread overview]
Message-ID: <5365FB8A.8080303@nod.at> (raw)
In-Reply-To: <CA+55aFzbSUPGWyO42KM7geAy8WrP8e=q+KoqdOBY68zay0jrZA@mail.gmail.com>

Linus,

Am 04.05.2014 01:57, schrieb Linus Torvalds:
> On Sat, May 3, 2014 at 4:37 PM, Richard Weinberger <richard@nod.at> wrote:
>> Commit 1cf35d47 (mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts)
>> accidently changed the behavior of the force_flush variable.
> 
> No it didn't. There was nothing accidental about it, and it doesn't
> even change it the way you claim.
> 
>> Before the patch it was set by __tlb_remove_page(). Now it is only set to 1
>> if __tlb_remove_page() returns false but never set back to 0 if __tlb_remove_page()
>> returns true.
> 
> It starts out as zero. If __tlb_remove_page() returns true, it never
> gets set to anything *but* zero, except by the dirty shared mapping
> case that *needs* to set it to non-zero, exactly because it *needs* to
> flush the TLB before releasing the pte lock.
> 
> Which was the whole point of the patch.
> 
> Your explanation makes no sense for _another_ reason: even with your
> patch, it never gets set back to zero, since if it gets set to one you
> have that "break" in there. So the whole "gets set back to zero" is
> simply not relevant or true, with or with the patch.

Hmm, I got confused by:
                        if (PageAnon(page))
                                rss[MM_ANONPAGES]--;
                        else {
                                if (pte_dirty(ptent)) {
                                        force_flush = 1;

Here you set force_flush.

                                        set_page_dirty(page);
                                }
                                if (pte_young(ptent) &&
                                    likely(!(vma->vm_flags & VM_SEQ_READ)))
                                        mark_page_accessed(page);
                                rss[MM_FILEPAGES]--;
                        }
                        page_remove_rmap(page);
                        if (unlikely(page_mapcount(page) < 0))
                                print_bad_pte(vma, addr, ptent, page);
                        if (unlikely(!__tlb_remove_page(tlb, page))) {
                                force_flush = 1;
                                break;
                        }

And here it cannot get back to 0.

                        continue;



> The only place it actually gets zeroed (apart from initialization) is
> for the "goto again" case, which does it (and always did it)
> 
>> Fixes BUG: Bad rss-counter state ...
>> and
>> kernel BUG at mm/filemap.c:202!
> 
> So tell us more about those actual problems, because your patch and
> explanation is clearly wrong.
> 
> What hardware, what load, what "kernel BUG at filemap.c:202"?

With your patch applied I see lots of BUG: Bad rss-counter state messages on UML (x86_32)
when fuzzing with trinity the mremap syscall.
And sometimes I face BUG at mm/filemap.c:202.

UML is here a bit special. It maps two pages into every process (the stub pages)
to issue mmap(), munmap() or mprotect() upon a page fault to fix memory mappings
for the faulting process on the host side.
It has to make sure that a guest process cannot mess with its stub pages.
Otherwise a guest could execute code on the host side.

Trinity manages to destroy these stub pages, UML detects this upon TLB handling
and kills the current process immediately.
After killing a trinity child I start observing the said issues.

e.g.
fix_range_common: failed, killing current process: 841
fix_range_common: failed, killing current process: 842
fix_range_common: failed, killing current process: 843
BUG: Bad rss-counter state mm:28e69600 idx:0 val:2

> The shared dirty fix may certainly be exposing some other issue, but
> the only report I have seen about filemap.c:202 was reported by Dave
> Jones ten *days* before the commit you talk about was even done.

Mea culpa, I've not noticed that fact.
Back to the drawing board...

Thanks,
//richard

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Richard Weinberger <richard@nod.at>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, "Dave Jones" <davej@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Sasha Levin" <sasha.levin@oracle.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Toralf Förster" <toralf.foerster@gmx.de>
Subject: Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()
Date: Sun, 04 May 2014 10:34:18 +0200	[thread overview]
Message-ID: <5365FB8A.8080303@nod.at> (raw)
In-Reply-To: <CA+55aFzbSUPGWyO42KM7geAy8WrP8e=q+KoqdOBY68zay0jrZA@mail.gmail.com>

Linus,

Am 04.05.2014 01:57, schrieb Linus Torvalds:
> On Sat, May 3, 2014 at 4:37 PM, Richard Weinberger <richard@nod.at> wrote:
>> Commit 1cf35d47 (mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts)
>> accidently changed the behavior of the force_flush variable.
> 
> No it didn't. There was nothing accidental about it, and it doesn't
> even change it the way you claim.
> 
>> Before the patch it was set by __tlb_remove_page(). Now it is only set to 1
>> if __tlb_remove_page() returns false but never set back to 0 if __tlb_remove_page()
>> returns true.
> 
> It starts out as zero. If __tlb_remove_page() returns true, it never
> gets set to anything *but* zero, except by the dirty shared mapping
> case that *needs* to set it to non-zero, exactly because it *needs* to
> flush the TLB before releasing the pte lock.
> 
> Which was the whole point of the patch.
> 
> Your explanation makes no sense for _another_ reason: even with your
> patch, it never gets set back to zero, since if it gets set to one you
> have that "break" in there. So the whole "gets set back to zero" is
> simply not relevant or true, with or with the patch.

Hmm, I got confused by:
                        if (PageAnon(page))
                                rss[MM_ANONPAGES]--;
                        else {
                                if (pte_dirty(ptent)) {
                                        force_flush = 1;

Here you set force_flush.

                                        set_page_dirty(page);
                                }
                                if (pte_young(ptent) &&
                                    likely(!(vma->vm_flags & VM_SEQ_READ)))
                                        mark_page_accessed(page);
                                rss[MM_FILEPAGES]--;
                        }
                        page_remove_rmap(page);
                        if (unlikely(page_mapcount(page) < 0))
                                print_bad_pte(vma, addr, ptent, page);
                        if (unlikely(!__tlb_remove_page(tlb, page))) {
                                force_flush = 1;
                                break;
                        }

And here it cannot get back to 0.

                        continue;



> The only place it actually gets zeroed (apart from initialization) is
> for the "goto again" case, which does it (and always did it)
> 
>> Fixes BUG: Bad rss-counter state ...
>> and
>> kernel BUG at mm/filemap.c:202!
> 
> So tell us more about those actual problems, because your patch and
> explanation is clearly wrong.
> 
> What hardware, what load, what "kernel BUG at filemap.c:202"?

With your patch applied I see lots of BUG: Bad rss-counter state messages on UML (x86_32)
when fuzzing with trinity the mremap syscall.
And sometimes I face BUG at mm/filemap.c:202.

UML is here a bit special. It maps two pages into every process (the stub pages)
to issue mmap(), munmap() or mprotect() upon a page fault to fix memory mappings
for the faulting process on the host side.
It has to make sure that a guest process cannot mess with its stub pages.
Otherwise a guest could execute code on the host side.

Trinity manages to destroy these stub pages, UML detects this upon TLB handling
and kills the current process immediately.
After killing a trinity child I start observing the said issues.

e.g.
fix_range_common: failed, killing current process: 841
fix_range_common: failed, killing current process: 842
fix_range_common: failed, killing current process: 843
BUG: Bad rss-counter state mm:28e69600 idx:0 val:2

> The shared dirty fix may certainly be exposing some other issue, but
> the only report I have seen about filemap.c:202 was reported by Dave
> Jones ten *days* before the commit you talk about was even done.

Mea culpa, I've not noticed that fact.
Back to the drawing board...

Thanks,
//richard

next prev parent reply	other threads:[~2014-05-04  8:34 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-15 19:09 [3.15rc1] BUG at mm/filemap.c:202! Dave Jones
2014-04-15 19:09 ` Dave Jones
2014-04-16 20:40 ` Hugh Dickins
2014-04-16 20:40   ` Hugh Dickins
2014-05-01 16:20   ` Richard Weinberger
2014-05-01 16:20     ` Richard Weinberger
2014-05-03 19:24     ` Richard Weinberger
2014-05-03 19:24       ` Richard Weinberger
2014-05-04 20:37       ` Hugh Dickins
2014-05-04 20:37         ` Hugh Dickins
2014-05-04 20:58         ` Richard Weinberger
2014-05-04 20:58           ` Richard Weinberger
2014-05-04 21:46           ` 502304919
2014-05-03 23:37   ` [PATCH] mm: Fix force_flush behavior in zap_pte_range() Richard Weinberger
2014-05-03 23:37     ` Richard Weinberger
2014-05-03 23:57     ` Linus Torvalds
2014-05-03 23:57       ` Linus Torvalds
2014-05-04  8:34       ` Richard Weinberger [this message]
2014-05-04  8:34         ` Richard Weinberger
2014-05-04 18:31         ` Linus Torvalds
2014-05-04 18:31           ` Linus Torvalds
2014-05-04 20:42           ` Richard Weinberger
2014-05-04 20:42             ` Richard Weinberger
2014-05-04 21:19             ` Linus Torvalds
2014-05-04 21:19               ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5365FB8A.8080303@nod.at \
    --to=richard@nod.at \
    --cc=akpm@linux-foundation.org \
    --cc=davej@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sasha.levin@oracle.com \
    --cc=toralf.foerster@gmx.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.