All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@linux.ibm.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	David Rientjes <rientjes@google.com>,
	kirill.shutemov@linux.intel.com, adobriyan@gmail.com,
	Linux API <linux-api@vger.kernel.org>,
	Andrei Vagin <avagin@gmail.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Linux-MM layout <linux-mm@kvack.org>
Subject: Re: + mm-thp-always-specify-disabled-vmas-as-nh-in-smaps.patch added to -mm tree
Date: Sun, 30 Dec 2018 19:55:56 +0200	[thread overview]
Message-ID: <20181230175555.GA31291@rapoport-lnx> (raw)
In-Reply-To: <00ec4644-70c2-4bd1-ec3f-b994fa0669e8@suse.cz>

On Fri, Dec 28, 2018 at 11:54:17AM +0100, Vlastimil Babka wrote:
> On 12/28/18 9:18 AM, Michal Hocko wrote:
> > On Thu 27-12-18 21:31:00, Andrew Morton wrote:
> >> On Thu, 27 Dec 2018 14:11:14 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> >>
> >>> On Mon, Dec 24, 2018 at 10:17:31AM +0100, Michal Hocko wrote:
> >>>> On Mon 24-12-18 01:05:57, David Rientjes wrote:
> >>>> [...]
> >>>>> I'm not interested in having a 100 email thread about this when a clear 
> >>>>> and simple fix exists that actually doesn't break user code.
> >>>>
> >>>> You are breaking everybody who really wants to query MADV_NOHUGEPAGE
> >>>> status by this flag. Is there anybody doing that?
> >>>
> >>> Yes.
> >>>
> >>> https://github.com/checkpoint-restore/criu/blob/v3.11/criu/proc_parse.c#L143
> >>
> >> Ugh.  So the regression fix causes a regression?
> > 
> > Yes. The patch from David will hardcode the nohugepage vm flag if the
> > THP was disabled by the prctl at the time of the snapshot. And if the
> > application later enables THP by the prctl then existing mappings would
> > never get the THP enabled status back.
> > 
> > This is the kind of a potential regression I was poiting out earlier
> > when explaining that the patch encodes the logic into the flag exporting
> > and that means that whoever wants to get the raw value of the flag will
> > not be able to do so. Please note that the raw value is exactly what
> > this interface is documented and supposed to export. And as the
> > documentation explains it is implementation specific and anybody to use
> > it should be careful.
> 
> Let's add some CRIU guys in the loop (dunno if the right ones). We're
> discussing David's patch [1] that makes 'nh' and 'hg' flags reported in
> /proc/pid/smaps (and set by madvise) overridable by
> prctl(PR_SET_THP_DISABLE). This was sort of accidental behavior (but
> only for mappings created after the prctl call) before 4.13 commit
> 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active").
> 
> For David's userspace that commit is a regression as there are false
> positives when checking for vma's that are eligible for THP (=don't have
> the 'nh' flag in smaps) but didn't really obtain THP's. The userspace
> assumes it's due to fragmentation (=bad) and cannot know that it's due
> to the prctl(). But we fear that making prctl() affect smaps vma flags
> means that CRIU can't query them accurately anymore, and thus it's a
> regression for CRIU. Can you comment on that?

CRIU parses VmFlags from /proc/pid/smaps during the checkpoint and then
uses their raw values to recreate exactly the same mappings during restore.
We use appropriate madvise() calls to re-establish VMA flags.

Implicitly setting the 'nh' flag in /proc/pid/smaps when THP usage was
restricted with prctl() will make CRIU to call madvise(MADV_NOHUPEPAGE) for
all the mappings with 'nh' set as CRIU cannot detect the actual reason for
VMA being marked as VM_NOHUGEPAGE.

As Andrew pointed out, if the restored process will re-enable THP with
prtcl(), the VMAs will retail VM_NOHUGEPAGE flag. And, I don't see how can
we work around this in CRIU.

> Michal has a patch [2] that reports the prctl() status separately, but
> that doesn't help David's existing userspace. For CRIU this also won't
> help as long the smaps vma flags still silently included the prctl()
> status. Do you see some solution that would work for everybody?

Nothing comes to mind at the moment, maybe next year ;-)

> [1]
> https://www.ozlabs.org/~akpm/mmots/broken-out/mm-thp-always-specify-disabled-vmas-as-nh-in-smaps.patch
> [2]
> https://www.ozlabs.org/~akpm/mmots/broken-out/mm-proc-report-pr_set_thp_disable-in-proc.patch
> 

-- 
Sincerely yours,
Mike.

  parent reply	other threads:[~2018-12-30 17:56 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <alpine.DEB.2.21.1812210123210.232416@chino.kir.corp.google.com>
     [not found] ` <14e15543-c18b-6fa0-e107-194216ef3ada@suse.cz>
     [not found]   ` <20181221151256.GA6410@dhcp22.suse.cz>
     [not found]     ` <20181221140301.0e87b79b923ceb6d0f683749@linux-foundation.org>
     [not found]       ` <alpine.DEB.2.21.1812211419320.219499@chino.kir.corp.google.com>
     [not found]         ` <20181224080426.GC9063@dhcp22.suse.cz>
     [not found]           ` <alpine.DEB.2.21.1812240058060.114867@chino.kir.corp.google.com>
     [not found]             ` <20181224091731.GB16738@dhcp22.suse.cz>
     [not found]               ` <20181227111114.5tvvkddyp7cytzeb@kshutemo-mobl1>
     [not found]                 ` <20181227213100.aeee730c1f9ec5cb11de39a3@linux-foundation.org>
     [not found]                   ` <20181228081847.GP16738@dhcp22.suse.cz>
2018-12-28 10:54                     ` + mm-thp-always-specify-disabled-vmas-as-nh-in-smaps.patch added to -mm tree Vlastimil Babka
2018-12-28 12:19                       ` Michal Hocko
2018-12-28 12:35                       ` Cyrill Gorcunov
2018-12-30 17:55                       ` Mike Rapoport [this message]
2019-01-15  6:32                       ` Mike Rapoport
2019-01-21 10:21                         ` Michal Hocko
2019-01-21 18:00                           ` Cyrill Gorcunov
2019-01-21 18:18                             ` Michal Hocko
2019-01-21 18:24                               ` Cyrill Gorcunov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181230175555.GA31291@rapoport-lnx \
    --to=rppt@linux.ibm.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@gmail.com \
    --cc=gorcunov@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=rientjes@google.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=vbabka@suse.cz \
    --cc=xemul@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.