All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Michal Hocko <mhocko@suse.com>,
	Jan Kara <jack@suse.cz>, Dan Williams <dan.j.williams@intel.com>,
	David Rientjes <rientjes@google.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Paul Oppenheimer <bepvte@gmail.com>,
	William Kucharski <william.kucharski@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.9 37/39] mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps
Date: Thu, 24 Jan 2019 20:20:40 +0100	[thread overview]
Message-ID: <20190124190449.667104133@linuxfoundation.org> (raw)
In-Reply-To: <20190124190448.232316246@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 7550c6079846a24f30d15ac75a941c8515dbedfb ]

Patch series "THP eligibility reporting via proc".

This series of three patches aims at making THP eligibility reporting much
more robust and long term sustainable.  The trigger for the change is a
regression report [2] and the long follow up discussion.  In short the
specific application didn't have good API to query whether a particular
mapping can be backed by THP so it has used VMA flags to workaround that.
These flags represent a deep internal state of VMAs and as such they
should be used by userspace with a great deal of caution.

A similar has happened for [3] when users complained that VM_MIXEDMAP is
no longer set on DAX mappings.  Again a lack of a proper API led to an
abuse.

The first patch in the series tries to emphasise that that the semantic of
flags might change and any application consuming those should be really
careful.

The remaining two patches provide a more suitable interface to address [2]
and provide a consistent API to query the THP status both for each VMA and
process wide as well.  [1]

http://lkml.kernel.org/r/20181120103515.25280-1-mhocko@kernel.org [2]
http://lkml.kernel.org/r/http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com
[3] http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz

This patch (of 3):

Even though vma flags exported via /proc/<pid>/smaps are explicitly
documented to be not guaranteed for future compatibility the warning
doesn't go far enough because it doesn't mention semantic changes to those
flags.  And they are important as well because these flags are a deep
implementation internal to the MM code and the semantic might change at
any time.

Let's consider two recent examples:
http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz
: commit e1fb4a086495 "dax: remove VM_MIXEDMAP for fsdax and device dax" has
: removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the
: mean time certain customer of ours started poking into /proc/<pid>/smaps
: and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA
: flags, the application just fails to start complaining that DAX support is
: missing in the kernel.

http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com
: Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
: introduced a regression in that userspace cannot always determine the set
: of vmas where thp is ineligible.
: Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
: to determine if a vma is eligible to be backed by hugepages.
: Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
: be disabled and emit "nh" as a flag for the corresponding vmas as part of
: /proc/pid/smaps.  After the commit, thp is disabled by means of an mm
: flag and "nh" is not emitted.
: This causes smaps parsing libraries to assume a vma is eligible for thp
: and ends up puzzling the user on why its memory is not backed by thp.

In both cases userspace was relying on a semantic of a specific VMA flag.
The primary reason why that happened is a lack of a proper interface.
While this has been worked on and it will be fixed properly, it seems that
our wording could see some refinement and be more vocal about semantic
aspect of these flags as well.

Link: http://lkml.kernel.org/r/20181211143641.3503-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Paul Oppenheimer <bepvte@gmail.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 Documentation/filesystems/proc.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 74329fd0add2..077c37ed976f 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -484,7 +484,9 @@ manner. The codes are the following:
 
 Note that there is no guarantee that every flag and associated mnemonic will
 be present in all further kernel releases. Things get changed, the flags may
-be vanished or the reverse -- new added.
+be vanished or the reverse -- new added. Interpretation of their meaning
+might change in future as well. So each consumer of these flags has to
+follow each specific kernel version for the exact semantic.
 
 This file is only present if the CONFIG_MMU kernel configuration option is
 enabled.
-- 
2.19.1




  parent reply	other threads:[~2019-01-24 19:30 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-24 19:20 [PATCH 4.9 00/39] 4.9.153-stable review Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 01/39] r8169: Add support for new Realtek Ethernet Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 02/39] ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 03/39] ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 04/39] platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 05/39] e1000e: allow non-monotonic SYSTIM readings Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 06/39] writeback: dont decrement wb->refcnt if !wb->bdi Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 07/39] serial: set suppress_bind_attrs flag only if builtin Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 08/39] ALSA: oxfw: add support for APOGEE duet FireWire Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 09/39] MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 10/39] arm64: perf: set suppress_bind_attrs flag to true Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 11/39] selinux: always allow mounting submounts Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 12/39] rxe: IB_WR_REG_MR does not capture MRs iova field Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 13/39] jffs2: Fix use of uninitialized delayed_work, lockdep breakage Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 14/39] pstore/ram: Do not treat empty buffers as valid Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 15/39] powerpc/xmon: Fix invocation inside lock region Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 16/39] powerpc/pseries/cpuidle: Fix preempt warning Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 17/39] media: firewire: Fix app_info parameter type in avc_ca{,_app}_info Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 18/39] net: call sk_dst_reset when set SO_DONTROUTE Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 19/39] scsi: target: use consistent left-aligned ASCII INQUIRY data Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 20/39] clk: imx6q: reset exclusive gates on init Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 21/39] kconfig: fix file name and line number of warn_ignored_character() Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 22/39] kconfig: fix memory leak when EOF is encountered in quotation Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 23/39] mmc: atmel-mci: do not assume idle after atmci_request_end Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 24/39] tty/serial: do not free trasnmit buffer page under port lock Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 25/39] perf intel-pt: Fix error with config term "pt=0" Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 26/39] perf svghelper: Fix unchecked usage of strncpy() Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 27/39] perf parse-events: " Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 28/39] dm kcopyd: Fix bug causing workqueue stalls Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 29/39] tools lib subcmd: Dont add the kernel sources to the include path Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 30/39] dm snapshot: Fix excessive memory usage and workqueue stalls Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 31/39] ALSA: bebob: fix model-id of unit for Apogee Ensemble Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 32/39] sysfs: Disable lockdep for driver bind/unbind files Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 33/39] scsi: smartpqi: correct lun reset issues Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 34/39] scsi: megaraid: fix out-of-bound array accesses Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 35/39] ocfs2: fix panic due to unrecovered local alloc Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 36/39] mm/page-writeback.c: dont break integrity writeback on ->writepage() error Greg Kroah-Hartman
2019-01-24 19:20 ` Greg Kroah-Hartman [this message]
2019-01-24 19:20 ` [PATCH 4.9 38/39] ipmi:ssif: Fix handling of multi-part return messages Greg Kroah-Hartman
2019-01-24 19:20 ` [PATCH 4.9 39/39] locking/qspinlock: Pull in asm/byteorder.h to ensure correct endianness Greg Kroah-Hartman
2019-01-25 14:43 ` [PATCH 4.9 00/39] 4.9.153-stable review shuah
2019-01-25 16:28 ` Naresh Kamboju
2019-01-25 23:17 ` Guenter Roeck
2019-01-26 12:07 ` Jon Hunter
2019-01-26 12:07   ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190124190449.667104133@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=bepvte@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=rientjes@google.com \
    --cc=rppt@linux.ibm.com \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=william.kucharski@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.