linux-fsdevel.vger.kernel.org archive mirror
* [RFC 0/3] reading proc/pid/maps under RCU
@ 2024-01-15 18:38 Suren Baghdasaryan
  2024-01-15 18:38 ` [RFC 1/3] mm: make vm_area_struct anon_name field RCU-safe Suren Baghdasaryan
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Suren Baghdasaryan @ 2024-01-15 18:38 UTC (permalink / raw)
  To: akpm
  Cc: viro, brauner, jack, dchinner, casey, ben.wolsieffer, paulmck,
	david, avagin, usama.anjum, peterx, hughd, ryan.roberts,
	wangkefeng.wang, Liam.Howlett, yuzhao, axelrasmussen, lstoakes,
	talumbau, willy, vbabka, mgorman, jhubbard, vishal.moola,
	mathieu.desnoyers, dhowells, jgg, sidhartha.kumar,
	andriy.shevchenko, yangxingui, keescook, linux-kernel,
	linux-fsdevel, linux-mm, kernel-team, surenb

The issue this patchset is trying to address is mmap_lock contention when
a low priority task (monitoring, data collecting, etc.) blocks a higher
priority task from making updates to the address space. The contention is
due to the mmap_lock being held for read when reading proc/pid/maps.
With maple_tree introduction, VMA tree traversals are RCU-safe and per-vma
locks make VMA access RCU-safe. This provides an opportunity for lock-less
reading of proc/pid/maps. We still need to overcome a couple of obstacles:
1. Make all VMA pointer fields used for proc/pid/maps content generation
RCU-safe;
2. Ensure that proc/pid/maps data tearing, which is currently possible at
page boundaries only, does not get worse.

The patchset deals with these issues but there is a downside which I would
like to get input on:
This change introduces unfairness towards the reader of proc/pid/maps,
which can be blocked by an overly active/malicious address space modifier.
A few ways I thought we could address this issue are:
1. After several lock-less retries (or some time limit), fall back to
taking mmap_lock.
2. Employ lock-less reading only if the reader has low priority,
indicating that blocking it is not critical.
3. Introduce a separate procfs file which publishes the same data in
a lock-less manner.

I imagine a combination of these approaches can also be employed.
I would like to get feedback on this from the Linux community.

Note: the mmap_read_lock/mmap_read_unlock sequence inside validate_map()
can be replaced with the more efficient rwsem_wait() proposed by Matthew
in [1].

[1] https://lore.kernel.org/all/ZZ1+ZicgN8dZ3zj3@casper.infradead.org/

Suren Baghdasaryan (3):
  mm: make vm_area_struct anon_name field RCU-safe
  seq_file: add validate() operation to seq_operations
  mm/maps: read proc/pid/maps under RCU

 fs/proc/internal.h        |   3 +
 fs/proc/task_mmu.c        | 130 ++++++++++++++++++++++++++++++++++----
 fs/seq_file.c             |  24 ++++++-
 include/linux/mm_inline.h |  10 ++-
 include/linux/mm_types.h  |   3 +-
 include/linux/seq_file.h  |   1 +
 mm/madvise.c              |  30 +++++++--
 7 files changed, 181 insertions(+), 20 deletions(-)

-- 
2.43.0.381.gb435a96ce8-goog



Thread overview: 9+ messages
2024-01-15 18:38 [RFC 0/3] reading proc/pid/maps under RCU Suren Baghdasaryan
2024-01-15 18:38 ` [RFC 1/3] mm: make vm_area_struct anon_name field RCU-safe Suren Baghdasaryan
2024-01-15 18:38 ` [RFC 2/3] seq_file: add validate() operation to seq_operations Suren Baghdasaryan
2024-01-15 18:38 ` [RFC 3/3] mm/maps: read proc/pid/maps under RCU Suren Baghdasaryan
2024-01-16 14:42 ` [RFC 0/3] reading " Vlastimil Babka
2024-01-16 14:46   ` Vlastimil Babka
2024-01-16 17:57     ` Suren Baghdasaryan
2024-01-18 17:58       ` Suren Baghdasaryan
2024-01-22  7:23         ` Suren Baghdasaryan
