All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: George Spelvin <linux@horizon.com>
Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com,
	rientjes@google.com, torvalds@linux-foundation.org
Subject: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info
Date: Sun, 23 Aug 2015 10:17:51 +0200	[thread overview]
Message-ID: <20150823081750.GA28349@gmail.com> (raw)
In-Reply-To: <20150823064603.14050.qmail@ns.horizon.com>


* George Spelvin <linux@horizon.com> wrote:

> Ingo Molnar <mingo@kernel.org> wrote:
> > I think this is too complex.
> > 
> > How about something simple like the patch below (on top of the third patch)?
> 
> > It makes the vmalloc info transactional - /proc/meminfo will always print a 
> > consistent set of numbers. (Not that we really care about races there, but it 
> > looks really simple to solve so why not.)
> 
> Looks like a huge simplification!
> 
> It needs a comment about the approximate nature of the locking and
> the obvious race conditions:
> 1) The first caller to get_vmalloc_info() clears vmap_info_changed
>    before updating vmap_info_cache, so a second caller is likely to
>    get stale data for the duration of a calc_vmalloc_info call.
> 2) Although unlikely, it's possible for two threads to race calling
>    calc_vmalloc_info, and the one that computes fresher data updates
>    the cache first, so the later write leaves stale data.
> 
> Other issues:
> 3) Me, I'd make vmap_info_changed a bool, for documentation more than
>    any space saving.
> 4) I wish there were a trylock version of write_seqlock, so we could
>    avoid blocking entirely.  (You *could* hand-roll it, but that eats
>    into the simplicity.)

Ok, fair enough - so how about the attached approach instead, which uses a 64-bit 
generation counter to track changes to the vmalloc state.

This is still very simple, but should not suffer from stale data being returned 
indefinitely in /proc/meminfo. We might race - but that was true before as well 
due to the lock-less RCU list walk - but we'll always return a correct and 
consistent version of the information.

Lightly tested. This is a replacement patch to make it easier to read via email.

I also made sure there's no extra overhead in the !CONFIG_PROC_FS case.

Note that there's an even simpler variant possible I think: we could use just the 
two generation counters and barriers to remove the seqlock.

Thanks,

	Ingo

==============================>

WARNING: multiple messages have this Message-ID (diff)
From: Ingo Molnar <mingo@kernel.org>
To: George Spelvin <linux@horizon.com>
Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com,
	rientjes@google.com, torvalds@linux-foundation.org
Subject: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info
Date: Sun, 23 Aug 2015 10:17:51 +0200	[thread overview]
Message-ID: <20150823081750.GA28349@gmail.com> (raw)
In-Reply-To: <20150823064603.14050.qmail@ns.horizon.com>


* George Spelvin <linux@horizon.com> wrote:

> Ingo Molnar <mingo@kernel.org> wrote:
> > I think this is too complex.
> > 
> > How about something simple like the patch below (on top of the third patch)?
> 
> > It makes the vmalloc info transactional - /proc/meminfo will always print a 
> > consistent set of numbers. (Not that we really care about races there, but it 
> > looks really simple to solve so why not.)
> 
> Looks like a huge simplification!
> 
> It needs a comment about the approximate nature of the locking and
> the obvious race conditions:
> 1) The first caller to get_vmalloc_info() clears vmap_info_changed
>    before updating vmap_info_cache, so a second caller is likely to
>    get stale data for the duration of a calc_vmalloc_info call.
> 2) Although unlikely, it's possible for two threads to race calling
>    calc_vmalloc_info, and the one that computes fresher data updates
>    the cache first, so the later write leaves stale data.
> 
> Other issues:
> 3) Me, I'd make vmap_info_changed a bool, for documentation more than
>    any space saving.
> 4) I wish there were a trylock version of write_seqlock, so we could
>    avoid blocking entirely.  (You *could* hand-roll it, but that eats
>    into the simplicity.)

Ok, fair enough - so how about the attached approach instead, which uses a 64-bit 
generation counter to track changes to the vmalloc state.

This is still very simple, but should not suffer from stale data being returned 
indefinitely in /proc/meminfo. We might race - but that was true before as well 
due to the lock-less RCU list walk - but we'll always return a correct and 
consistent version of the information.

Lightly tested. This is a replacement patch to make it easier to read via email.

I also made sure there's no extra overhead in the !CONFIG_PROC_FS case.

Note that there's an even simpler variant possible I think: we could use just the 
two generation counters and barriers to remove the seqlock.

Thanks,

	Ingo

==============================>
>From f9fd770e75e2edb4143f32ced0b53d7a77969c94 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@kernel.org>
Date: Sat, 22 Aug 2015 12:28:01 +0200
Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info

Linus reported that glibc (rather stupidly) reads /proc/meminfo
for every sysinfo() call, which causes the Git build to use
a surprising amount of CPU time, mostly due to the overhead
of get_vmalloc_info() - which walks a long list to do its
statistics.

Modify Linus's jiffies based patch to use generation counters
to cache the vmalloc info: vmap_unlock() increases the generation
counter, and the get_vmalloc_info() reads it and compares it
against a cached generation counter.

Also use a seqlock to make sure we always print a consistent
set of vmalloc statistics.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 mm/vmalloc.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 56 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 605138083880..d72b23436906 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -276,7 +276,21 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 #define VM_LAZY_FREEING	0x02
 #define VM_VM_AREA	0x04
 
-static DEFINE_SPINLOCK(vmap_area_lock);
+static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock);
+
+#ifdef CONFIG_PROC_FS
+/*
+ * A seqlock and two generation counters for a simple cache of the
+ * vmalloc allocation statistics info printed in /proc/meminfo.
+ *
+ * ( The assumption of the optimization is that it's read frequently, but
+ *   modified infrequently. )
+ */
+static DEFINE_SEQLOCK(vmap_info_lock);
+static u64 vmap_info_gen;
+static u64 vmap_info_cache_gen;
+static struct vmalloc_info vmap_info_cache;
+#endif
 
 static inline void vmap_lock(void)
 {
@@ -285,6 +299,9 @@ static inline void vmap_lock(void)
 
 static inline void vmap_unlock(void)
 {
+#ifdef CONFIG_PROC_FS
+	WRITE_ONCE(vmap_info_gen, vmap_info_gen+1);
+#endif
 	spin_unlock(&vmap_area_lock);
 }
 
@@ -2699,7 +2716,7 @@ static int __init proc_vmalloc_init(void)
 }
 module_init(proc_vmalloc_init);
 
-void get_vmalloc_info(struct vmalloc_info *vmi)
+static void calc_vmalloc_info(struct vmalloc_info *vmi)
 {
 	struct vmap_area *va;
 	unsigned long free_area_size;
@@ -2746,5 +2763,41 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
 out:
 	rcu_read_unlock();
 }
-#endif
 
+/*
+ * Return a consistent snapshot of the current vmalloc allocation
+ * statistics, for /proc/meminfo:
+ */
+void get_vmalloc_info(struct vmalloc_info *vmi)
+{
+	u64 gen = READ_ONCE(vmap_info_gen);
+
+	/*
+	 * If the generation counter of the cache matches that of
+	 * the vmalloc generation counter then return the cache:
+	 */
+	if (READ_ONCE(vmap_info_cache_gen) == gen) {
+		unsigned int seq;
+
+		do {
+			seq = read_seqbegin(&vmap_info_lock);
+			*vmi = vmap_info_cache;
+		} while (read_seqretry(&vmap_info_lock, seq));
+
+		return;
+	}
+
+	calc_vmalloc_info(vmi);
+
+	/*
+	 * If are racing with a new vmalloc() then we might write
+	 * the old generation counter here - and the next call to
+	 * get_vmalloc_info() will fix things up:
+	 */
+	write_seqlock(&vmap_info_lock);
+	vmap_info_cache = *vmi;
+	WRITE_ONCE(vmap_info_cache_gen, gen);
+	write_sequnlock(&vmap_info_lock);
+}
+
+#endif /* CONFIG_PROC_FS */

  reply	other threads:[~2015-08-23  8:17 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-23  4:48 [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics George Spelvin
2015-08-23  4:48 ` George Spelvin
2015-08-23  6:04 ` Ingo Molnar
2015-08-23  6:04   ` Ingo Molnar
2015-08-23  6:46   ` George Spelvin
2015-08-23  6:46     ` George Spelvin
2015-08-23  8:17     ` Ingo Molnar [this message]
2015-08-23  8:17       ` [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Ingo Molnar
2015-08-23 20:53       ` Rasmus Villemoes
2015-08-23 20:53         ` Rasmus Villemoes
2015-08-24  6:58         ` Ingo Molnar
2015-08-24  6:58           ` Ingo Molnar
2015-08-24  8:39           ` Rasmus Villemoes
2015-08-24  8:39             ` Rasmus Villemoes
2015-08-23 21:56       ` Rasmus Villemoes
2015-08-23 21:56         ` Rasmus Villemoes
2015-08-24  7:00         ` Ingo Molnar
2015-08-24  7:00           ` Ingo Molnar
2015-08-25 16:39         ` Linus Torvalds
2015-08-25 16:39           ` Linus Torvalds
2015-08-25 17:03           ` Linus Torvalds
2015-08-25 17:03             ` Linus Torvalds
2015-08-24  1:04       ` George Spelvin
2015-08-24  1:04         ` George Spelvin
2015-08-24  7:34         ` [PATCH 3/3 v4] " Ingo Molnar
2015-08-24  7:34           ` Ingo Molnar
2015-08-24  7:47           ` Ingo Molnar
2015-08-24  7:47             ` Ingo Molnar
2015-08-24  7:50             ` [PATCH 3/3 v5] " Ingo Molnar
2015-08-24  7:50               ` Ingo Molnar
2015-08-24 12:54               ` George Spelvin
2015-08-24 12:54                 ` George Spelvin
2015-08-25  9:56                 ` [PATCH 3/3 v6] " Ingo Molnar
2015-08-25  9:56                   ` Ingo Molnar
2015-08-25 10:36                   ` George Spelvin
2015-08-25 10:36                     ` George Spelvin
2015-08-25 12:59                   ` Peter Zijlstra
2015-08-25 12:59                     ` Peter Zijlstra
2015-08-25 14:19                   ` Rasmus Villemoes
2015-08-25 14:19                     ` Rasmus Villemoes
2015-08-25 15:11                     ` George Spelvin
2015-08-25 15:11                       ` George Spelvin
2015-08-24 13:11           ` [PATCH 3/3 v4] " John Stoffel
2015-08-24 13:11             ` John Stoffel
2015-08-24 15:11             ` George Spelvin
2015-08-24 15:11               ` George Spelvin
2015-08-24 15:55               ` John Stoffel
2015-08-24 15:55                 ` John Stoffel
2015-08-25 12:46       ` [PATCH 3/3 v3] " Peter Zijlstra
2015-08-25 12:46         ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150823081750.GA28349@gmail.com \
    --to=mingo@kernel.org \
    --cc=dave@sr71.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@horizon.com \
    --cc=linux@rasmusvillemoes.dk \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.