Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics
@ 2015-08-23  4:48 George Spelvin
  2015-08-23  6:04 ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: George Spelvin @ 2015-08-23  4:48 UTC (permalink / raw)
  To: mingo; +Cc: linux, linux-kernel, linux-mm, torvalds

Linus wrote:
> I don't think any of this can be called "correct", in that the
> unlocked accesses to the cached state are clearly racy, but I think
> it's very much "acceptable".

I'd think you could easily fix that with a seqlock-like system.

What makes it so simple is that you can always fall back to
calc_vmalloc_info if there's any problem, rather than looping or blocking.

The basic idea is that you have a seqlock counter, but if either of
the two lsbits are set, the cached information is stale.

Basically, you need a seqlock and a spinlock.  The seqlock does
most of the work, and the spinlock ensures that there's only one
updater of the cache.

vmap_unlock() does set_bit(0, &seq->sequence).  This marks the information
as stale.

get_vmalloc_info reads the seqlock.  There are two case:
- If the two lsbits are 10, the cached information is valid.
  Copy it out, re-check the seqlock, and loop if the sequence
  number changes.
- In any other case, the cached information is
  not valid.
  - Try to obtain the spinlock.  Do not block if it's unavailable.
    - If unavailable, do not block.
    - If the lock is acquired:
      - Set the sequence to (sequence | 3) + 1 (we're the only writer)
      - This bumps the sequence number and leaves the lsbits at 00 (invalid)
      - Memory barrier TBD.  Do the RCU ops in calc_vmalloc_info do it for us?
  - Call calc_vmalloc_info
  - If we obtained the spinlock earlier:
    - Copy our vmi to cached_info
    - smp_wmb()
    - set_bit(1, &seq->sequence).  This marks the information as valid,
      as long as bit 0 is still clear.
    - Release the spinlock.

Basically, bit 0 says "vmalloc info has changed", and bit 1 says
"vmalloc cache has been updated".  This clears bit 0 before
starting the update so that an update during calc_vmalloc_info
will force a new update.

So the three case are basically:
00 - calc_vmalloc_info() in progress
01 - vmap_unlock() during calc_vmalloc_info()
10 - cached_info is valid
11 - vmap_unlock has invalidated cached_info, awaiting refresh

Logically, the sequence number should be initialized to ...01,
but the code above handles 00 okay.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics
  2015-08-23  4:48 [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics George Spelvin
@ 2015-08-23  6:04 ` Ingo Molnar
  2015-08-23  6:46   ` George Spelvin
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2015-08-23  6:04 UTC (permalink / raw)
  To: George Spelvin
  Cc: linux-kernel, linux-mm, torvalds, Dave Hansen, Peter Zijlstra,
	David Rientjes, Rik van Riel, Rasmus Villemoes


* George Spelvin <linux@horizon.com> wrote:

> Linus wrote:
> > I don't think any of this can be called "correct", in that the
> > unlocked accesses to the cached state are clearly racy, but I think
> > it's very much "acceptable".
> 
> I'd think you could easily fix that with a seqlock-like system.
> 
> What makes it so simple is that you can always fall back to
> calc_vmalloc_info if there's any problem, rather than looping or blocking.
> 
> The basic idea is that you have a seqlock counter, but if either of
> the two lsbits are set, the cached information is stale.
> 
> Basically, you need a seqlock and a spinlock.  The seqlock does
> most of the work, and the spinlock ensures that there's only one
> updater of the cache.
> 
> vmap_unlock() does set_bit(0, &seq->sequence).  This marks the information
> as stale.
> 
> get_vmalloc_info reads the seqlock.  There are two case:
> - If the two lsbits are 10, the cached information is valid.
>   Copy it out, re-check the seqlock, and loop if the sequence
>   number changes.
> - In any other case, the cached information is
>   not valid.
>   - Try to obtain the spinlock.  Do not block if it's unavailable.
>     - If unavailable, do not block.
>     - If the lock is acquired:
>       - Set the sequence to (sequence | 3) + 1 (we're the only writer)
>       - This bumps the sequence number and leaves the lsbits at 00 (invalid)
>       - Memory barrier TBD.  Do the RCU ops in calc_vmalloc_info do it for us?
>   - Call calc_vmalloc_info
>   - If we obtained the spinlock earlier:
>     - Copy our vmi to cached_info
>     - smp_wmb()
>     - set_bit(1, &seq->sequence).  This marks the information as valid,
>       as long as bit 0 is still clear.
>     - Release the spinlock.
> 
> Basically, bit 0 says "vmalloc info has changed", and bit 1 says
> "vmalloc cache has been updated".  This clears bit 0 before
> starting the update so that an update during calc_vmalloc_info
> will force a new update.
> 
> So the three case are basically:
> 00 - calc_vmalloc_info() in progress
> 01 - vmap_unlock() during calc_vmalloc_info()
> 10 - cached_info is valid
> 11 - vmap_unlock has invalidated cached_info, awaiting refresh
> 
> Logically, the sequence number should be initialized to ...01,
> but the code above handles 00 okay.

I think this is too complex.

How about something simple like the patch below (on top of the third patch)?

It makes the vmalloc info transactional - /proc/meminfo will always print a 
consistent set of numbers. (Not that we really care about races there, but it 
looks really simple to solve so why not.)

( I also moved the function-static cache next to the flag and seqlock - this
  should further compress the cache footprint. )

Have I missed anything? Very lightly tested: booted in a VM.

Thanks,

	Ingo

=========================>

 mm/vmalloc.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ef48e557df5a..66726f41e726 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -278,7 +278,15 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock);
 
+/*
+ * Seqlock and flag for the vmalloc info cache printed in /proc/meminfo.
+ *
+ * The assumption of the optimization is that it's read frequently, but
+ * modified infrequently.
+ */
+static DEFINE_SEQLOCK(vmap_info_lock);
 static int vmap_info_changed;
+static struct vmalloc_info vmap_info_cache;
 
 static inline void vmap_lock(void)
 {
@@ -2752,10 +2760,14 @@ static void calc_vmalloc_info(struct vmalloc_info *vmi)
 
 void get_vmalloc_info(struct vmalloc_info *vmi)
 {
-	static struct vmalloc_info cached_info;
+	if (!READ_ONCE(vmap_info_changed)) {
+		unsigned int seq;
+
+		do {
+			seq = read_seqbegin(&vmap_info_lock);
+			*vmi = vmap_info_cache;
+		} while (read_seqretry(&vmap_info_lock, seq));
 
-	if (!vmap_info_changed) {
-		*vmi = cached_info;
 		return;
 	}
 
@@ -2764,8 +2776,9 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
 
 	calc_vmalloc_info(vmi);
 
-	barrier();
-	cached_info = *vmi;
+	write_seqlock(&vmap_info_lock);
+	vmap_info_cache = *vmi;
+	write_sequnlock(&vmap_info_lock);
 }
 
 #endif

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics
  2015-08-23  6:04 ` Ingo Molnar
@ 2015-08-23  6:46   ` George Spelvin
  2015-08-23  8:17     ` [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: George Spelvin @ 2015-08-23  6:46 UTC (permalink / raw)
  To: linux, mingo
  Cc: dave, linux-kernel, linux-mm, linux, peterz, riel, rientjes,
	torvalds

Ingo Molnar <mingo@kernel.org> wrote:
> I think this is too complex.
> 
> How about something simple like the patch below (on top of the third patch)?

> It makes the vmalloc info transactional - /proc/meminfo will always print a 
> consistent set of numbers. (Not that we really care about races there, but it 
> looks really simple to solve so why not.)

Looks like a huge simplification!

It needs a comment about the approximate nature of the locking and
the obvious race conditions:
1) The first caller to get_vmalloc_info() clears vmap_info_changed
   before updating vmap_info_cache, so a second caller is likely to
   get stale data for the duration of a calc_vmalloc_info call.
2) Although unlikely, it's possible for two threads to race calling
   calc_vmalloc_info, and the one that computes fresher data updates
   the cache first, so the later write leaves stale data.

Other issues:
3) Me, I'd make vmap_info_changed a bool, for documentation more than
   any space saving.
4) I wish there were a trylock version of write_seqlock, so we could
   avoid blocking entirely.  (You *could* hand-roll it, but that eats
   into the simplicity.)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info
  2015-08-23  6:46   ` George Spelvin
@ 2015-08-23  8:17     ` Ingo Molnar
  2015-08-23 20:53       ` Rasmus Villemoes
                         ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Ingo Molnar @ 2015-08-23  8:17 UTC (permalink / raw)
  To: George Spelvin
  Cc: dave, linux-kernel, linux-mm, linux, peterz, riel, rientjes,
	torvalds


* George Spelvin <linux@horizon.com> wrote:

> Ingo Molnar <mingo@kernel.org> wrote:
> > I think this is too complex.
> > 
> > How about something simple like the patch below (on top of the third patch)?
> 
> > It makes the vmalloc info transactional - /proc/meminfo will always print a 
> > consistent set of numbers. (Not that we really care about races there, but it 
> > looks really simple to solve so why not.)
> 
> Looks like a huge simplification!
> 
> It needs a comment about the approximate nature of the locking and
> the obvious race conditions:
> 1) The first caller to get_vmalloc_info() clears vmap_info_changed
>    before updating vmap_info_cache, so a second caller is likely to
>    get stale data for the duration of a calc_vmalloc_info call.
> 2) Although unlikely, it's possible for two threads to race calling
>    calc_vmalloc_info, and the one that computes fresher data updates
>    the cache first, so the later write leaves stale data.
> 
> Other issues:
> 3) Me, I'd make vmap_info_changed a bool, for documentation more than
>    any space saving.
> 4) I wish there were a trylock version of write_seqlock, so we could
>    avoid blocking entirely.  (You *could* hand-roll it, but that eats
>    into the simplicity.)

Ok, fair enough - so how about the attached approach instead, which uses a 64-bit 
generation counter to track changes to the vmalloc state.

This is still very simple, but should not suffer from stale data being returned 
indefinitely in /proc/meminfo. We might race - but that was true before as well 
due to the lock-less RCU list walk - but we'll always return a correct and 
consistent version of the information.

Lightly tested. This is a replacement patch to make it easier to read via email.

I also made sure there's no extra overhead in the !CONFIG_PROC_FS case.

Note that there's an even simpler variant possible I think: we could use just the 
two generation counters and barriers to remove the seqlock.

Thanks,

	Ingo

==============================>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info
  2015-08-23  8:17     ` [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Ingo Molnar
@ 2015-08-23 20:53       ` Rasmus Villemoes
  2015-08-24  6:58         ` Ingo Molnar
  2015-08-23 21:56       ` Rasmus Villemoes
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Rasmus Villemoes @ 2015-08-23 20:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: George Spelvin, dave, linux-kernel, linux-mm, peterz, riel,
	rientjes, torvalds

On Sun, Aug 23 2015, Ingo Molnar <mingo@kernel.org> wrote:

> Ok, fair enough - so how about the attached approach instead, which
> uses a 64-bit generation counter to track changes to the vmalloc
> state.

How does this invalidation approach compare to the jiffies approach? In
other words, how often does the vmalloc info actually change (or rather,
in this approximation, how often is vmap_area_lock taken)? In
particular, does it also solve the problem with git's test suite and
similar situations with lots of short-lived processes?

> ==============================>
> From f9fd770e75e2edb4143f32ced0b53d7a77969c94 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@kernel.org>
> Date: Sat, 22 Aug 2015 12:28:01 +0200
> Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info
>
> Linus reported that glibc (rather stupidly) reads /proc/meminfo
> for every sysinfo() call,

Not quite: It is done by the two functions get_{av,}phys_pages
functions; and get_phys_pages is called (once per process) by glibc's
qsort implementation. In fact, sysinfo() is (at least part of) the cure,
not the disease. Whether qsort should care about the total amount of
memory is another discussion.

<http://thread.gmane.org/gmane.comp.lib.glibc.alpha/54342/focus=54558>

Rasmus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info
  2015-08-23 20:53       ` Rasmus Villemoes
@ 2015-08-24  6:58         ` Ingo Molnar
  2015-08-24  8:39           ` Rasmus Villemoes
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2015-08-24  6:58 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: George Spelvin, dave, linux-kernel, linux-mm, peterz, riel,
	rientjes, torvalds

* Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:

> On Sun, Aug 23 2015, Ingo Molnar <mingo@kernel.org> wrote:
> 
> > Ok, fair enough - so how about the attached approach instead, which
> > uses a 64-bit generation counter to track changes to the vmalloc
> > state.
> 
> How does this invalidation approach compare to the jiffies approach? In
> other words, how often does the vmalloc info actually change (or rather,
> in this approximation, how often is vmap_area_lock taken)? In
> particular, does it also solve the problem with git's test suite and
> similar situations with lots of short-lived processes?

The two approaches are pretty similar, and in a typical distro with typical 
workload vmalloc() is mostly a boot time affair.

But vmalloc() can be used more often in certain corner cases - neither of the 
patches makes that in any way slower, just the optimization won't trigger that 
often.

Since vmalloc() use is suboptimal for several reasons (it does not use large pages 
for kernel space allocations, etc.), this is all pretty OK IMHO.

> > ==============================>
> > From f9fd770e75e2edb4143f32ced0b53d7a77969c94 Mon Sep 17 00:00:00 2001
> > From: Ingo Molnar <mingo@kernel.org>
> > Date: Sat, 22 Aug 2015 12:28:01 +0200
> > Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info
> >
> > Linus reported that glibc (rather stupidly) reads /proc/meminfo
> > for every sysinfo() call,
> 
> Not quite: It is done by the two functions get_{av,}phys_pages
> functions; and get_phys_pages is called (once per process) by glibc's
> qsort implementation. In fact, sysinfo() is (at least part of) the cure,
> not the disease. Whether qsort should care about the total amount of
> memory is another discussion.
> 
> <http://thread.gmane.org/gmane.comp.lib.glibc.alpha/54342/focus=54558>

Thanks, is the fixed up changelog below better?

	Ingo

===============>

mm/vmalloc: Cache the vmalloc memory info

Linus reported that for scripting-intense workloads such as the
Git build, glibc's qsort will read /proc/meminfo for every process
created (by way of get_phys_pages()), which causes the Git build 
to generate a surprising amount of kernel overhead.

A fair chunk of the overhead is due to get_vmalloc_info() - which 
walks a potentially long list to do its statistics.

Modify Linus's jiffies based patch to use generation counters
to cache the vmalloc info: vmap_unlock() increases the generation
counter, and the get_vmalloc_info() reads it and compares it
against a cached generation counter.

Also use a seqlock to make sure we always print a consistent
set of vmalloc statistics.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info
  2015-08-24  6:58         ` Ingo Molnar
@ 2015-08-24  8:39           ` Rasmus Villemoes
  0 siblings, 0 replies; 25+ messages in thread
From: Rasmus Villemoes @ 2015-08-24  8:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: George Spelvin, dave, linux-kernel, linux-mm, peterz, riel,
	rientjes, torvalds

On Mon, Aug 24 2015, Ingo Molnar <mingo@kernel.org> wrote:

>
> Thanks, is the fixed up changelog below better?
>

Yes, though Linus specifically referred to "make test" (but I guess one
could/should consider that part of the build process).

Rasmus


> mm/vmalloc: Cache the vmalloc memory info
>
> Linus reported that for scripting-intense workloads such as the
> Git build, glibc's qsort will read /proc/meminfo for every process
> created (by way of get_phys_pages()), which causes the Git build 
> to generate a surprising amount of kernel overhead.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info
  2015-08-23  8:17     ` [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Ingo Molnar
  2015-08-23 20:53       ` Rasmus Villemoes
@ 2015-08-23 21:56       ` Rasmus Villemoes
  2015-08-24  7:00         ` Ingo Molnar
  2015-08-25 16:39         ` Linus Torvalds
  2015-08-24  1:04       ` George Spelvin
  2015-08-25 12:46       ` [PATCH 3/3 v3] " Peter Zijlstra
  3 siblings, 2 replies; 25+ messages in thread
From: Rasmus Villemoes @ 2015-08-23 21:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: George Spelvin, dave, linux-kernel, linux-mm, peterz, riel,
	rientjes, torvalds

I was curious why these fields were ever added to /proc/meminfo, and dug
up this:

commit d262ee3ee6ba4f5f6125571d93d9d63191d2ef76
Author: Andrew Morton <akpm@digeo.com>
Date:   Sat Apr 12 12:59:04 2003 -0700

    [PATCH] vmalloc stats in /proc/meminfo
    
    From: Matt Porter <porter@cox.net>
    
    There was a thread a while back on lkml where Dave Hansen proposed this
    simple vmalloc usage reporting patch.  The thread pretty much died out as
    most people seemed focused on what VM loading type bugs it could solve.  I
    had posted that this type of information was really valuable in debugging
    embedded Linux board ports.  A common example is where people do arch
    specific setup that limits there vmalloc space and then they find modules
    won't load.  ;) Having the Vmalloc* info readily available is real useful in
    helping folks to fix their kernel ports.

That thread is at <http://thread.gmane.org/gmane.linux.kernel/53360>.

[Maybe one could just remove the fields and see if anybody actually
notices/cares any longer. Or, if they are only used by kernel
developers, put them in their own file.]

Rasmus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info
  2015-08-23 21:56       ` Rasmus Villemoes
@ 2015-08-24  7:00         ` Ingo Molnar
  2015-08-25 16:39         ` Linus Torvalds
  1 sibling, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2015-08-24  7:00 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: George Spelvin, dave, linux-kernel, linux-mm, peterz, riel,
	rientjes, torvalds


* Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:

> I was curious why these fields were ever added to /proc/meminfo, and dug
> up this:
> 
> commit d262ee3ee6ba4f5f6125571d93d9d63191d2ef76
> Author: Andrew Morton <akpm@digeo.com>
> Date:   Sat Apr 12 12:59:04 2003 -0700
> 
>     [PATCH] vmalloc stats in /proc/meminfo
>     
>     From: Matt Porter <porter@cox.net>
>     
>     There was a thread a while back on lkml where Dave Hansen proposed this
>     simple vmalloc usage reporting patch.  The thread pretty much died out as
>     most people seemed focused on what VM loading type bugs it could solve.  I
>     had posted that this type of information was really valuable in debugging
>     embedded Linux board ports.  A common example is where people do arch
>     specific setup that limits there vmalloc space and then they find modules
>     won't load.  ;) Having the Vmalloc* info readily available is real useful in
>     helping folks to fix their kernel ports.
> 
> That thread is at <http://thread.gmane.org/gmane.linux.kernel/53360>.
> 
> [Maybe one could just remove the fields and see if anybody actually
> notices/cares any longer. Or, if they are only used by kernel
> developers, put them in their own file.]

So instead of removing the fields (which I'm quite sure is an ABI breaker as it 
could break less robust /proc/meminfo parsers and scripts), we could just report 
'0' all the time - and have the real info somewhere else?

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info
  2015-08-23 21:56       ` Rasmus Villemoes
  2015-08-24  7:00         ` Ingo Molnar
@ 2015-08-25 16:39         ` Linus Torvalds
  2015-08-25 17:03           ` Linus Torvalds
  1 sibling, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2015-08-25 16:39 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Ingo Molnar, George Spelvin, Dave Hansen,
	Linux Kernel Mailing List, linux-mm, Peter Zijlstra, Rik van Riel,
	David Rientjes

On Sun, Aug 23, 2015 at 2:56 PM, Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> [Maybe one could just remove the fields and see if anybody actually
> notices/cares any longer. Or, if they are only used by kernel
> developers, put them in their own file.]

I'm actually inclined to try exactly that for 4.3, and then take
Ingo's patch as a fallback in case somebody actually notices.

I'm not convinced anybody actually uses those values, and they are
getting *less* relevant rather than more (on 64-bit, those values
really don't matter, since the vmalloc space isn't really a
limitation), so let's try removing them and seeing what happens. And
then we know what we can do if somebody does actually notice.

                     Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info
  2015-08-25 16:39         ` Linus Torvalds
@ 2015-08-25 17:03           ` Linus Torvalds
  0 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2015-08-25 17:03 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Ingo Molnar, George Spelvin, Dave Hansen,
	Linux Kernel Mailing List, linux-mm, Peter Zijlstra, Rik van Riel,
	David Rientjes

On Tue, Aug 25, 2015 at 9:39 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'm not convinced anybody actually uses those values, and they are
> getting *less* relevant rather than more (on 64-bit, those values
> really don't matter, since the vmalloc space isn't really a
> limitation)

Side note: the people who actually care about "my vmalloc area is too
full, what's up?" would use /proc/vmallocinfo anyway, since that's
what shows things like fragmentation etc.

So I'm just talking about removing the /proc/meminfo part. First try
to remove it *all*, and if there is some script that hollers because
it wants to parse them, print out the values as zero.

                 Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info
  2015-08-23  8:17     ` [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Ingo Molnar
  2015-08-23 20:53       ` Rasmus Villemoes
  2015-08-23 21:56       ` Rasmus Villemoes
@ 2015-08-24  1:04       ` George Spelvin
  2015-08-24  7:34         ` [PATCH 3/3 v4] " Ingo Molnar
  2015-08-25 12:46       ` [PATCH 3/3 v3] " Peter Zijlstra
  3 siblings, 1 reply; 25+ messages in thread
From: George Spelvin @ 2015-08-24  1:04 UTC (permalink / raw)
  To: linux, mingo
  Cc: dave, linux-kernel, linux-mm, linux, peterz, riel, rientjes,
	torvalds

First, an actual, albeit minor, bug: initializing both vmap_info_gen
and vmap_info_cache_gen to 0 marks the cache as valid, which it's not.

vmap_info_gen should be initialized to 1 to force an initial
cache update.

Second, I don't see why you need a 64-bit counter.  Seqlocks consider
32 bits (31 bits, actually, the lsbit means "update in progress") quite
a strong enough guarantee.

Third, it seems as though vmap_info_cache_gen is basically a duplicate
of vmap_info_lock.sequence.  It should be possible to make one variable
serve both purposes.

You just need a kludge to handle the case of multiple vamp_info updates
between cache updates.

There are two simple ones:

1) Avoid bumping vmap_info_gen unnecessarily.  In vmap_unlock(), do
	vmap_info_gen = (vmap_info_lock.sequence | 1) + 1;
2) - Make vmap_info_gen a seqcount_t
   - In vmap_unlock(), do write_seqcount_barrier(&vmap_info_gen)
   - In get_vmalloc_info, inside the seqlock critical section, do
     vmap_info_lock.seqcount.sequence = vmap_info_gen.sequence - 1;
     (Using the vmap_info_gen.sequence read while validating the
     cache in the first place.)

I should try to write an actual patch illustrating this.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info
  2015-08-24  1:04       ` George Spelvin
@ 2015-08-24  7:34         ` Ingo Molnar
  2015-08-24  7:47           ` Ingo Molnar
  2015-08-24 13:11           ` [PATCH 3/3 v4] " John Stoffel
  0 siblings, 2 replies; 25+ messages in thread
From: Ingo Molnar @ 2015-08-24  7:34 UTC (permalink / raw)
  To: George Spelvin
  Cc: dave, linux-kernel, linux-mm, linux, peterz, riel, rientjes,
	torvalds


* George Spelvin <linux@horizon.com> wrote:

> First, an actual, albeit minor, bug: initializing both vmap_info_gen
> and vmap_info_cache_gen to 0 marks the cache as valid, which it's not.

Ha! :-) Fixed.

> vmap_info_gen should be initialized to 1 to force an initial
> cache update.

Yeah.

> Second, I don't see why you need a 64-bit counter.  Seqlocks consider
> 32 bits (31 bits, actually, the lsbit means "update in progress") quite
> a strong enough guarantee.

Just out of general paranoia - but you are right, and this would lower the 
overhead on 32-bit SMP platforms a bit, plus it avoids 64-bit word tearing 
artifacts on 32 bit platforms as well.

I modified it to u32.

> Third, it seems as though vmap_info_cache_gen is basically a duplicate
> of vmap_info_lock.sequence.  It should be possible to make one variable
> serve both purposes.

Correct, I alluded to that in my description:

> > Note that there's an even simpler variant possible I think: we could use just 
> > the two generation counters and barriers to remove the seqlock.

> You just need a kludge to handle the case of multiple vamp_info updates
> between cache updates.
> 
> There are two simple ones:
> 
> 1) Avoid bumping vmap_info_gen unnecessarily.  In vmap_unlock(), do
> 	vmap_info_gen = (vmap_info_lock.sequence | 1) + 1;
> 2) - Make vmap_info_gen a seqcount_t
>    - In vmap_unlock(), do write_seqcount_barrier(&vmap_info_gen)
>    - In get_vmalloc_info, inside the seqlock critical section, do
>      vmap_info_lock.seqcount.sequence = vmap_info_gen.sequence - 1;
>      (Using the vmap_info_gen.sequence read while validating the
>      cache in the first place.)
> 
> I should try to write an actual patch illustrating this.

So I think something like the patch below is even simpler than trying to kludge 
generation counter semantics into seqcounts.

I used two generation counters and a spinlock. The fast path is completely 
lockless and lightweight on modern SMP platforms. (where smp_rmb() is a no-op or 
very cheap.)

There's not even a seqlock retry loop, instead an invalid cache causes us to fall 
back to the old behavior - and the freshest result is guaranteed to end up in the 
cache.

The linecount got a bit larger: but half of it is comments.

Note that the generation counters are signed integers so that this comparison can 
be done:

+       if (gen-vmap_info_cache_gen > 0) {

Thanks,

	Ingo

======================>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info
  2015-08-24  7:34         ` [PATCH 3/3 v4] " Ingo Molnar
@ 2015-08-24  7:47           ` Ingo Molnar
  2015-08-24  7:50             ` [PATCH 3/3 v5] " Ingo Molnar
  2015-08-24 13:11           ` [PATCH 3/3 v4] " John Stoffel
  1 sibling, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2015-08-24  7:47 UTC (permalink / raw)
  To: George Spelvin
  Cc: dave, linux-kernel, linux-mm, linux, peterz, riel, rientjes,
	torvalds


* Ingo Molnar <mingo@kernel.org> wrote:

> +/*
> + * Return a consistent snapshot of the current vmalloc allocation
> + * statistics, for /proc/meminfo:
> + */
> +void get_vmalloc_info(struct vmalloc_info *vmi)
> +{
> +	int gen = READ_ONCE(vmap_info_gen);
> +
> +	/*
> +	 * If the generation counter of the cache matches that of
> +	 * the vmalloc generation counter then return the cache:
> +	 */
> +	if (READ_ONCE(vmap_info_cache_gen) == gen) {
> +		int gen_after;
> +
> +		/*
> +		 * The two read barriers make sure that we read
> +		 * 'gen', 'vmap_info_cache' and 'gen_after' in
> +		 * precisely that order:
> +		 */
> +		smp_rmb();
> +		*vmi = vmap_info_cache;
> +
> +		smp_rmb();
> +		gen_after = READ_ONCE(vmap_info_gen);
> +
> +		/* The cache is still valid: */
> +		if (gen == gen_after)
> +			return;
> +
> +		/* Ok, the cache got invalidated just now, regenerate it */
> +		gen = gen_after;
> +	}

One more detail: I just realized that with the read barriers, the READ_ONCE() 
accesses are not needed anymore - the barriers and the control dependencies are 
enough.

This will further simplify the code.

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 3/3 v5] mm/vmalloc: Cache the vmalloc memory info
  2015-08-24  7:47           ` Ingo Molnar
@ 2015-08-24  7:50             ` Ingo Molnar
  2015-08-24 12:54               ` George Spelvin
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2015-08-24  7:50 UTC (permalink / raw)
  To: George Spelvin
  Cc: dave, linux-kernel, linux-mm, linux, peterz, riel, rientjes,
	torvalds


* Ingo Molnar <mingo@kernel.org> wrote:

> One more detail: I just realized that with the read barriers, the READ_ONCE() 
> accesses are not needed anymore - the barriers and the control dependencies are 
> enough.
> 
> This will further simplify the code.

I.e. something like the updated patch below. (We still need the WRITE_ONCE() for 
vmap_info_gen update.)

Thanks,

	Ingo

========================>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v5] mm/vmalloc: Cache the vmalloc memory info
  2015-08-24  7:50             ` [PATCH 3/3 v5] " Ingo Molnar
@ 2015-08-24 12:54               ` George Spelvin
  2015-08-25  9:56                 ` [PATCH 3/3 v6] " Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: George Spelvin @ 2015-08-24 12:54 UTC (permalink / raw)
  To: linux, mingo
  Cc: dave, linux-kernel, linux-mm, linux, peterz, riel, rientjes,
	torvalds

(I hope I'm not annoying you by bikeshedding this too much, although I
think this is improving.)

You've sort of re-invented spinlocks, but after thinking a bit,
it all works.

Rather than using a single word, which is incremented to an odd number
at the start of an update and an even number at the end, there are
two.  An update is in progress when they're unequal.

vmap_info_gen is incremented early, when the cache needs updating, and
read late (after the cache is copied).

vmap_info_cache_gen is incremented after the cache is updated, and read
early (before the cache is copied).

This is logically equivalent to my complicated scheme with atomic updates
to various bits in a single generation word, but greatly simplified by
having two separate words.  In particular, there's no longer a need to
distinguish "vmap has updated list" from "calc_vmalloc_info in progress".

I particularly like the "gen - vmap_info_cache_gen > 0" test.
You *must* test for inequality to prevent tearing of a valid cache
(...grr...English heteronyms...), and given that, might as well
require it be fresher.

Anyway, suggested changes for v6 (sigh...):

First: you do a second read of vmap_info_gen to optimize out the copy
of vmalloc_info if it's easily seen as pointless, but given how small
vmalloc_info is (two words!), i'd be inclined to omit that optimization.

Copy always, *then* see if it's worth keeping.  Smaller code, faster
fast path, and is barely noticeable on the slow path.

Second, and this is up to you, I'd be inclined to go fully non-blocking and
only spin_trylock().  If that fails, just skip the cache update.

Third, ANSI C rules allow a compiler to assume that signed integer
overflow does not occur.  That means that gcc is allowed to optimize
"if (x - y > 0)" to "if (x > y)".

Given that gcc has annoyed us by using this optimization in other
contexts, It might be safer to make them unsigned (which is required to
wrap properly) and cast to integer after subtraction.

Basically, the following (untested, but pretty damn simple):

+/*
+ * Return a consistent snapshot of the current vmalloc allocation
+ * statistics, for /proc/meminfo:
+ */
+void get_vmalloc_info(struct vmalloc_info *vmi)
+{
+	unsigned gen, cache_gen = READ_ONCE(vmap_info_cache_gen);
+
+	/*
+	 * The two read barriers make sure that we read
+	 * 'cache_gen', 'vmap_info_cache' and 'gen' in
+	 * precisely that order:
+	 */
+	smp_rmb();
+	*vmi = vmap_info_cache;
+
+	smp_rmb();
+	gen = READ_ONCE(vmap_info_gen);
+
+	/*
+	 * If the generation counter of the cache matches that of
+	 * the vmalloc generation counter then return the cache:
+	 */
+	if (gen == cache_gen)
+		return;
+
+	/* Make sure 'gen' is read before the vmalloc info */
+	smp_rmb();
+	calc_vmalloc_info(vmi);
+
+	/*
+	 * All updates to vmap_info_cache_gen go through this spinlock,
+	 * so when the cache got invalidated, we'll only mark it valid
+	 * again if we first fully write the new vmap_info_cache.
+	 *
+	 * This ensures that partial results won't be used.
+	 */
+	if (spin_trylock(&vmap_info_lock)) {
+		if ((int)(gen - vmap_info_cache_gen) > 0) {
+			vmap_info_cache = *vmi;
+			/*
+			 * Make sure the new cached data is visible before
+			 * the generation counter update:
+			 */
+			smp_wmb();
+			WRITE_ONCE(vmap_info_cache_gen, gen);
+		}
+		spin_unlock(&vmap_info_lock);
+	}
+}
+
+#endif /* CONFIG_PROC_FS */

The only remaining *very small* nit is that this function is a mix of
"return early" and "wrap it in an if()" style.  If you want to make that
"if (!spin_trylock(...)) return;", I leave that you your esthetic judgement.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info
  2015-08-24 12:54               ` George Spelvin
@ 2015-08-25  9:56                 ` Ingo Molnar
  2015-08-25 10:36                   ` George Spelvin
                                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Ingo Molnar @ 2015-08-25  9:56 UTC (permalink / raw)
  To: George Spelvin
  Cc: dave, linux-kernel, linux-mm, linux, peterz, riel, rientjes,
	torvalds


* George Spelvin <linux@horizon.com> wrote:

> (I hope I'm not annoying you by bikeshedding this too much, although I
> think this is improving.)

[ I don't mind, although I wish other, more critical parts of the kernel got this
  much attention as well ;-) ]

> Anyway, suggested changes for v6 (sigh...):
> 
> First: you do a second read of vmap_info_gen to optimize out the copy
> of vmalloc_info if it's easily seen as pointless, but given how small
> vmalloc_info is (two words!), i'd be inclined to omit that optimization.
> 
> Copy always, *then* see if it's worth keeping.  Smaller code, faster
> fast path, and is barely noticeable on the slow path.

Ok, done.

> Second, and this is up to you, I'd be inclined to go fully non-blocking and
> only spin_trylock().  If that fails, just skip the cache update.

So I'm not sure about this one: we have no guarantee of the order every updater 
reaches the spinlock, and we want the 'freshest' updater to do the update. The 
trylock might cause us to drop the 'freshest' update erroneously - so this change 
would introduce a 'stale data' bug I think.

> Third, ANSI C rules allow a compiler to assume that signed integer
> overflow does not occur.  That means that gcc is allowed to optimize
> "if (x - y > 0)" to "if (x > y)".

That's annoying ...

> Given that gcc has annoyed us by using this optimization in other
> contexts, It might be safer to make them unsigned (which is required to
> wrap properly) and cast to integer after subtraction.

Ok, done.

> Basically, the following (untested, but pretty damn simple):

I've attached v6 which applies your first and last suggestion, but not the trylock 
one.

I also removed _ONCE() accesses from the places that didn't need them.

I added your Reviewed-by optimistically, saving a v7 submission hopefully ;-)

Lightly tested.

Thanks,

	Ingo

==============================>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info
  2015-08-25  9:56                 ` [PATCH 3/3 v6] " Ingo Molnar
@ 2015-08-25 10:36                   ` George Spelvin
  2015-08-25 12:59                   ` Peter Zijlstra
  2015-08-25 14:19                   ` Rasmus Villemoes
  2 siblings, 0 replies; 25+ messages in thread
From: George Spelvin @ 2015-08-25 10:36 UTC (permalink / raw)
  To: linux, mingo
  Cc: dave, linux-kernel, linux-mm, linux, peterz, riel, rientjes,
	torvalds

>> Second, and this is up to you, I'd be inclined to go fully non-blocking
>> and only spin_trylock().  If that fails, just skip the cache update.

> So I'm not sure about this one: we have no guarantee of the order every
> updater reaches the spinlock, and we want the 'freshest' updater to do
> the update. The trylock might cause us to drop the 'freshest' update
> erroneously - so this change would introduce a 'stale data' bug I think.

Er, no it wouldn't.  If someone leaves the cache stale, they'll leave
vmap_info_cache_gen != vmap_info_gen and the next reader to come along
will see the cache is stale and refresh it.

If there's lock contention, there's a risk of more work, because the
callers fall back to calc_vmalloc_info rather than waiting.

But it's not a big risk.  With the blocking code, if two readers
arrive simultaneously and see a stale cache, they'll both call
calc_vmalloc_info() and then line up to update the cache.

The second will get the lock but then *not* update the cache.

With trylock(), the second will just skip the update faster.
Same number of calc_vmalloc_info() calls.

The only inefficient case is if two readers arrive far enough apart that
vmap_info_gen is updated between them, yet close enough together than
the second arrives at the lock while the first is updating the cache.

In that case, the second reader will not update the cache and (assuming
no more changes to vmap_info_gen) some future third reader will have to
duplicate the effort of calling calc_vmalloc_info().

But that's such a tiny window I give preference to my fondness
for non-blocking code.

> I added your Reviewed-by optimistically, saving a v7 submission hopefully ;-)

You did right.  As I said, the non-blocking part is your preference.

I've done so much nasty stuff in interrupt handlers ($DAY_JOB more than
kernel) that I go for the non-blocking algorithm whenever possible.

Reviewed-by: George Spelvin <linux@horizon.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info
  2015-08-25  9:56                 ` [PATCH 3/3 v6] " Ingo Molnar
  2015-08-25 10:36                   ` George Spelvin
@ 2015-08-25 12:59                   ` Peter Zijlstra
  2015-08-25 14:19                   ` Rasmus Villemoes
  2 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2015-08-25 12:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: George Spelvin, dave, linux-kernel, linux-mm, linux, riel,
	rientjes, torvalds

On Tue, Aug 25, 2015 at 11:56:38AM +0200, Ingo Molnar wrote:
> +void get_vmalloc_info(struct vmalloc_info *vmi)
> +{
> +	unsigned int cache_gen, gen;

I see you dropped the u64 thing, good, ignore that previous email.

> +
> +	/*
> +	 * The common case is that the cache is valid, so first
> +	 * read it, then check its validity.
> +	 *
> +	 * The two read barriers make sure that we read
> +	 * 'cache_gen', 'vmap_info_cache' and 'gen' in
> +	 * precisely that order:
> +	 */
> +	cache_gen = vmap_info_cache_gen;
> +	smp_rmb();
> +	*vmi = vmap_info_cache;
> +	smp_rmb();
> +	gen = vmap_info_gen;
> +
> +	/*
> +	 * If the generation counter of the cache matches that of
> +	 * the vmalloc generation counter then return the cache:
> +	 */
> +	if (cache_gen == gen)
> +		return;

There is one aspect of READ_ONCE() that is not replaced with smp_rmb(),
and that is that READ_ONCE() should avoid split loads.

Without READ_ONCE() the compiler is free to turn the loads into separate
and out of order byte loads just because its insane, thereby also making
the WRITE_ONCE() pointless.

Now I'm fairly sure it all doesn't matter much, the info can change the
moment we've copied it, so the read is inherently racy.

And by that same argument I feel this is all somewhat over engineered.

> +
> +	/* Make sure 'gen' is read before the vmalloc info: */
> +	smp_rmb();
> +	calc_vmalloc_info(vmi);
> +
> +	/*
> +	 * All updates to vmap_info_cache_gen go through this spinlock,
> +	 * so when the cache got invalidated, we'll only mark it valid
> +	 * again if we first fully write the new vmap_info_cache.
> +	 *
> +	 * This ensures that partial results won't be used and that the
> +	 * vmalloc info belonging to the freshest update is used:
> +	 */
> +	spin_lock(&vmap_info_lock);
> +	if ((int)(gen-vmap_info_cache_gen) > 0) {
> +		vmap_info_cache = *vmi;
> +		/*
> +		 * Make sure the new cached data is visible before
> +		 * the generation counter update:
> +		 */
> +		smp_wmb();
> +		vmap_info_cache_gen = gen;
> +	}
> +	spin_unlock(&vmap_info_lock);
> +}
> +
> +#endif /* CONFIG_PROC_FS */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info
  2015-08-25  9:56                 ` [PATCH 3/3 v6] " Ingo Molnar
  2015-08-25 10:36                   ` George Spelvin
  2015-08-25 12:59                   ` Peter Zijlstra
@ 2015-08-25 14:19                   ` Rasmus Villemoes
  2015-08-25 15:11                     ` George Spelvin
  2 siblings, 1 reply; 25+ messages in thread
From: Rasmus Villemoes @ 2015-08-25 14:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: George Spelvin, dave, linux-kernel, linux-mm, peterz, riel,
	rientjes, torvalds

On Tue, Aug 25 2015, Ingo Molnar <mingo@kernel.org> wrote:

> * George Spelvin <linux@horizon.com> wrote:
>
>> (I hope I'm not annoying you by bikeshedding this too much, although I
>> think this is improving.)
>
> [ I don't mind, although I wish other, more critical parts of the kernel got this
>   much attention as well ;-) ]
>

Since we're beating dead horses, let me point out one possibly
unintentional side-effect of initializing just one of vmap_info{,_cache}_gen:

$ nm -n vmlinux | grep -E 'vmap_info(_cache)?_gen'
ffffffff81e4e5e0 d vmap_info_gen
ffffffff820d5700 b vmap_info_cache_gen

[Up-thread, you wrote "I also moved the function-static cache next to the
flag and seqlock - this should further compress the cache footprint."]

One should probably ensure that they end up in the same cacheline if one
wants the fast-path to be as fast as possible - the easiest way to
ensure that is to put them in a small struct, and that might as well
contain the spinlock and the cache itself as well.

It's been fun seeing this evolve, but overall, I tend to agree with
Peter: It's a lot of complexity for little gain. If we're not going to
just kill the Vmalloc* fields (which is probably too controversial)
I'd prefer Linus' simpler version.

Rasmus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info
  2015-08-25 14:19                   ` Rasmus Villemoes
@ 2015-08-25 15:11                     ` George Spelvin
  0 siblings, 0 replies; 25+ messages in thread
From: George Spelvin @ 2015-08-25 15:11 UTC (permalink / raw)
  To: linux, mingo
  Cc: dave, linux-kernel, linux-mm, linux, peterz, riel, rientjes,
	torvalds

>>> (I hope I'm not annoying you by bikeshedding this too much, although I
>>> think this is improving.)
>>
>> [ I don't mind, although I wish other, more critical parts of the kernel got this
>>   much attention as well ;-) ]

That's the problem with small, understandable problems: people *aren't*
scared to mess with them.

> It's been fun seeing this evolve, but overall, I tend to agree with
> Peter: It's a lot of complexity for little gain. If we're not going to
> just kill the Vmalloc* fields (which is probably too controversial)
> I'd prefer Linus' simpler version.

Are you sure you're not being affected by the number of iterations?

The final version is not actually a lot of code (although yes, more than
Linus's), and offers the advantage of peace of mind: there's not some
nasty-smelling code you can't entirely trust left behind.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info
  2015-08-24  7:34         ` [PATCH 3/3 v4] " Ingo Molnar
  2015-08-24  7:47           ` Ingo Molnar
@ 2015-08-24 13:11           ` John Stoffel
  2015-08-24 15:11             ` George Spelvin
  1 sibling, 1 reply; 25+ messages in thread
From: John Stoffel @ 2015-08-24 13:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: George Spelvin, dave, linux-kernel, linux-mm, linux, peterz, riel,
	rientjes, torvalds

>>>>> "Ingo" == Ingo Molnar <mingo@kernel.org> writes:

Ingo> * George Spelvin <linux@horizon.com> wrote:

>> First, an actual, albeit minor, bug: initializing both vmap_info_gen
>> and vmap_info_cache_gen to 0 marks the cache as valid, which it's not.

Ingo> Ha! :-) Fixed.

>> vmap_info_gen should be initialized to 1 to force an initial
>> cache update.

Blech, it should be initialized with a proper #define
VMAP_CACHE_NEEDS_UPDATE 1, instead of more magic numbers.


Ingo> + */
Ingo> +static DEFINE_SPINLOCK(vmap_info_lock);
Ingo> +static int vmap_info_gen = 1;

   static int vmap_info_gen = VMAP_CACHE_NEEDS_UPDATE;

Ingo> +static int vmap_info_cache_gen;
Ingo> +static struct vmalloc_info vmap_info_cache;
Ingo> +#endif


This will help keep bugs like this out in the future... I hope!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info
  2015-08-24 13:11           ` [PATCH 3/3 v4] " John Stoffel
@ 2015-08-24 15:11             ` George Spelvin
  2015-08-24 15:55               ` John Stoffel
  0 siblings, 1 reply; 25+ messages in thread
From: George Spelvin @ 2015-08-24 15:11 UTC (permalink / raw)
  To: john, mingo
  Cc: dave, linux-kernel, linux-mm, linux, linux, peterz, riel,
	rientjes, torvalds

John Stoffel <john@stoffel.org> wrote:
>> vmap_info_gen should be initialized to 1 to force an initial
>> cache update.

> Blech, it should be initialized with a proper #define
> VMAP_CACHE_NEEDS_UPDATE 1, instead of more magic numbers.

Er... this is a joke, right?

First, this number is used exactly once, and it's not part of a collection
of similar numbers.  And the definition would be adjacent to the use.

We have easier ways of accomplishing that, called "comments".

Second, your proposed name is misleading.  "needs update" is defined
as vmap_info_gen != vmap_info_cache_gen.  There is no particular value
of either that has this meaning.

For example, initializing vmap_info_cache_gen to -1 would do just as well.
(I actually considered that before deciding that +1 was "simpler" than -1.)

For some versions of the code, an *arbitrary* difference is okay.
You could set one ot 0xDEADBEEF and the other to 0xFEEDFACE.

For other versions, the magnitude matters, but not *too* much.
Initializing it to 42 would be perfectly correct, but waste time doing
42 cache updates before settling down.

Singling out the value 1 as VMAP_CACHE_NEEDS_UPDATE is actively misleading.

> This will help keep bugs like this out in the future... I hope!

And this is the punchline, right?

The problem was not realizing that non-default initialization was required;
what we *call* the non-default value is irrelevant.

I doubt it would ever have been a real (i.e. noticeable) bug, actually;
the first bit of vmap activity in very early boot would have invalidated
the cache.

(John, my apologies if I went over the top and am contributing to LKML's
reputation for flaming.  I *did* actually laugh, and *do* think it's a
dumb idea, but my annoyance is really directed at unpleasant memories of
mindless application of coding style guidelines.  In this case, I suspect
you just posted before reading carefully enough to see the subtle logic.)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info
  2015-08-24 15:11             ` George Spelvin
@ 2015-08-24 15:55               ` John Stoffel
  0 siblings, 0 replies; 25+ messages in thread
From: John Stoffel @ 2015-08-24 15:55 UTC (permalink / raw)
  To: George Spelvin
  Cc: john, mingo, dave, linux-kernel, linux-mm, linux, peterz, riel,
	rientjes, torvalds

George> John Stoffel <john@stoffel.org> wrote:
>>> vmap_info_gen should be initialized to 1 to force an initial
>>> cache update.

>> Blech, it should be initialized with a proper #define
>> VMAP_CACHE_NEEDS_UPDATE 1, instead of more magic numbers.

George> Er... this is a joke, right?

Not really.  The comment made before was that by setting this variable
to zero, it wasn't properly initialized.  Which implies that either
the API is wrong... or we should be documenting it better.   I just
went in the direction of the #define instead of a comment. 

George> First, this number is used exactly once, and it's not part of
George> a collection of similar numbers.  And the definition would be
George> adjacent to the use.

George> We have easier ways of accomplishing that, called "comments".

Sure, that would be the better solution in this case.  

George> Second, your proposed name is misleading.  "needs update" is defined
George> as vmap_info_gen != vmap_info_cache_gen.  There is no particular value
George> of either that has this meaning.

George> For example, initializing vmap_info_cache_gen to -1 would do just as well.
George> (I actually considered that before deciding that +1 was "simpler" than -1.)

See, I just threw out a dumb suggestion without reading the patch
properly.  My fault.

George> (John, my apologies if I went over the top and am contributing to LKML's
George> reputation for flaming.  I *did* actually laugh, and *do* think it's a
George> dumb idea, but my annoyance is really directed at unpleasant memories of
George> mindless application of coding style guidelines.  In this case, I suspect
George> you just posted before reading carefully enough to see the subtle logic.)

Nope, I'm in the wrong here.  And your comment here is wonderful, I
really do appreciate how you handled my ham fisted attempt to
contribute.  But I've got thick skin and I'll keep trying in my free
time to comment on patches when I can.

John

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info
  2015-08-23  8:17     ` [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Ingo Molnar
                         ` (2 preceding siblings ...)
  2015-08-24  1:04       ` George Spelvin
@ 2015-08-25 12:46       ` Peter Zijlstra
  3 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2015-08-25 12:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: George Spelvin, dave, linux-kernel, linux-mm, linux, riel,
	rientjes, torvalds

On Sun, Aug 23, 2015 at 10:17:51AM +0200, Ingo Molnar wrote:
> +static u64 vmap_info_gen;
> +static u64 vmap_info_cache_gen;

> +void get_vmalloc_info(struct vmalloc_info *vmi)
> +{
> +	u64 gen = READ_ONCE(vmap_info_gen);
> +
> +	/*
> +	 * If the generation counter of the cache matches that of
> +	 * the vmalloc generation counter then return the cache:
> +	 */
> +	if (READ_ONCE(vmap_info_cache_gen) == gen) {

Why are those things u64? It has the obvious down-side that you still
get split loads on 32bit machines.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-08-25 17:03 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-23  4:48 [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics George Spelvin
2015-08-23  6:04 ` Ingo Molnar
2015-08-23  6:46   ` George Spelvin
2015-08-23  8:17     ` [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Ingo Molnar
2015-08-23 20:53       ` Rasmus Villemoes
2015-08-24  6:58         ` Ingo Molnar
2015-08-24  8:39           ` Rasmus Villemoes
2015-08-23 21:56       ` Rasmus Villemoes
2015-08-24  7:00         ` Ingo Molnar
2015-08-25 16:39         ` Linus Torvalds
2015-08-25 17:03           ` Linus Torvalds
2015-08-24  1:04       ` George Spelvin
2015-08-24  7:34         ` [PATCH 3/3 v4] " Ingo Molnar
2015-08-24  7:47           ` Ingo Molnar
2015-08-24  7:50             ` [PATCH 3/3 v5] " Ingo Molnar
2015-08-24 12:54               ` George Spelvin
2015-08-25  9:56                 ` [PATCH 3/3 v6] " Ingo Molnar
2015-08-25 10:36                   ` George Spelvin
2015-08-25 12:59                   ` Peter Zijlstra
2015-08-25 14:19                   ` Rasmus Villemoes
2015-08-25 15:11                     ` George Spelvin
2015-08-24 13:11           ` [PATCH 3/3 v4] " John Stoffel
2015-08-24 15:11             ` George Spelvin
2015-08-24 15:55               ` John Stoffel
2015-08-25 12:46       ` [PATCH 3/3 v3] " Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).