From: Ingo Molnar <mingo@kernel.org>
To: George Spelvin <linux@horizon.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
torvalds@linux-foundation.org, Dave Hansen <dave@sr71.net>,
Peter Zijlstra <peterz@infradead.org>,
David Rientjes <rientjes@google.com>,
Rik van Riel <riel@redhat.com>,
Rasmus Villemoes <linux@rasmusvillemoes.dk>
Subject: Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics
Date: Sun, 23 Aug 2015 08:04:43 +0200 [thread overview]
Message-ID: <20150823060443.GA9882@gmail.com> (raw)
In-Reply-To: <20150823044839.5727.qmail@ns.horizon.com>
* George Spelvin <linux@horizon.com> wrote:
> Linus wrote:
> > I don't think any of this can be called "correct", in that the
> > unlocked accesses to the cached state are clearly racy, but I think
> > it's very much "acceptable".
>
> I'd think you could easily fix that with a seqlock-like system.
>
> What makes it so simple is that you can always fall back to
> calc_vmalloc_info if there's any problem, rather than looping or blocking.
>
> The basic idea is that you have a seqlock counter, but if either of
> the two lsbits are set, the cached information is stale.
>
> Basically, you need a seqlock and a spinlock. The seqlock does
> most of the work, and the spinlock ensures that there's only one
> updater of the cache.
>
> vmap_unlock() does set_bit(0, &seq->sequence). This marks the information
> as stale.
>
> get_vmalloc_info reads the seqlock. There are two case:
> - If the two lsbits are 10, the cached information is valid.
> Copy it out, re-check the seqlock, and loop if the sequence
> number changes.
> - In any other case, the cached information is
> not valid.
> - Try to obtain the spinlock. Do not block if it's unavailable.
> - If unavailable, do not block.
> - If the lock is acquired:
> - Set the sequence to (sequence | 3) + 1 (we're the only writer)
> - This bumps the sequence number and leaves the lsbits at 00 (invalid)
> - Memory barrier TBD. Do the RCU ops in calc_vmalloc_info do it for us?
> - Call calc_vmalloc_info
> - If we obtained the spinlock earlier:
> - Copy our vmi to cached_info
> - smp_wmb()
> - set_bit(1, &seq->sequence). This marks the information as valid,
> as long as bit 0 is still clear.
> - Release the spinlock.
>
> Basically, bit 0 says "vmalloc info has changed", and bit 1 says
> "vmalloc cache has been updated". This clears bit 0 before
> starting the update so that an update during calc_vmalloc_info
> will force a new update.
>
> So the three case are basically:
> 00 - calc_vmalloc_info() in progress
> 01 - vmap_unlock() during calc_vmalloc_info()
> 10 - cached_info is valid
> 11 - vmap_unlock has invalidated cached_info, awaiting refresh
>
> Logically, the sequence number should be initialized to ...01,
> but the code above handles 00 okay.
I think this is too complex.
How about something simple like the patch below (on top of the third patch)?
It makes the vmalloc info transactional - /proc/meminfo will always print a
consistent set of numbers. (Not that we really care about races there, but it
looks really simple to solve so why not.)
( I also moved the function-static cache next to the flag and seqlock - this
should further compress the cache footprint. )
Have I missed anything? Very lightly tested: booted in a VM.
Thanks,
Ingo
=========================>
mm/vmalloc.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ef48e557df5a..66726f41e726 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -278,7 +278,15 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock);
+/*
+ * Seqlock and flag for the vmalloc info cache printed in /proc/meminfo.
+ *
+ * The assumption of the optimization is that it's read frequently, but
+ * modified infrequently.
+ */
+static DEFINE_SEQLOCK(vmap_info_lock);
static int vmap_info_changed;
+static struct vmalloc_info vmap_info_cache;
static inline void vmap_lock(void)
{
@@ -2752,10 +2760,14 @@ static void calc_vmalloc_info(struct vmalloc_info *vmi)
void get_vmalloc_info(struct vmalloc_info *vmi)
{
- static struct vmalloc_info cached_info;
+ if (!READ_ONCE(vmap_info_changed)) {
+ unsigned int seq;
+
+ do {
+ seq = read_seqbegin(&vmap_info_lock);
+ *vmi = vmap_info_cache;
+ } while (read_seqretry(&vmap_info_lock, seq));
- if (!vmap_info_changed) {
- *vmi = cached_info;
return;
}
@@ -2764,8 +2776,9 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
calc_vmalloc_info(vmi);
- barrier();
- cached_info = *vmi;
+ write_seqlock(&vmap_info_lock);
+ vmap_info_cache = *vmi;
+ write_sequnlock(&vmap_info_lock);
}
#endif
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Ingo Molnar <mingo@kernel.org>
To: George Spelvin <linux@horizon.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
torvalds@linux-foundation.org, Dave Hansen <dave@sr71.net>,
Peter Zijlstra <peterz@infradead.org>,
David Rientjes <rientjes@google.com>,
Rik van Riel <riel@redhat.com>,
Rasmus Villemoes <linux@rasmusvillemoes.dk>
Subject: Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics
Date: Sun, 23 Aug 2015 08:04:43 +0200 [thread overview]
Message-ID: <20150823060443.GA9882@gmail.com> (raw)
In-Reply-To: <20150823044839.5727.qmail@ns.horizon.com>
* George Spelvin <linux@horizon.com> wrote:
> Linus wrote:
> > I don't think any of this can be called "correct", in that the
> > unlocked accesses to the cached state are clearly racy, but I think
> > it's very much "acceptable".
>
> I'd think you could easily fix that with a seqlock-like system.
>
> What makes it so simple is that you can always fall back to
> calc_vmalloc_info if there's any problem, rather than looping or blocking.
>
> The basic idea is that you have a seqlock counter, but if either of
> the two lsbits are set, the cached information is stale.
>
> Basically, you need a seqlock and a spinlock. The seqlock does
> most of the work, and the spinlock ensures that there's only one
> updater of the cache.
>
> vmap_unlock() does set_bit(0, &seq->sequence). This marks the information
> as stale.
>
> get_vmalloc_info reads the seqlock. There are two case:
> - If the two lsbits are 10, the cached information is valid.
> Copy it out, re-check the seqlock, and loop if the sequence
> number changes.
> - In any other case, the cached information is
> not valid.
> - Try to obtain the spinlock. Do not block if it's unavailable.
> - If unavailable, do not block.
> - If the lock is acquired:
> - Set the sequence to (sequence | 3) + 1 (we're the only writer)
> - This bumps the sequence number and leaves the lsbits at 00 (invalid)
> - Memory barrier TBD. Do the RCU ops in calc_vmalloc_info do it for us?
> - Call calc_vmalloc_info
> - If we obtained the spinlock earlier:
> - Copy our vmi to cached_info
> - smp_wmb()
> - set_bit(1, &seq->sequence). This marks the information as valid,
> as long as bit 0 is still clear.
> - Release the spinlock.
>
> Basically, bit 0 says "vmalloc info has changed", and bit 1 says
> "vmalloc cache has been updated". This clears bit 0 before
> starting the update so that an update during calc_vmalloc_info
> will force a new update.
>
> So the three case are basically:
> 00 - calc_vmalloc_info() in progress
> 01 - vmap_unlock() during calc_vmalloc_info()
> 10 - cached_info is valid
> 11 - vmap_unlock has invalidated cached_info, awaiting refresh
>
> Logically, the sequence number should be initialized to ...01,
> but the code above handles 00 okay.
I think this is too complex.
How about something simple like the patch below (on top of the third patch)?
It makes the vmalloc info transactional - /proc/meminfo will always print a
consistent set of numbers. (Not that we really care about races there, but it
looks really simple to solve so why not.)
( I also moved the function-static cache next to the flag and seqlock - this
should further compress the cache footprint. )
Have I missed anything? Very lightly tested: booted in a VM.
Thanks,
Ingo
=========================>
mm/vmalloc.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ef48e557df5a..66726f41e726 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -278,7 +278,15 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock);
+/*
+ * Seqlock and flag for the vmalloc info cache printed in /proc/meminfo.
+ *
+ * The assumption of the optimization is that it's read frequently, but
+ * modified infrequently.
+ */
+static DEFINE_SEQLOCK(vmap_info_lock);
static int vmap_info_changed;
+static struct vmalloc_info vmap_info_cache;
static inline void vmap_lock(void)
{
@@ -2752,10 +2760,14 @@ static void calc_vmalloc_info(struct vmalloc_info *vmi)
void get_vmalloc_info(struct vmalloc_info *vmi)
{
- static struct vmalloc_info cached_info;
+ if (!READ_ONCE(vmap_info_changed)) {
+ unsigned int seq;
+
+ do {
+ seq = read_seqbegin(&vmap_info_lock);
+ *vmi = vmap_info_cache;
+ } while (read_seqretry(&vmap_info_lock, seq));
- if (!vmap_info_changed) {
- *vmi = cached_info;
return;
}
@@ -2764,8 +2776,9 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
calc_vmalloc_info(vmi);
- barrier();
- cached_info = *vmi;
+ write_seqlock(&vmap_info_lock);
+ vmap_info_cache = *vmi;
+ write_sequnlock(&vmap_info_lock);
}
#endif
next prev parent reply other threads:[~2015-08-23 6:04 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-23 4:48 [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics George Spelvin
2015-08-23 4:48 ` George Spelvin
2015-08-23 6:04 ` Ingo Molnar [this message]
2015-08-23 6:04 ` Ingo Molnar
2015-08-23 6:46 ` George Spelvin
2015-08-23 6:46 ` George Spelvin
2015-08-23 8:17 ` [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Ingo Molnar
2015-08-23 8:17 ` Ingo Molnar
2015-08-23 20:53 ` Rasmus Villemoes
2015-08-23 20:53 ` Rasmus Villemoes
2015-08-24 6:58 ` Ingo Molnar
2015-08-24 6:58 ` Ingo Molnar
2015-08-24 8:39 ` Rasmus Villemoes
2015-08-24 8:39 ` Rasmus Villemoes
2015-08-23 21:56 ` Rasmus Villemoes
2015-08-23 21:56 ` Rasmus Villemoes
2015-08-24 7:00 ` Ingo Molnar
2015-08-24 7:00 ` Ingo Molnar
2015-08-25 16:39 ` Linus Torvalds
2015-08-25 16:39 ` Linus Torvalds
2015-08-25 17:03 ` Linus Torvalds
2015-08-25 17:03 ` Linus Torvalds
2015-08-24 1:04 ` George Spelvin
2015-08-24 1:04 ` George Spelvin
2015-08-24 7:34 ` [PATCH 3/3 v4] " Ingo Molnar
2015-08-24 7:34 ` Ingo Molnar
2015-08-24 7:47 ` Ingo Molnar
2015-08-24 7:47 ` Ingo Molnar
2015-08-24 7:50 ` [PATCH 3/3 v5] " Ingo Molnar
2015-08-24 7:50 ` Ingo Molnar
2015-08-24 12:54 ` George Spelvin
2015-08-24 12:54 ` George Spelvin
2015-08-25 9:56 ` [PATCH 3/3 v6] " Ingo Molnar
2015-08-25 9:56 ` Ingo Molnar
2015-08-25 10:36 ` George Spelvin
2015-08-25 10:36 ` George Spelvin
2015-08-25 12:59 ` Peter Zijlstra
2015-08-25 12:59 ` Peter Zijlstra
2015-08-25 14:19 ` Rasmus Villemoes
2015-08-25 14:19 ` Rasmus Villemoes
2015-08-25 15:11 ` George Spelvin
2015-08-25 15:11 ` George Spelvin
2015-08-24 13:11 ` [PATCH 3/3 v4] " John Stoffel
2015-08-24 13:11 ` John Stoffel
2015-08-24 15:11 ` George Spelvin
2015-08-24 15:11 ` George Spelvin
2015-08-24 15:55 ` John Stoffel
2015-08-24 15:55 ` John Stoffel
2015-08-25 12:46 ` [PATCH 3/3 v3] " Peter Zijlstra
2015-08-25 12:46 ` Peter Zijlstra
-- strict thread matches above, loose matches on Subject: below --
2015-08-22 10:44 [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics Ingo Molnar
2015-08-22 10:44 ` Ingo Molnar
2015-08-22 14:36 ` Linus Torvalds
2015-08-22 14:36 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150823060443.GA9882@gmail.com \
--to=mingo@kernel.org \
--cc=dave@sr71.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux@horizon.com \
--cc=linux@rasmusvillemoes.dk \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.