From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yk0-f174.google.com (mail-yk0-f174.google.com [209.85.160.174]) by kanga.kvack.org (Postfix) with ESMTP id 176E56B0038 for ; Sun, 23 Aug 2015 00:48:43 -0400 (EDT) Received: by ykdt205 with SMTP id t205so106898548ykd.1 for ; Sat, 22 Aug 2015 21:48:42 -0700 (PDT) Received: from ns.horizon.com (ns.horizon.com. [71.41.210.147]) by mx.google.com with SMTP id t127si8106979ywb.11.2015.08.22.21.48.41 for ; Sat, 22 Aug 2015 21:48:42 -0700 (PDT) Date: 23 Aug 2015 00:48:39 -0400 Message-ID: <20150823044839.5727.qmail@ns.horizon.com> From: "George Spelvin" Subject: Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics Sender: owner-linux-mm@kvack.org List-ID: To: mingo@kernel.org Cc: linux@horizon.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, torvalds@linux-foundation.org Linus wrote: > I don't think any of this can be called "correct", in that the > unlocked accesses to the cached state are clearly racy, but I think > it's very much "acceptable". I'd think you could easily fix that with a seqlock-like system. What makes it so simple is that you can always fall back to calc_vmalloc_info if there's any problem, rather than looping or blocking. The basic idea is that you have a seqlock counter, but if either of the two lsbits are set, the cached information is stale. Basically, you need a seqlock and a spinlock. The seqlock does most of the work, and the spinlock ensures that there's only one updater of the cache. vmap_unlock() does set_bit(0, &seq->sequence). This marks the information as stale. get_vmalloc_info reads the seqlock. There are two case: - If the two lsbits are 10, the cached information is valid. Copy it out, re-check the seqlock, and loop if the sequence number changes. - In any other case, the cached information is not valid. - Try to obtain the spinlock. Do not block if it's unavailable. - If unavailable, do not block. - If the lock is acquired: - Set the sequence to (sequence | 3) + 1 (we're the only writer) - This bumps the sequence number and leaves the lsbits at 00 (invalid) - Memory barrier TBD. Do the RCU ops in calc_vmalloc_info do it for us? - Call calc_vmalloc_info - If we obtained the spinlock earlier: - Copy our vmi to cached_info - smp_wmb() - set_bit(1, &seq->sequence). This marks the information as valid, as long as bit 0 is still clear. - Release the spinlock. Basically, bit 0 says "vmalloc info has changed", and bit 1 says "vmalloc cache has been updated". This clears bit 0 before starting the update so that an update during calc_vmalloc_info will force a new update. So the three case are basically: 00 - calc_vmalloc_info() in progress 01 - vmap_unlock() during calc_vmalloc_info() 10 - cached_info is valid 11 - vmap_unlock has invalidated cached_info, awaiting refresh Logically, the sequence number should be initialized to ...01, but the code above handles 00 okay. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f181.google.com (mail-wi0-f181.google.com [209.85.212.181]) by kanga.kvack.org (Postfix) with ESMTP id AB90B6B0038 for ; Sun, 23 Aug 2015 02:04:49 -0400 (EDT) Received: by wicne3 with SMTP id ne3so45198511wic.0 for ; Sat, 22 Aug 2015 23:04:49 -0700 (PDT) Received: from mail-wi0-x232.google.com (mail-wi0-x232.google.com. [2a00:1450:400c:c05::232]) by mx.google.com with ESMTPS id p11si14386428wik.60.2015.08.22.23.04.47 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 22 Aug 2015 23:04:47 -0700 (PDT) Received: by wicja10 with SMTP id ja10so45320411wic.1 for ; Sat, 22 Aug 2015 23:04:47 -0700 (PDT) Date: Sun, 23 Aug 2015 08:04:43 +0200 From: Ingo Molnar Subject: Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics Message-ID: <20150823060443.GA9882@gmail.com> References: <20150823044839.5727.qmail@ns.horizon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150823044839.5727.qmail@ns.horizon.com> Sender: owner-linux-mm@kvack.org List-ID: To: George Spelvin Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, torvalds@linux-foundation.org, Dave Hansen , Peter Zijlstra , David Rientjes , Rik van Riel , Rasmus Villemoes * George Spelvin wrote: > Linus wrote: > > I don't think any of this can be called "correct", in that the > > unlocked accesses to the cached state are clearly racy, but I think > > it's very much "acceptable". > > I'd think you could easily fix that with a seqlock-like system. > > What makes it so simple is that you can always fall back to > calc_vmalloc_info if there's any problem, rather than looping or blocking. > > The basic idea is that you have a seqlock counter, but if either of > the two lsbits are set, the cached information is stale. > > Basically, you need a seqlock and a spinlock. The seqlock does > most of the work, and the spinlock ensures that there's only one > updater of the cache. > > vmap_unlock() does set_bit(0, &seq->sequence). This marks the information > as stale. > > get_vmalloc_info reads the seqlock. There are two case: > - If the two lsbits are 10, the cached information is valid. > Copy it out, re-check the seqlock, and loop if the sequence > number changes. > - In any other case, the cached information is > not valid. > - Try to obtain the spinlock. Do not block if it's unavailable. > - If unavailable, do not block. > - If the lock is acquired: > - Set the sequence to (sequence | 3) + 1 (we're the only writer) > - This bumps the sequence number and leaves the lsbits at 00 (invalid) > - Memory barrier TBD. Do the RCU ops in calc_vmalloc_info do it for us? > - Call calc_vmalloc_info > - If we obtained the spinlock earlier: > - Copy our vmi to cached_info > - smp_wmb() > - set_bit(1, &seq->sequence). This marks the information as valid, > as long as bit 0 is still clear. > - Release the spinlock. > > Basically, bit 0 says "vmalloc info has changed", and bit 1 says > "vmalloc cache has been updated". This clears bit 0 before > starting the update so that an update during calc_vmalloc_info > will force a new update. > > So the three case are basically: > 00 - calc_vmalloc_info() in progress > 01 - vmap_unlock() during calc_vmalloc_info() > 10 - cached_info is valid > 11 - vmap_unlock has invalidated cached_info, awaiting refresh > > Logically, the sequence number should be initialized to ...01, > but the code above handles 00 okay. I think this is too complex. How about something simple like the patch below (on top of the third patch)? It makes the vmalloc info transactional - /proc/meminfo will always print a consistent set of numbers. (Not that we really care about races there, but it looks really simple to solve so why not.) ( I also moved the function-static cache next to the flag and seqlock - this should further compress the cache footprint. ) Have I missed anything? Very lightly tested: booted in a VM. Thanks, Ingo =========================> mm/vmalloc.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index ef48e557df5a..66726f41e726 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -278,7 +278,15 @@ EXPORT_SYMBOL(vmalloc_to_pfn); static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock); +/* + * Seqlock and flag for the vmalloc info cache printed in /proc/meminfo. + * + * The assumption of the optimization is that it's read frequently, but + * modified infrequently. + */ +static DEFINE_SEQLOCK(vmap_info_lock); static int vmap_info_changed; +static struct vmalloc_info vmap_info_cache; static inline void vmap_lock(void) { @@ -2752,10 +2760,14 @@ static void calc_vmalloc_info(struct vmalloc_info *vmi) void get_vmalloc_info(struct vmalloc_info *vmi) { - static struct vmalloc_info cached_info; + if (!READ_ONCE(vmap_info_changed)) { + unsigned int seq; + + do { + seq = read_seqbegin(&vmap_info_lock); + *vmi = vmap_info_cache; + } while (read_seqretry(&vmap_info_lock, seq)); - if (!vmap_info_changed) { - *vmi = cached_info; return; } @@ -2764,8 +2776,9 @@ void get_vmalloc_info(struct vmalloc_info *vmi) calc_vmalloc_info(vmi); - barrier(); - cached_info = *vmi; + write_seqlock(&vmap_info_lock); + vmap_info_cache = *vmi; + write_sequnlock(&vmap_info_lock); } #endif -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yk0-f180.google.com (mail-yk0-f180.google.com [209.85.160.180]) by kanga.kvack.org (Postfix) with ESMTP id D87106B0038 for ; Sun, 23 Aug 2015 02:46:05 -0400 (EDT) Received: by ykfw73 with SMTP id w73so107532102ykf.3 for ; Sat, 22 Aug 2015 23:46:05 -0700 (PDT) Received: from ns.horizon.com (ns.horizon.com. [71.41.210.147]) by mx.google.com with SMTP id e125si8200783ywf.83.2015.08.22.23.46.04 for ; Sat, 22 Aug 2015 23:46:05 -0700 (PDT) Date: 23 Aug 2015 02:46:03 -0400 Message-ID: <20150823064603.14050.qmail@ns.horizon.com> From: "George Spelvin" Subject: Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics In-Reply-To: <20150823060443.GA9882@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux@horizon.com, mingo@kernel.org Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Ingo Molnar wrote: > I think this is too complex. > > How about something simple like the patch below (on top of the third patch)? > It makes the vmalloc info transactional - /proc/meminfo will always print a > consistent set of numbers. (Not that we really care about races there, but it > looks really simple to solve so why not.) Looks like a huge simplification! It needs a comment about the approximate nature of the locking and the obvious race conditions: 1) The first caller to get_vmalloc_info() clears vmap_info_changed before updating vmap_info_cache, so a second caller is likely to get stale data for the duration of a calc_vmalloc_info call. 2) Although unlikely, it's possible for two threads to race calling calc_vmalloc_info, and the one that computes fresher data updates the cache first, so the later write leaves stale data. Other issues: 3) Me, I'd make vmap_info_changed a bool, for documentation more than any space saving. 4) I wish there were a trylock version of write_seqlock, so we could avoid blocking entirely. (You *could* hand-roll it, but that eats into the simplicity.) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f169.google.com (mail-wi0-f169.google.com [209.85.212.169]) by kanga.kvack.org (Postfix) with ESMTP id 4BE606B0038 for ; Sun, 23 Aug 2015 04:17:57 -0400 (EDT) Received: by wicja10 with SMTP id ja10so46412813wic.1 for ; Sun, 23 Aug 2015 01:17:56 -0700 (PDT) Received: from mail-wi0-x22e.google.com (mail-wi0-x22e.google.com. [2a00:1450:400c:c05::22e]) by mx.google.com with ESMTPS id gs3si14837682wib.29.2015.08.23.01.17.54 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 23 Aug 2015 01:17:55 -0700 (PDT) Received: by widdq5 with SMTP id dq5so46680437wid.0 for ; Sun, 23 Aug 2015 01:17:54 -0700 (PDT) Date: Sun, 23 Aug 2015 10:17:51 +0200 From: Ingo Molnar Subject: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150823081750.GA28349@gmail.com> References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150823064603.14050.qmail@ns.horizon.com> Sender: owner-linux-mm@kvack.org List-ID: To: George Spelvin Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org * George Spelvin wrote: > Ingo Molnar wrote: > > I think this is too complex. > > > > How about something simple like the patch below (on top of the third patch)? > > > It makes the vmalloc info transactional - /proc/meminfo will always print a > > consistent set of numbers. (Not that we really care about races there, but it > > looks really simple to solve so why not.) > > Looks like a huge simplification! > > It needs a comment about the approximate nature of the locking and > the obvious race conditions: > 1) The first caller to get_vmalloc_info() clears vmap_info_changed > before updating vmap_info_cache, so a second caller is likely to > get stale data for the duration of a calc_vmalloc_info call. > 2) Although unlikely, it's possible for two threads to race calling > calc_vmalloc_info, and the one that computes fresher data updates > the cache first, so the later write leaves stale data. > > Other issues: > 3) Me, I'd make vmap_info_changed a bool, for documentation more than > any space saving. > 4) I wish there were a trylock version of write_seqlock, so we could > avoid blocking entirely. (You *could* hand-roll it, but that eats > into the simplicity.) Ok, fair enough - so how about the attached approach instead, which uses a 64-bit generation counter to track changes to the vmalloc state. This is still very simple, but should not suffer from stale data being returned indefinitely in /proc/meminfo. We might race - but that was true before as well due to the lock-less RCU list walk - but we'll always return a correct and consistent version of the information. Lightly tested. This is a replacement patch to make it easier to read via email. I also made sure there's no extra overhead in the !CONFIG_PROC_FS case. Note that there's an even simpler variant possible I think: we could use just the two generation counters and barriers to remove the seqlock. Thanks, Ingo ==============================> From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f51.google.com (mail-la0-f51.google.com [209.85.215.51]) by kanga.kvack.org (Postfix) with ESMTP id ADFD86B0038 for ; Sun, 23 Aug 2015 16:53:34 -0400 (EDT) Received: by lalv9 with SMTP id v9so66232821lal.0 for ; Sun, 23 Aug 2015 13:53:33 -0700 (PDT) Received: from mail-la0-x22d.google.com (mail-la0-x22d.google.com. [2a00:1450:4010:c03::22d]) by mx.google.com with ESMTPS id tk5si11445973lbb.14.2015.08.23.13.53.31 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 23 Aug 2015 13:53:32 -0700 (PDT) Received: by lalv9 with SMTP id v9so66232563lal.0 for ; Sun, 23 Aug 2015 13:53:31 -0700 (PDT) From: Rasmus Villemoes Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> Date: Sun, 23 Aug 2015 22:53:28 +0200 In-Reply-To: <20150823081750.GA28349@gmail.com> (Ingo Molnar's message of "Sun, 23 Aug 2015 10:17:51 +0200") Message-ID: <87lhd1wwtz.fsf@rasmusvillemoes.dk> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org On Sun, Aug 23 2015, Ingo Molnar wrote: > Ok, fair enough - so how about the attached approach instead, which > uses a 64-bit generation counter to track changes to the vmalloc > state. How does this invalidation approach compare to the jiffies approach? In other words, how often does the vmalloc info actually change (or rather, in this approximation, how often is vmap_area_lock taken)? In particular, does it also solve the problem with git's test suite and similar situations with lots of short-lived processes? > ==============================> > From f9fd770e75e2edb4143f32ced0b53d7a77969c94 Mon Sep 17 00:00:00 2001 > From: Ingo Molnar > Date: Sat, 22 Aug 2015 12:28:01 +0200 > Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info > > Linus reported that glibc (rather stupidly) reads /proc/meminfo > for every sysinfo() call, Not quite: It is done by the two functions get_{av,}phys_pages functions; and get_phys_pages is called (once per process) by glibc's qsort implementation. In fact, sysinfo() is (at least part of) the cure, not the disease. Whether qsort should care about the total amount of memory is another discussion. Rasmus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f53.google.com (mail-la0-f53.google.com [209.85.215.53]) by kanga.kvack.org (Postfix) with ESMTP id 4A89F6B0038 for ; Sun, 23 Aug 2015 17:56:31 -0400 (EDT) Received: by labia3 with SMTP id ia3so2291955lab.3 for ; Sun, 23 Aug 2015 14:56:30 -0700 (PDT) Received: from mail-la0-x22f.google.com (mail-la0-x22f.google.com. [2a00:1450:4010:c03::22f]) by mx.google.com with ESMTPS id qo3si11533813lbb.122.2015.08.23.14.56.28 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 23 Aug 2015 14:56:28 -0700 (PDT) Received: by lalv9 with SMTP id v9so66616794lal.0 for ; Sun, 23 Aug 2015 14:56:28 -0700 (PDT) From: Rasmus Villemoes Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> Date: Sun, 23 Aug 2015 23:56:24 +0200 In-Reply-To: <20150823081750.GA28349@gmail.com> (Ingo Molnar's message of "Sun, 23 Aug 2015 10:17:51 +0200") Message-ID: <87h9npwtx3.fsf@rasmusvillemoes.dk> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org I was curious why these fields were ever added to /proc/meminfo, and dug up this: commit d262ee3ee6ba4f5f6125571d93d9d63191d2ef76 Author: Andrew Morton Date: Sat Apr 12 12:59:04 2003 -0700 [PATCH] vmalloc stats in /proc/meminfo From: Matt Porter There was a thread a while back on lkml where Dave Hansen proposed this simple vmalloc usage reporting patch. The thread pretty much died out as most people seemed focused on what VM loading type bugs it could solve. I had posted that this type of information was really valuable in debugging embedded Linux board ports. A common example is where people do arch specific setup that limits there vmalloc space and then they find modules won't load. ;) Having the Vmalloc* info readily available is real useful in helping folks to fix their kernel ports. That thread is at . [Maybe one could just remove the fields and see if anybody actually notices/cares any longer. Or, if they are only used by kernel developers, put them in their own file.] Rasmus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yk0-f174.google.com (mail-yk0-f174.google.com [209.85.160.174]) by kanga.kvack.org (Postfix) with ESMTP id 4375C6B0038 for ; Sun, 23 Aug 2015 21:04:06 -0400 (EDT) Received: by ykdt205 with SMTP id t205so120663735ykd.1 for ; Sun, 23 Aug 2015 18:04:06 -0700 (PDT) Received: from ns.horizon.com (ns.horizon.com. [71.41.210.147]) by mx.google.com with SMTP id c190si9391378ywa.73.2015.08.23.18.04.04 for ; Sun, 23 Aug 2015 18:04:05 -0700 (PDT) Date: 23 Aug 2015 21:04:03 -0400 Message-ID: <20150824010403.27903.qmail@ns.horizon.com> From: "George Spelvin" Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info In-Reply-To: <20150823081750.GA28349@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux@horizon.com, mingo@kernel.org Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org First, an actual, albeit minor, bug: initializing both vmap_info_gen and vmap_info_cache_gen to 0 marks the cache as valid, which it's not. vmap_info_gen should be initialized to 1 to force an initial cache update. Second, I don't see why you need a 64-bit counter. Seqlocks consider 32 bits (31 bits, actually, the lsbit means "update in progress") quite a strong enough guarantee. Third, it seems as though vmap_info_cache_gen is basically a duplicate of vmap_info_lock.sequence. It should be possible to make one variable serve both purposes. You just need a kludge to handle the case of multiple vamp_info updates between cache updates. There are two simple ones: 1) Avoid bumping vmap_info_gen unnecessarily. In vmap_unlock(), do vmap_info_gen = (vmap_info_lock.sequence | 1) + 1; 2) - Make vmap_info_gen a seqcount_t - In vmap_unlock(), do write_seqcount_barrier(&vmap_info_gen) - In get_vmalloc_info, inside the seqlock critical section, do vmap_info_lock.seqcount.sequence = vmap_info_gen.sequence - 1; (Using the vmap_info_gen.sequence read while validating the cache in the first place.) I should try to write an actual patch illustrating this. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f176.google.com (mail-wi0-f176.google.com [209.85.212.176]) by kanga.kvack.org (Postfix) with ESMTP id 8871C6B0038 for ; Mon, 24 Aug 2015 02:58:14 -0400 (EDT) Received: by wicne3 with SMTP id ne3so62460476wic.0 for ; Sun, 23 Aug 2015 23:58:14 -0700 (PDT) Received: from mail-wi0-x22d.google.com (mail-wi0-x22d.google.com. [2a00:1450:400c:c05::22d]) by mx.google.com with ESMTPS id f5si19767479wiz.82.2015.08.23.23.58.12 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 23 Aug 2015 23:58:13 -0700 (PDT) Received: by wijp15 with SMTP id p15so67651966wij.0 for ; Sun, 23 Aug 2015 23:58:12 -0700 (PDT) Date: Mon, 24 Aug 2015 08:58:09 +0200 From: Ingo Molnar Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150824065809.GA13082@gmail.com> References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> <87lhd1wwtz.fsf@rasmusvillemoes.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87lhd1wwtz.fsf@rasmusvillemoes.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Rasmus Villemoes Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org * Rasmus Villemoes wrote: > On Sun, Aug 23 2015, Ingo Molnar wrote: > > > Ok, fair enough - so how about the attached approach instead, which > > uses a 64-bit generation counter to track changes to the vmalloc > > state. > > How does this invalidation approach compare to the jiffies approach? In > other words, how often does the vmalloc info actually change (or rather, > in this approximation, how often is vmap_area_lock taken)? In > particular, does it also solve the problem with git's test suite and > similar situations with lots of short-lived processes? The two approaches are pretty similar, and in a typical distro with typical workload vmalloc() is mostly a boot time affair. But vmalloc() can be used more often in certain corner cases - neither of the patches makes that in any way slower, just the optimization won't trigger that often. Since vmalloc() use is suboptimal for several reasons (it does not use large pages for kernel space allocations, etc.), this is all pretty OK IMHO. > > ==============================> > > From f9fd770e75e2edb4143f32ced0b53d7a77969c94 Mon Sep 17 00:00:00 2001 > > From: Ingo Molnar > > Date: Sat, 22 Aug 2015 12:28:01 +0200 > > Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info > > > > Linus reported that glibc (rather stupidly) reads /proc/meminfo > > for every sysinfo() call, > > Not quite: It is done by the two functions get_{av,}phys_pages > functions; and get_phys_pages is called (once per process) by glibc's > qsort implementation. In fact, sysinfo() is (at least part of) the cure, > not the disease. Whether qsort should care about the total amount of > memory is another discussion. > > Thanks, is the fixed up changelog below better? Ingo ===============> mm/vmalloc: Cache the vmalloc memory info Linus reported that for scripting-intense workloads such as the Git build, glibc's qsort will read /proc/meminfo for every process created (by way of get_phys_pages()), which causes the Git build to generate a surprising amount of kernel overhead. A fair chunk of the overhead is due to get_vmalloc_info() - which walks a potentially long list to do its statistics. Modify Linus's jiffies based patch to use generation counters to cache the vmalloc info: vmap_unlock() increases the generation counter, and the get_vmalloc_info() reads it and compares it against a cached generation counter. Also use a seqlock to make sure we always print a consistent set of vmalloc statistics. Reported-by: Linus Torvalds Cc: Andrew Morton Cc: Rik van Riel Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f175.google.com (mail-wi0-f175.google.com [209.85.212.175]) by kanga.kvack.org (Postfix) with ESMTP id 9CFFE6B0254 for ; Mon, 24 Aug 2015 03:00:06 -0400 (EDT) Received: by wicja10 with SMTP id ja10so62308118wic.1 for ; Mon, 24 Aug 2015 00:00:06 -0700 (PDT) Received: from mail-wi0-x233.google.com (mail-wi0-x233.google.com. [2a00:1450:400c:c05::233]) by mx.google.com with ESMTPS id f8si19770945wiz.88.2015.08.24.00.00.04 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Aug 2015 00:00:05 -0700 (PDT) Received: by wicja10 with SMTP id ja10so62692471wic.1 for ; Mon, 24 Aug 2015 00:00:04 -0700 (PDT) Date: Mon, 24 Aug 2015 09:00:01 +0200 From: Ingo Molnar Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150824070001.GB13082@gmail.com> References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> <87h9npwtx3.fsf@rasmusvillemoes.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87h9npwtx3.fsf@rasmusvillemoes.dk> Sender: owner-linux-mm@kvack.org List-ID: To: Rasmus Villemoes Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org * Rasmus Villemoes wrote: > I was curious why these fields were ever added to /proc/meminfo, and dug > up this: > > commit d262ee3ee6ba4f5f6125571d93d9d63191d2ef76 > Author: Andrew Morton > Date: Sat Apr 12 12:59:04 2003 -0700 > > [PATCH] vmalloc stats in /proc/meminfo > > From: Matt Porter > > There was a thread a while back on lkml where Dave Hansen proposed this > simple vmalloc usage reporting patch. The thread pretty much died out as > most people seemed focused on what VM loading type bugs it could solve. I > had posted that this type of information was really valuable in debugging > embedded Linux board ports. A common example is where people do arch > specific setup that limits there vmalloc space and then they find modules > won't load. ;) Having the Vmalloc* info readily available is real useful in > helping folks to fix their kernel ports. > > That thread is at . > > [Maybe one could just remove the fields and see if anybody actually > notices/cares any longer. Or, if they are only used by kernel > developers, put them in their own file.] So instead of removing the fields (which I'm quite sure is an ABI breaker as it could break less robust /proc/meminfo parsers and scripts), we could just report '0' all the time - and have the real info somewhere else? Thanks, Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com [209.85.212.172]) by kanga.kvack.org (Postfix) with ESMTP id 2A2D66B0038 for ; Mon, 24 Aug 2015 03:34:28 -0400 (EDT) Received: by wijp15 with SMTP id p15so68484601wij.0 for ; Mon, 24 Aug 2015 00:34:27 -0700 (PDT) Received: from mail-wi0-x230.google.com (mail-wi0-x230.google.com. [2a00:1450:400c:c05::230]) by mx.google.com with ESMTPS id i10si19923089wix.56.2015.08.24.00.34.26 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Aug 2015 00:34:26 -0700 (PDT) Received: by wicja10 with SMTP id ja10so63069221wic.1 for ; Mon, 24 Aug 2015 00:34:26 -0700 (PDT) Date: Mon, 24 Aug 2015 09:34:22 +0200 From: Ingo Molnar Subject: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150824073422.GC13082@gmail.com> References: <20150823081750.GA28349@gmail.com> <20150824010403.27903.qmail@ns.horizon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150824010403.27903.qmail@ns.horizon.com> Sender: owner-linux-mm@kvack.org List-ID: To: George Spelvin Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org * George Spelvin wrote: > First, an actual, albeit minor, bug: initializing both vmap_info_gen > and vmap_info_cache_gen to 0 marks the cache as valid, which it's not. Ha! :-) Fixed. > vmap_info_gen should be initialized to 1 to force an initial > cache update. Yeah. > Second, I don't see why you need a 64-bit counter. Seqlocks consider > 32 bits (31 bits, actually, the lsbit means "update in progress") quite > a strong enough guarantee. Just out of general paranoia - but you are right, and this would lower the overhead on 32-bit SMP platforms a bit, plus it avoids 64-bit word tearing artifacts on 32 bit platforms as well. I modified it to u32. > Third, it seems as though vmap_info_cache_gen is basically a duplicate > of vmap_info_lock.sequence. It should be possible to make one variable > serve both purposes. Correct, I alluded to that in my description: > > Note that there's an even simpler variant possible I think: we could use just > > the two generation counters and barriers to remove the seqlock. > You just need a kludge to handle the case of multiple vamp_info updates > between cache updates. > > There are two simple ones: > > 1) Avoid bumping vmap_info_gen unnecessarily. In vmap_unlock(), do > vmap_info_gen = (vmap_info_lock.sequence | 1) + 1; > 2) - Make vmap_info_gen a seqcount_t > - In vmap_unlock(), do write_seqcount_barrier(&vmap_info_gen) > - In get_vmalloc_info, inside the seqlock critical section, do > vmap_info_lock.seqcount.sequence = vmap_info_gen.sequence - 1; > (Using the vmap_info_gen.sequence read while validating the > cache in the first place.) > > I should try to write an actual patch illustrating this. So I think something like the patch below is even simpler than trying to kludge generation counter semantics into seqcounts. I used two generation counters and a spinlock. The fast path is completely lockless and lightweight on modern SMP platforms. (where smp_rmb() is a no-op or very cheap.) There's not even a seqlock retry loop, instead an invalid cache causes us to fall back to the old behavior - and the freshest result is guaranteed to end up in the cache. The linecount got a bit larger: but half of it is comments. Note that the generation counters are signed integers so that this comparison can be done: + if (gen-vmap_info_cache_gen > 0) { Thanks, Ingo ======================> From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f176.google.com (mail-wi0-f176.google.com [209.85.212.176]) by kanga.kvack.org (Postfix) with ESMTP id 13FDC6B0038 for ; Mon, 24 Aug 2015 03:47:19 -0400 (EDT) Received: by wicja10 with SMTP id ja10so63368169wic.1 for ; Mon, 24 Aug 2015 00:47:18 -0700 (PDT) Received: from mail-wi0-x22f.google.com (mail-wi0-x22f.google.com. [2a00:1450:400c:c05::22f]) by mx.google.com with ESMTPS id ff10si30575045wjc.32.2015.08.24.00.47.17 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Aug 2015 00:47:17 -0700 (PDT) Received: by widdq5 with SMTP id dq5so41206461wid.1 for ; Mon, 24 Aug 2015 00:47:17 -0700 (PDT) Date: Mon, 24 Aug 2015 09:47:14 +0200 From: Ingo Molnar Subject: Re: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150824074714.GA20106@gmail.com> References: <20150823081750.GA28349@gmail.com> <20150824010403.27903.qmail@ns.horizon.com> <20150824073422.GC13082@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150824073422.GC13082@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: George Spelvin Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org * Ingo Molnar wrote: > +/* > + * Return a consistent snapshot of the current vmalloc allocation > + * statistics, for /proc/meminfo: > + */ > +void get_vmalloc_info(struct vmalloc_info *vmi) > +{ > + int gen = READ_ONCE(vmap_info_gen); > + > + /* > + * If the generation counter of the cache matches that of > + * the vmalloc generation counter then return the cache: > + */ > + if (READ_ONCE(vmap_info_cache_gen) == gen) { > + int gen_after; > + > + /* > + * The two read barriers make sure that we read > + * 'gen', 'vmap_info_cache' and 'gen_after' in > + * precisely that order: > + */ > + smp_rmb(); > + *vmi = vmap_info_cache; > + > + smp_rmb(); > + gen_after = READ_ONCE(vmap_info_gen); > + > + /* The cache is still valid: */ > + if (gen == gen_after) > + return; > + > + /* Ok, the cache got invalidated just now, regenerate it */ > + gen = gen_after; > + } One more detail: I just realized that with the read barriers, the READ_ONCE() accesses are not needed anymore - the barriers and the control dependencies are enough. This will further simplify the code. Thanks, Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f177.google.com (mail-wi0-f177.google.com [209.85.212.177]) by kanga.kvack.org (Postfix) with ESMTP id 422F06B0038 for ; Mon, 24 Aug 2015 03:50:23 -0400 (EDT) Received: by wicja10 with SMTP id ja10so63438673wic.1 for ; Mon, 24 Aug 2015 00:50:22 -0700 (PDT) Received: from mail-wi0-x236.google.com (mail-wi0-x236.google.com. [2a00:1450:400c:c05::236]) by mx.google.com with ESMTPS id q7si20018725wiz.8.2015.08.24.00.50.21 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Aug 2015 00:50:22 -0700 (PDT) Received: by wicne3 with SMTP id ne3so63636904wic.0 for ; Mon, 24 Aug 2015 00:50:21 -0700 (PDT) Date: Mon, 24 Aug 2015 09:50:18 +0200 From: Ingo Molnar Subject: [PATCH 3/3 v5] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150824075018.GB20106@gmail.com> References: <20150823081750.GA28349@gmail.com> <20150824010403.27903.qmail@ns.horizon.com> <20150824073422.GC13082@gmail.com> <20150824074714.GA20106@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150824074714.GA20106@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: George Spelvin Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org * Ingo Molnar wrote: > One more detail: I just realized that with the read barriers, the READ_ONCE() > accesses are not needed anymore - the barriers and the control dependencies are > enough. > > This will further simplify the code. I.e. something like the updated patch below. (We still need the WRITE_ONCE() for vmap_info_gen update.) Thanks, Ingo ========================> From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f178.google.com (mail-lb0-f178.google.com [209.85.217.178]) by kanga.kvack.org (Postfix) with ESMTP id 813526B0038 for ; Mon, 24 Aug 2015 04:39:23 -0400 (EDT) Received: by lbbpu9 with SMTP id pu9so75275494lbb.3 for ; Mon, 24 Aug 2015 01:39:22 -0700 (PDT) Received: from mail-la0-x229.google.com (mail-la0-x229.google.com. [2a00:1450:4010:c03::229]) by mx.google.com with ESMTPS id h6si12581099lbz.7.2015.08.24.01.39.20 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Aug 2015 01:39:21 -0700 (PDT) Received: by labia3 with SMTP id ia3so8288149lab.3 for ; Mon, 24 Aug 2015 01:39:20 -0700 (PDT) From: Rasmus Villemoes Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> <87lhd1wwtz.fsf@rasmusvillemoes.dk> <20150824065809.GA13082@gmail.com> Date: Mon, 24 Aug 2015 10:39:17 +0200 In-Reply-To: <20150824065809.GA13082@gmail.com> (Ingo Molnar's message of "Mon, 24 Aug 2015 08:58:09 +0200") Message-ID: <877fol5bd6.fsf@rasmusvillemoes.dk> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org On Mon, Aug 24 2015, Ingo Molnar wrote: > > Thanks, is the fixed up changelog below better? > Yes, though Linus specifically referred to "make test" (but I guess one could/should consider that part of the build process). Rasmus > mm/vmalloc: Cache the vmalloc memory info > > Linus reported that for scripting-intense workloads such as the > Git build, glibc's qsort will read /proc/meminfo for every process > created (by way of get_phys_pages()), which causes the Git build > to generate a surprising amount of kernel overhead. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yk0-f174.google.com (mail-yk0-f174.google.com [209.85.160.174]) by kanga.kvack.org (Postfix) with ESMTP id D75B06B0254 for ; Mon, 24 Aug 2015 08:54:04 -0400 (EDT) Received: by ykfw73 with SMTP id w73so133276244ykf.3 for ; Mon, 24 Aug 2015 05:54:04 -0700 (PDT) Received: from ns.horizon.com (ns.horizon.com. [71.41.210.147]) by mx.google.com with SMTP id j189si10131863ykj.32.2015.08.24.05.54.03 for ; Mon, 24 Aug 2015 05:54:03 -0700 (PDT) Date: 24 Aug 2015 08:54:02 -0400 Message-ID: <20150824125402.28806.qmail@ns.horizon.com> From: "George Spelvin" Subject: Re: [PATCH 3/3 v5] mm/vmalloc: Cache the vmalloc memory info In-Reply-To: <20150824075018.GB20106@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux@horizon.com, mingo@kernel.org Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org (I hope I'm not annoying you by bikeshedding this too much, although I think this is improving.) You've sort of re-invented spinlocks, but after thinking a bit, it all works. Rather than using a single word, which is incremented to an odd number at the start of an update and an even number at the end, there are two. An update is in progress when they're unequal. vmap_info_gen is incremented early, when the cache needs updating, and read late (after the cache is copied). vmap_info_cache_gen is incremented after the cache is updated, and read early (before the cache is copied). This is logically equivalent to my complicated scheme with atomic updates to various bits in a single generation word, but greatly simplified by having two separate words. In particular, there's no longer a need to distinguish "vmap has updated list" from "calc_vmalloc_info in progress". I particularly like the "gen - vmap_info_cache_gen > 0" test. You *must* test for inequality to prevent tearing of a valid cache (...grr...English heteronyms...), and given that, might as well require it be fresher. Anyway, suggested changes for v6 (sigh...): First: you do a second read of vmap_info_gen to optimize out the copy of vmalloc_info if it's easily seen as pointless, but given how small vmalloc_info is (two words!), i'd be inclined to omit that optimization. Copy always, *then* see if it's worth keeping. Smaller code, faster fast path, and is barely noticeable on the slow path. Second, and this is up to you, I'd be inclined to go fully non-blocking and only spin_trylock(). If that fails, just skip the cache update. Third, ANSI C rules allow a compiler to assume that signed integer overflow does not occur. That means that gcc is allowed to optimize "if (x - y > 0)" to "if (x > y)". Given that gcc has annoyed us by using this optimization in other contexts, It might be safer to make them unsigned (which is required to wrap properly) and cast to integer after subtraction. Basically, the following (untested, but pretty damn simple): +/* + * Return a consistent snapshot of the current vmalloc allocation + * statistics, for /proc/meminfo: + */ +void get_vmalloc_info(struct vmalloc_info *vmi) +{ + unsigned gen, cache_gen = READ_ONCE(vmap_info_cache_gen); + + /* + * The two read barriers make sure that we read + * 'cache_gen', 'vmap_info_cache' and 'gen' in + * precisely that order: + */ + smp_rmb(); + *vmi = vmap_info_cache; + + smp_rmb(); + gen = READ_ONCE(vmap_info_gen); + + /* + * If the generation counter of the cache matches that of + * the vmalloc generation counter then return the cache: + */ + if (gen == cache_gen) + return; + + /* Make sure 'gen' is read before the vmalloc info */ + smp_rmb(); + calc_vmalloc_info(vmi); + + /* + * All updates to vmap_info_cache_gen go through this spinlock, + * so when the cache got invalidated, we'll only mark it valid + * again if we first fully write the new vmap_info_cache. + * + * This ensures that partial results won't be used. + */ + if (spin_trylock(&vmap_info_lock)) { + if ((int)(gen - vmap_info_cache_gen) > 0) { + vmap_info_cache = *vmi; + /* + * Make sure the new cached data is visible before + * the generation counter update: + */ + smp_wmb(); + WRITE_ONCE(vmap_info_cache_gen, gen); + } + spin_unlock(&vmap_info_lock); + } +} + +#endif /* CONFIG_PROC_FS */ The only remaining *very small* nit is that this function is a mix of "return early" and "wrap it in an if()" style. If you want to make that "if (!spin_trylock(...)) return;", I leave that you your esthetic judgement. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f175.google.com (mail-wi0-f175.google.com [209.85.212.175]) by kanga.kvack.org (Postfix) with ESMTP id C2A536B0038 for ; Mon, 24 Aug 2015 09:12:23 -0400 (EDT) Received: by wijp15 with SMTP id p15so77446467wij.0 for ; Mon, 24 Aug 2015 06:12:23 -0700 (PDT) Received: from mailrelay.lanline.com (mailrelay.lanline.com. [216.187.10.16]) by mx.google.com with ESMTPS id e8si21411133wiz.64.2015.08.24.06.12.21 for (version=TLS1 cipher=RC4-SHA bits=128/128); Mon, 24 Aug 2015 06:12:22 -0700 (PDT) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <21979.6150.929309.800457@quad.stoffel.home> Date: Mon, 24 Aug 2015 09:11:34 -0400 From: "John Stoffel" Subject: Re: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info In-Reply-To: <20150824073422.GC13082@gmail.com> References: <20150823081750.GA28349@gmail.com> <20150824010403.27903.qmail@ns.horizon.com> <20150824073422.GC13082@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org >>>>> "Ingo" == Ingo Molnar writes: Ingo> * George Spelvin wrote: >> First, an actual, albeit minor, bug: initializing both vmap_info_gen >> and vmap_info_cache_gen to 0 marks the cache as valid, which it's not. Ingo> Ha! :-) Fixed. >> vmap_info_gen should be initialized to 1 to force an initial >> cache update. Blech, it should be initialized with a proper #define VMAP_CACHE_NEEDS_UPDATE 1, instead of more magic numbers. Ingo> + */ Ingo> +static DEFINE_SPINLOCK(vmap_info_lock); Ingo> +static int vmap_info_gen = 1; static int vmap_info_gen = VMAP_CACHE_NEEDS_UPDATE; Ingo> +static int vmap_info_cache_gen; Ingo> +static struct vmalloc_info vmap_info_cache; Ingo> +#endif This will help keep bugs like this out in the future... I hope! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f170.google.com (mail-qk0-f170.google.com [209.85.220.170]) by kanga.kvack.org (Postfix) with ESMTP id 5F1C56B0254 for ; Mon, 24 Aug 2015 11:11:16 -0400 (EDT) Received: by qkch123 with SMTP id h123so64829115qkc.0 for ; Mon, 24 Aug 2015 08:11:16 -0700 (PDT) Received: from ns.horizon.com (ns.horizon.com. [71.41.210.147]) by mx.google.com with SMTP id u205si10370676ywa.76.2015.08.24.08.11.15 for ; Mon, 24 Aug 2015 08:11:15 -0700 (PDT) Date: 24 Aug 2015 11:11:14 -0400 Message-ID: <20150824151114.18743.qmail@ns.horizon.com> From: "George Spelvin" Subject: Re: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info In-Reply-To: <21979.6150.929309.800457@quad.stoffel.home> Sender: owner-linux-mm@kvack.org List-ID: To: john@stoffel.org, mingo@kernel.org Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@horizon.com, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org John Stoffel wrote: >> vmap_info_gen should be initialized to 1 to force an initial >> cache update. > Blech, it should be initialized with a proper #define > VMAP_CACHE_NEEDS_UPDATE 1, instead of more magic numbers. Er... this is a joke, right? First, this number is used exactly once, and it's not part of a collection of similar numbers. And the definition would be adjacent to the use. We have easier ways of accomplishing that, called "comments". Second, your proposed name is misleading. "needs update" is defined as vmap_info_gen != vmap_info_cache_gen. There is no particular value of either that has this meaning. For example, initializing vmap_info_cache_gen to -1 would do just as well. (I actually considered that before deciding that +1 was "simpler" than -1.) For some versions of the code, an *arbitrary* difference is okay. You could set one ot 0xDEADBEEF and the other to 0xFEEDFACE. For other versions, the magnitude matters, but not *too* much. Initializing it to 42 would be perfectly correct, but waste time doing 42 cache updates before settling down. Singling out the value 1 as VMAP_CACHE_NEEDS_UPDATE is actively misleading. > This will help keep bugs like this out in the future... I hope! And this is the punchline, right? The problem was not realizing that non-default initialization was required; what we *call* the non-default value is irrelevant. I doubt it would ever have been a real (i.e. noticeable) bug, actually; the first bit of vmap activity in very early boot would have invalidated the cache. (John, my apologies if I went over the top and am contributing to LKML's reputation for flaming. I *did* actually laugh, and *do* think it's a dumb idea, but my annoyance is really directed at unpleasant memories of mindless application of coding style guidelines. In this case, I suspect you just posted before reading carefully enough to see the subtle logic.) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com [209.85.212.172]) by kanga.kvack.org (Postfix) with ESMTP id 77D096B0038 for ; Mon, 24 Aug 2015 11:56:24 -0400 (EDT) Received: by widdq5 with SMTP id dq5so54690824wid.1 for ; Mon, 24 Aug 2015 08:56:23 -0700 (PDT) Received: from mailrelay.lanline.com (mailrelay.lanline.com. [216.187.10.16]) by mx.google.com with ESMTPS id lo6si22241893wic.41.2015.08.24.08.56.22 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 24 Aug 2015 08:56:23 -0700 (PDT) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <21979.15999.82965.295320@quad.stoffel.home> Date: Mon, 24 Aug 2015 11:55:43 -0400 From: "John Stoffel" Subject: Re: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info In-Reply-To: <20150824151114.18743.qmail@ns.horizon.com> References: <21979.6150.929309.800457@quad.stoffel.home> <20150824151114.18743.qmail@ns.horizon.com> Sender: owner-linux-mm@kvack.org List-ID: To: George Spelvin Cc: john@stoffel.org, mingo@kernel.org, dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org George> John Stoffel wrote: >>> vmap_info_gen should be initialized to 1 to force an initial >>> cache update. >> Blech, it should be initialized with a proper #define >> VMAP_CACHE_NEEDS_UPDATE 1, instead of more magic numbers. George> Er... this is a joke, right? Not really. The comment made before was that by setting this variable to zero, it wasn't properly initialized. Which implies that either the API is wrong... or we should be documenting it better. I just went in the direction of the #define instead of a comment. George> First, this number is used exactly once, and it's not part of George> a collection of similar numbers. And the definition would be George> adjacent to the use. George> We have easier ways of accomplishing that, called "comments". Sure, that would be the better solution in this case. George> Second, your proposed name is misleading. "needs update" is defined George> as vmap_info_gen != vmap_info_cache_gen. There is no particular value George> of either that has this meaning. George> For example, initializing vmap_info_cache_gen to -1 would do just as well. George> (I actually considered that before deciding that +1 was "simpler" than -1.) See, I just threw out a dumb suggestion without reading the patch properly. My fault. George> (John, my apologies if I went over the top and am contributing to LKML's George> reputation for flaming. I *did* actually laugh, and *do* think it's a George> dumb idea, but my annoyance is really directed at unpleasant memories of George> mindless application of coding style guidelines. In this case, I suspect George> you just posted before reading carefully enough to see the subtle logic.) Nope, I'm in the wrong here. And your comment here is wonderful, I really do appreciate how you handled my ham fisted attempt to contribute. But I've got thick skin and I'll keep trying in my free time to comment on patches when I can. John -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f174.google.com (mail-wi0-f174.google.com [209.85.212.174]) by kanga.kvack.org (Postfix) with ESMTP id CA1F96B0254 for ; Tue, 25 Aug 2015 05:56:43 -0400 (EDT) Received: by wicja10 with SMTP id ja10so9721438wic.1 for ; Tue, 25 Aug 2015 02:56:43 -0700 (PDT) Received: from mail-wi0-x22d.google.com (mail-wi0-x22d.google.com. [2a00:1450:400c:c05::22d]) by mx.google.com with ESMTPS id b19si2312699wiw.16.2015.08.25.02.56.41 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 25 Aug 2015 02:56:42 -0700 (PDT) Received: by wicja10 with SMTP id ja10so9673725wic.1 for ; Tue, 25 Aug 2015 02:56:41 -0700 (PDT) Date: Tue, 25 Aug 2015 11:56:38 +0200 From: Ingo Molnar Subject: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150825095638.GA24750@gmail.com> References: <20150824075018.GB20106@gmail.com> <20150824125402.28806.qmail@ns.horizon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150824125402.28806.qmail@ns.horizon.com> Sender: owner-linux-mm@kvack.org List-ID: To: George Spelvin Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org * George Spelvin wrote: > (I hope I'm not annoying you by bikeshedding this too much, although I > think this is improving.) [ I don't mind, although I wish other, more critical parts of the kernel got this much attention as well ;-) ] > Anyway, suggested changes for v6 (sigh...): > > First: you do a second read of vmap_info_gen to optimize out the copy > of vmalloc_info if it's easily seen as pointless, but given how small > vmalloc_info is (two words!), i'd be inclined to omit that optimization. > > Copy always, *then* see if it's worth keeping. Smaller code, faster > fast path, and is barely noticeable on the slow path. Ok, done. > Second, and this is up to you, I'd be inclined to go fully non-blocking and > only spin_trylock(). If that fails, just skip the cache update. So I'm not sure about this one: we have no guarantee of the order every updater reaches the spinlock, and we want the 'freshest' updater to do the update. The trylock might cause us to drop the 'freshest' update erroneously - so this change would introduce a 'stale data' bug I think. > Third, ANSI C rules allow a compiler to assume that signed integer > overflow does not occur. That means that gcc is allowed to optimize > "if (x - y > 0)" to "if (x > y)". That's annoying ... > Given that gcc has annoyed us by using this optimization in other > contexts, It might be safer to make them unsigned (which is required to > wrap properly) and cast to integer after subtraction. Ok, done. > Basically, the following (untested, but pretty damn simple): I've attached v6 which applies your first and last suggestion, but not the trylock one. I also removed _ONCE() accesses from the places that didn't need them. I added your Reviewed-by optimistically, saving a v7 submission hopefully ;-) Lightly tested. Thanks, Ingo ==============================> From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yk0-f180.google.com (mail-yk0-f180.google.com [209.85.160.180]) by kanga.kvack.org (Postfix) with ESMTP id 79CA76B0254 for ; Tue, 25 Aug 2015 06:36:33 -0400 (EDT) Received: by ykbi184 with SMTP id i184so150968246ykb.2 for ; Tue, 25 Aug 2015 03:36:33 -0700 (PDT) Received: from ns.horizon.com (ns.horizon.com. [71.41.210.147]) by mx.google.com with SMTP id o20si7476286yke.178.2015.08.25.03.36.32 for ; Tue, 25 Aug 2015 03:36:32 -0700 (PDT) Date: 25 Aug 2015 06:36:30 -0400 Message-ID: <20150825103630.26398.qmail@ns.horizon.com> From: "George Spelvin" Subject: Re: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info In-Reply-To: <20150825095638.GA24750@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux@horizon.com, mingo@kernel.org Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org >> Second, and this is up to you, I'd be inclined to go fully non-blocking >> and only spin_trylock(). If that fails, just skip the cache update. > So I'm not sure about this one: we have no guarantee of the order every > updater reaches the spinlock, and we want the 'freshest' updater to do > the update. The trylock might cause us to drop the 'freshest' update > erroneously - so this change would introduce a 'stale data' bug I think. Er, no it wouldn't. If someone leaves the cache stale, they'll leave vmap_info_cache_gen != vmap_info_gen and the next reader to come along will see the cache is stale and refresh it. If there's lock contention, there's a risk of more work, because the callers fall back to calc_vmalloc_info rather than waiting. But it's not a big risk. With the blocking code, if two readers arrive simultaneously and see a stale cache, they'll both call calc_vmalloc_info() and then line up to update the cache. The second will get the lock but then *not* update the cache. With trylock(), the second will just skip the update faster. Same number of calc_vmalloc_info() calls. The only inefficient case is if two readers arrive far enough apart that vmap_info_gen is updated between them, yet close enough together than the second arrives at the lock while the first is updating the cache. In that case, the second reader will not update the cache and (assuming no more changes to vmap_info_gen) some future third reader will have to duplicate the effort of calling calc_vmalloc_info(). But that's such a tiny window I give preference to my fondness for non-blocking code. > I added your Reviewed-by optimistically, saving a v7 submission hopefully ;-) You did right. As I said, the non-blocking part is your preference. I've done so much nasty stuff in interrupt handlers ($DAY_JOB more than kernel) that I go for the non-blocking algorithm whenever possible. Reviewed-by: George Spelvin -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f176.google.com (mail-wi0-f176.google.com [209.85.212.176]) by kanga.kvack.org (Postfix) with ESMTP id 8BF436B0253 for ; Tue, 25 Aug 2015 08:47:07 -0400 (EDT) Received: by widdq5 with SMTP id dq5so14755965wid.1 for ; Tue, 25 Aug 2015 05:47:07 -0700 (PDT) Received: from casper.infradead.org (casper.infradead.org. [2001:770:15f::2]) by mx.google.com with ESMTPS id kf4si3066602wic.48.2015.08.25.05.47.03 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 25 Aug 2015 05:47:03 -0700 (PDT) Date: Tue, 25 Aug 2015 14:46:55 +0200 From: Peter Zijlstra Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150825124655.GQ16853@twins.programming.kicks-ass.net> References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150823081750.GA28349@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org On Sun, Aug 23, 2015 at 10:17:51AM +0200, Ingo Molnar wrote: > +static u64 vmap_info_gen; > +static u64 vmap_info_cache_gen; > +void get_vmalloc_info(struct vmalloc_info *vmi) > +{ > + u64 gen = READ_ONCE(vmap_info_gen); > + > + /* > + * If the generation counter of the cache matches that of > + * the vmalloc generation counter then return the cache: > + */ > + if (READ_ONCE(vmap_info_cache_gen) == gen) { Why are those things u64? It has the obvious down-side that you still get split loads on 32bit machines. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f50.google.com (mail-pa0-f50.google.com [209.85.220.50]) by kanga.kvack.org (Postfix) with ESMTP id 8D8DD6B0253 for ; Tue, 25 Aug 2015 08:59:58 -0400 (EDT) Received: by pacdd16 with SMTP id dd16so123625270pac.2 for ; Tue, 25 Aug 2015 05:59:58 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. [2001:1868:205::9]) by mx.google.com with ESMTPS id rr5si15836775pab.153.2015.08.25.05.59.57 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 25 Aug 2015 05:59:57 -0700 (PDT) Date: Tue, 25 Aug 2015 14:59:51 +0200 From: Peter Zijlstra Subject: Re: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150825125951.GR16853@twins.programming.kicks-ass.net> References: <20150824075018.GB20106@gmail.com> <20150824125402.28806.qmail@ns.horizon.com> <20150825095638.GA24750@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150825095638.GA24750@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org On Tue, Aug 25, 2015 at 11:56:38AM +0200, Ingo Molnar wrote: > +void get_vmalloc_info(struct vmalloc_info *vmi) > +{ > + unsigned int cache_gen, gen; I see you dropped the u64 thing, good, ignore that previous email. > + > + /* > + * The common case is that the cache is valid, so first > + * read it, then check its validity. > + * > + * The two read barriers make sure that we read > + * 'cache_gen', 'vmap_info_cache' and 'gen' in > + * precisely that order: > + */ > + cache_gen = vmap_info_cache_gen; > + smp_rmb(); > + *vmi = vmap_info_cache; > + smp_rmb(); > + gen = vmap_info_gen; > + > + /* > + * If the generation counter of the cache matches that of > + * the vmalloc generation counter then return the cache: > + */ > + if (cache_gen == gen) > + return; There is one aspect of READ_ONCE() that is not replaced with smp_rmb(), and that is that READ_ONCE() should avoid split loads. Without READ_ONCE() the compiler is free to turn the loads into separate and out of order byte loads just because its insane, thereby also making the WRITE_ONCE() pointless. Now I'm fairly sure it all doesn't matter much, the info can change the moment we've copied it, so the read is inherently racy. And by that same argument I feel this is all somewhat over engineered. > + > + /* Make sure 'gen' is read before the vmalloc info: */ > + smp_rmb(); > + calc_vmalloc_info(vmi); > + > + /* > + * All updates to vmap_info_cache_gen go through this spinlock, > + * so when the cache got invalidated, we'll only mark it valid > + * again if we first fully write the new vmap_info_cache. > + * > + * This ensures that partial results won't be used and that the > + * vmalloc info belonging to the freshest update is used: > + */ > + spin_lock(&vmap_info_lock); > + if ((int)(gen-vmap_info_cache_gen) > 0) { > + vmap_info_cache = *vmi; > + /* > + * Make sure the new cached data is visible before > + * the generation counter update: > + */ > + smp_wmb(); > + vmap_info_cache_gen = gen; > + } > + spin_unlock(&vmap_info_lock); > +} > + > +#endif /* CONFIG_PROC_FS */ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f49.google.com (mail-la0-f49.google.com [209.85.215.49]) by kanga.kvack.org (Postfix) with ESMTP id 727219003C7 for ; Tue, 25 Aug 2015 10:20:04 -0400 (EDT) Received: by laba3 with SMTP id a3so99369239lab.1 for ; Tue, 25 Aug 2015 07:20:03 -0700 (PDT) Received: from mail-la0-x230.google.com (mail-la0-x230.google.com. [2a00:1450:4010:c03::230]) by mx.google.com with ESMTPS id oa2si16116359lbb.128.2015.08.25.07.20.02 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 25 Aug 2015 07:20:02 -0700 (PDT) Received: by labgv11 with SMTP id gv11so31471925lab.2 for ; Tue, 25 Aug 2015 07:20:02 -0700 (PDT) From: Rasmus Villemoes Subject: Re: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info References: <20150824075018.GB20106@gmail.com> <20150824125402.28806.qmail@ns.horizon.com> <20150825095638.GA24750@gmail.com> Date: Tue, 25 Aug 2015 16:19:59 +0200 In-Reply-To: <20150825095638.GA24750@gmail.com> (Ingo Molnar's message of "Tue, 25 Aug 2015 11:56:38 +0200") Message-ID: <87io83wiuo.fsf@rasmusvillemoes.dk> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org On Tue, Aug 25 2015, Ingo Molnar wrote: > * George Spelvin wrote: > >> (I hope I'm not annoying you by bikeshedding this too much, although I >> think this is improving.) > > [ I don't mind, although I wish other, more critical parts of the kernel got this > much attention as well ;-) ] > Since we're beating dead horses, let me point out one possibly unintentional side-effect of initializing just one of vmap_info{,_cache}_gen: $ nm -n vmlinux | grep -E 'vmap_info(_cache)?_gen' ffffffff81e4e5e0 d vmap_info_gen ffffffff820d5700 b vmap_info_cache_gen [Up-thread, you wrote "I also moved the function-static cache next to the flag and seqlock - this should further compress the cache footprint."] One should probably ensure that they end up in the same cacheline if one wants the fast-path to be as fast as possible - the easiest way to ensure that is to put them in a small struct, and that might as well contain the spinlock and the cache itself as well. It's been fun seeing this evolve, but overall, I tend to agree with Peter: It's a lot of complexity for little gain. If we're not going to just kill the Vmalloc* fields (which is probably too controversial) I'd prefer Linus' simpler version. Rasmus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yk0-f178.google.com (mail-yk0-f178.google.com [209.85.160.178]) by kanga.kvack.org (Postfix) with ESMTP id 49D606B0253 for ; Tue, 25 Aug 2015 11:11:56 -0400 (EDT) Received: by ykdt205 with SMTP id t205so160169782ykd.1 for ; Tue, 25 Aug 2015 08:11:56 -0700 (PDT) Received: from ns.horizon.com (ns.horizon.com. [71.41.210.147]) by mx.google.com with SMTP id z62si12585071ywd.170.2015.08.25.08.11.55 for ; Tue, 25 Aug 2015 08:11:55 -0700 (PDT) Date: 25 Aug 2015 11:11:54 -0400 Message-ID: <20150825151154.19516.qmail@ns.horizon.com> From: "George Spelvin" Subject: Re: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info In-Reply-To: <87io83wiuo.fsf@rasmusvillemoes.dk> Sender: owner-linux-mm@kvack.org List-ID: To: linux@rasmusvillemoes.dk, mingo@kernel.org Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@horizon.com, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org >>> (I hope I'm not annoying you by bikeshedding this too much, although I >>> think this is improving.) >> >> [ I don't mind, although I wish other, more critical parts of the kernel got this >> much attention as well ;-) ] That's the problem with small, understandable problems: people *aren't* scared to mess with them. > It's been fun seeing this evolve, but overall, I tend to agree with > Peter: It's a lot of complexity for little gain. If we're not going to > just kill the Vmalloc* fields (which is probably too controversial) > I'd prefer Linus' simpler version. Are you sure you're not being affected by the number of iterations? The final version is not actually a lot of code (although yes, more than Linus's), and offers the advantage of peace of mind: there's not some nasty-smelling code you can't entirely trust left behind. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f180.google.com (mail-ig0-f180.google.com [209.85.213.180]) by kanga.kvack.org (Postfix) with ESMTP id 96BDF6B0253 for ; Tue, 25 Aug 2015 12:39:51 -0400 (EDT) Received: by igfj19 with SMTP id j19so16101491igf.0 for ; Tue, 25 Aug 2015 09:39:51 -0700 (PDT) Received: from mail-ig0-x22e.google.com (mail-ig0-x22e.google.com. [2607:f8b0:4001:c05::22e]) by mx.google.com with ESMTPS id 18si10123853iok.118.2015.08.25.09.39.50 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 25 Aug 2015 09:39:50 -0700 (PDT) Received: by igui7 with SMTP id i7so15566030igu.0 for ; Tue, 25 Aug 2015 09:39:50 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <87h9npwtx3.fsf@rasmusvillemoes.dk> References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> <87h9npwtx3.fsf@rasmusvillemoes.dk> Date: Tue, 25 Aug 2015 09:39:50 -0700 Message-ID: Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info From: Linus Torvalds Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Rasmus Villemoes Cc: Ingo Molnar , George Spelvin , Dave Hansen , Linux Kernel Mailing List , linux-mm , Peter Zijlstra , Rik van Riel , David Rientjes On Sun, Aug 23, 2015 at 2:56 PM, Rasmus Villemoes wrote: > > [Maybe one could just remove the fields and see if anybody actually > notices/cares any longer. Or, if they are only used by kernel > developers, put them in their own file.] I'm actually inclined to try exactly that for 4.3, and then take Ingo's patch as a fallback in case somebody actually notices. I'm not convinced anybody actually uses those values, and they are getting *less* relevant rather than more (on 64-bit, those values really don't matter, since the vmalloc space isn't really a limitation), so let's try removing them and seeing what happens. And then we know what we can do if somebody does actually notice. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f182.google.com (mail-ig0-f182.google.com [209.85.213.182]) by kanga.kvack.org (Postfix) with ESMTP id 7E7246B0253 for ; Tue, 25 Aug 2015 13:03:22 -0400 (EDT) Received: by igbjg10 with SMTP id jg10so17963167igb.0 for ; Tue, 25 Aug 2015 10:03:22 -0700 (PDT) Received: from mail-ig0-x22e.google.com (mail-ig0-x22e.google.com. [2607:f8b0:4001:c05::22e]) by mx.google.com with ESMTPS id m33si10175315iod.140.2015.08.25.10.03.21 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 25 Aug 2015 10:03:21 -0700 (PDT) Received: by igfj19 with SMTP id j19so16225199igf.1 for ; Tue, 25 Aug 2015 10:03:21 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> <87h9npwtx3.fsf@rasmusvillemoes.dk> Date: Tue, 25 Aug 2015 10:03:21 -0700 Message-ID: Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info From: Linus Torvalds Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Rasmus Villemoes Cc: Ingo Molnar , George Spelvin , Dave Hansen , Linux Kernel Mailing List , linux-mm , Peter Zijlstra , Rik van Riel , David Rientjes On Tue, Aug 25, 2015 at 9:39 AM, Linus Torvalds wrote: > > I'm not convinced anybody actually uses those values, and they are > getting *less* relevant rather than more (on 64-bit, those values > really don't matter, since the vmalloc space isn't really a > limitation) Side note: the people who actually care about "my vmalloc area is too full, what's up?" would use /proc/vmallocinfo anyway, since that's what shows things like fragmentation etc. So I'm just talking about removing the /proc/meminfo part. First try to remove it *all*, and if there is some script that hollers because it wants to parse them, print out the values as zero. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750967AbbHWEsm (ORCPT ); Sun, 23 Aug 2015 00:48:42 -0400 Received: from ns.horizon.com ([71.41.210.147]:29051 "HELO ns.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1750761AbbHWEsl (ORCPT ); Sun, 23 Aug 2015 00:48:41 -0400 Date: 23 Aug 2015 00:48:39 -0400 Message-ID: <20150823044839.5727.qmail@ns.horizon.com> From: "George Spelvin" To: mingo@kernel.org Subject: Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics Cc: linux@horizon.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, torvalds@linux-foundation.org Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Linus wrote: > I don't think any of this can be called "correct", in that the > unlocked accesses to the cached state are clearly racy, but I think > it's very much "acceptable". I'd think you could easily fix that with a seqlock-like system. What makes it so simple is that you can always fall back to calc_vmalloc_info if there's any problem, rather than looping or blocking. The basic idea is that you have a seqlock counter, but if either of the two lsbits are set, the cached information is stale. Basically, you need a seqlock and a spinlock. The seqlock does most of the work, and the spinlock ensures that there's only one updater of the cache. vmap_unlock() does set_bit(0, &seq->sequence). This marks the information as stale. get_vmalloc_info reads the seqlock. There are two case: - If the two lsbits are 10, the cached information is valid. Copy it out, re-check the seqlock, and loop if the sequence number changes. - In any other case, the cached information is not valid. - Try to obtain the spinlock. Do not block if it's unavailable. - If unavailable, do not block. - If the lock is acquired: - Set the sequence to (sequence | 3) + 1 (we're the only writer) - This bumps the sequence number and leaves the lsbits at 00 (invalid) - Memory barrier TBD. Do the RCU ops in calc_vmalloc_info do it for us? - Call calc_vmalloc_info - If we obtained the spinlock earlier: - Copy our vmi to cached_info - smp_wmb() - set_bit(1, &seq->sequence). This marks the information as valid, as long as bit 0 is still clear. - Release the spinlock. Basically, bit 0 says "vmalloc info has changed", and bit 1 says "vmalloc cache has been updated". This clears bit 0 before starting the update so that an update during calc_vmalloc_info will force a new update. So the three case are basically: 00 - calc_vmalloc_info() in progress 01 - vmap_unlock() during calc_vmalloc_info() 10 - cached_info is valid 11 - vmap_unlock has invalidated cached_info, awaiting refresh Logically, the sequence number should be initialized to ...01, but the code above handles 00 okay. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751343AbbHWGEt (ORCPT ); Sun, 23 Aug 2015 02:04:49 -0400 Received: from mail-wi0-f181.google.com ([209.85.212.181]:33439 "EHLO mail-wi0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750956AbbHWGEs (ORCPT ); Sun, 23 Aug 2015 02:04:48 -0400 Date: Sun, 23 Aug 2015 08:04:43 +0200 From: Ingo Molnar To: George Spelvin Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, torvalds@linux-foundation.org, Dave Hansen , Peter Zijlstra , David Rientjes , Rik van Riel , Rasmus Villemoes Subject: Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics Message-ID: <20150823060443.GA9882@gmail.com> References: <20150823044839.5727.qmail@ns.horizon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150823044839.5727.qmail@ns.horizon.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * George Spelvin wrote: > Linus wrote: > > I don't think any of this can be called "correct", in that the > > unlocked accesses to the cached state are clearly racy, but I think > > it's very much "acceptable". > > I'd think you could easily fix that with a seqlock-like system. > > What makes it so simple is that you can always fall back to > calc_vmalloc_info if there's any problem, rather than looping or blocking. > > The basic idea is that you have a seqlock counter, but if either of > the two lsbits are set, the cached information is stale. > > Basically, you need a seqlock and a spinlock. The seqlock does > most of the work, and the spinlock ensures that there's only one > updater of the cache. > > vmap_unlock() does set_bit(0, &seq->sequence). This marks the information > as stale. > > get_vmalloc_info reads the seqlock. There are two case: > - If the two lsbits are 10, the cached information is valid. > Copy it out, re-check the seqlock, and loop if the sequence > number changes. > - In any other case, the cached information is > not valid. > - Try to obtain the spinlock. Do not block if it's unavailable. > - If unavailable, do not block. > - If the lock is acquired: > - Set the sequence to (sequence | 3) + 1 (we're the only writer) > - This bumps the sequence number and leaves the lsbits at 00 (invalid) > - Memory barrier TBD. Do the RCU ops in calc_vmalloc_info do it for us? > - Call calc_vmalloc_info > - If we obtained the spinlock earlier: > - Copy our vmi to cached_info > - smp_wmb() > - set_bit(1, &seq->sequence). This marks the information as valid, > as long as bit 0 is still clear. > - Release the spinlock. > > Basically, bit 0 says "vmalloc info has changed", and bit 1 says > "vmalloc cache has been updated". This clears bit 0 before > starting the update so that an update during calc_vmalloc_info > will force a new update. > > So the three case are basically: > 00 - calc_vmalloc_info() in progress > 01 - vmap_unlock() during calc_vmalloc_info() > 10 - cached_info is valid > 11 - vmap_unlock has invalidated cached_info, awaiting refresh > > Logically, the sequence number should be initialized to ...01, > but the code above handles 00 okay. I think this is too complex. How about something simple like the patch below (on top of the third patch)? It makes the vmalloc info transactional - /proc/meminfo will always print a consistent set of numbers. (Not that we really care about races there, but it looks really simple to solve so why not.) ( I also moved the function-static cache next to the flag and seqlock - this should further compress the cache footprint. ) Have I missed anything? Very lightly tested: booted in a VM. Thanks, Ingo =========================> mm/vmalloc.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index ef48e557df5a..66726f41e726 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -278,7 +278,15 @@ EXPORT_SYMBOL(vmalloc_to_pfn); static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock); +/* + * Seqlock and flag for the vmalloc info cache printed in /proc/meminfo. + * + * The assumption of the optimization is that it's read frequently, but + * modified infrequently. + */ +static DEFINE_SEQLOCK(vmap_info_lock); static int vmap_info_changed; +static struct vmalloc_info vmap_info_cache; static inline void vmap_lock(void) { @@ -2752,10 +2760,14 @@ static void calc_vmalloc_info(struct vmalloc_info *vmi) void get_vmalloc_info(struct vmalloc_info *vmi) { - static struct vmalloc_info cached_info; + if (!READ_ONCE(vmap_info_changed)) { + unsigned int seq; + + do { + seq = read_seqbegin(&vmap_info_lock); + *vmi = vmap_info_cache; + } while (read_seqretry(&vmap_info_lock, seq)); - if (!vmap_info_changed) { - *vmi = cached_info; return; } @@ -2764,8 +2776,9 @@ void get_vmalloc_info(struct vmalloc_info *vmi) calc_vmalloc_info(vmi); - barrier(); - cached_info = *vmi; + write_seqlock(&vmap_info_lock); + vmap_info_cache = *vmi; + write_sequnlock(&vmap_info_lock); } #endif From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751369AbbHWGqH (ORCPT ); Sun, 23 Aug 2015 02:46:07 -0400 Received: from ns.horizon.com ([71.41.210.147]:61386 "HELO ns.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1750925AbbHWGqF (ORCPT ); Sun, 23 Aug 2015 02:46:05 -0400 Date: 23 Aug 2015 02:46:03 -0400 Message-ID: <20150823064603.14050.qmail@ns.horizon.com> From: "George Spelvin" To: linux@horizon.com, mingo@kernel.org Subject: Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org In-Reply-To: <20150823060443.GA9882@gmail.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Ingo Molnar wrote: > I think this is too complex. > > How about something simple like the patch below (on top of the third patch)? > It makes the vmalloc info transactional - /proc/meminfo will always print a > consistent set of numbers. (Not that we really care about races there, but it > looks really simple to solve so why not.) Looks like a huge simplification! It needs a comment about the approximate nature of the locking and the obvious race conditions: 1) The first caller to get_vmalloc_info() clears vmap_info_changed before updating vmap_info_cache, so a second caller is likely to get stale data for the duration of a calc_vmalloc_info call. 2) Although unlikely, it's possible for two threads to race calling calc_vmalloc_info, and the one that computes fresher data updates the cache first, so the later write leaves stale data. Other issues: 3) Me, I'd make vmap_info_changed a bool, for documentation more than any space saving. 4) I wish there were a trylock version of write_seqlock, so we could avoid blocking entirely. (You *could* hand-roll it, but that eats into the simplicity.) From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752006AbbHWIR7 (ORCPT ); Sun, 23 Aug 2015 04:17:59 -0400 Received: from mail-wi0-f174.google.com ([209.85.212.174]:36686 "EHLO mail-wi0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751708AbbHWIR4 (ORCPT ); Sun, 23 Aug 2015 04:17:56 -0400 Date: Sun, 23 Aug 2015 10:17:51 +0200 From: Ingo Molnar To: George Spelvin Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150823081750.GA28349@gmail.com> References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150823064603.14050.qmail@ns.horizon.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * George Spelvin wrote: > Ingo Molnar wrote: > > I think this is too complex. > > > > How about something simple like the patch below (on top of the third patch)? > > > It makes the vmalloc info transactional - /proc/meminfo will always print a > > consistent set of numbers. (Not that we really care about races there, but it > > looks really simple to solve so why not.) > > Looks like a huge simplification! > > It needs a comment about the approximate nature of the locking and > the obvious race conditions: > 1) The first caller to get_vmalloc_info() clears vmap_info_changed > before updating vmap_info_cache, so a second caller is likely to > get stale data for the duration of a calc_vmalloc_info call. > 2) Although unlikely, it's possible for two threads to race calling > calc_vmalloc_info, and the one that computes fresher data updates > the cache first, so the later write leaves stale data. > > Other issues: > 3) Me, I'd make vmap_info_changed a bool, for documentation more than > any space saving. > 4) I wish there were a trylock version of write_seqlock, so we could > avoid blocking entirely. (You *could* hand-roll it, but that eats > into the simplicity.) Ok, fair enough - so how about the attached approach instead, which uses a 64-bit generation counter to track changes to the vmalloc state. This is still very simple, but should not suffer from stale data being returned indefinitely in /proc/meminfo. We might race - but that was true before as well due to the lock-less RCU list walk - but we'll always return a correct and consistent version of the information. Lightly tested. This is a replacement patch to make it easier to read via email. I also made sure there's no extra overhead in the !CONFIG_PROC_FS case. Note that there's an even simpler variant possible I think: we could use just the two generation counters and barriers to remove the seqlock. Thanks, Ingo ==============================> >>From f9fd770e75e2edb4143f32ced0b53d7a77969c94 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Sat, 22 Aug 2015 12:28:01 +0200 Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info Linus reported that glibc (rather stupidly) reads /proc/meminfo for every sysinfo() call, which causes the Git build to use a surprising amount of CPU time, mostly due to the overhead of get_vmalloc_info() - which walks a long list to do its statistics. Modify Linus's jiffies based patch to use generation counters to cache the vmalloc info: vmap_unlock() increases the generation counter, and the get_vmalloc_info() reads it and compares it against a cached generation counter. Also use a seqlock to make sure we always print a consistent set of vmalloc statistics. Cc: Linus Torvalds Cc: Andrew Morton Cc: Rik van Riel Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar --- mm/vmalloc.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 56 insertions(+), 3 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 605138083880..d72b23436906 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -276,7 +276,21 @@ EXPORT_SYMBOL(vmalloc_to_pfn); #define VM_LAZY_FREEING 0x02 #define VM_VM_AREA 0x04 -static DEFINE_SPINLOCK(vmap_area_lock); +static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock); + +#ifdef CONFIG_PROC_FS +/* + * A seqlock and two generation counters for a simple cache of the + * vmalloc allocation statistics info printed in /proc/meminfo. + * + * ( The assumption of the optimization is that it's read frequently, but + * modified infrequently. ) + */ +static DEFINE_SEQLOCK(vmap_info_lock); +static u64 vmap_info_gen; +static u64 vmap_info_cache_gen; +static struct vmalloc_info vmap_info_cache; +#endif static inline void vmap_lock(void) { @@ -285,6 +299,9 @@ static inline void vmap_lock(void) static inline void vmap_unlock(void) { +#ifdef CONFIG_PROC_FS + WRITE_ONCE(vmap_info_gen, vmap_info_gen+1); +#endif spin_unlock(&vmap_area_lock); } @@ -2699,7 +2716,7 @@ static int __init proc_vmalloc_init(void) } module_init(proc_vmalloc_init); -void get_vmalloc_info(struct vmalloc_info *vmi) +static void calc_vmalloc_info(struct vmalloc_info *vmi) { struct vmap_area *va; unsigned long free_area_size; @@ -2746,5 +2763,41 @@ void get_vmalloc_info(struct vmalloc_info *vmi) out: rcu_read_unlock(); } -#endif +/* + * Return a consistent snapshot of the current vmalloc allocation + * statistics, for /proc/meminfo: + */ +void get_vmalloc_info(struct vmalloc_info *vmi) +{ + u64 gen = READ_ONCE(vmap_info_gen); + + /* + * If the generation counter of the cache matches that of + * the vmalloc generation counter then return the cache: + */ + if (READ_ONCE(vmap_info_cache_gen) == gen) { + unsigned int seq; + + do { + seq = read_seqbegin(&vmap_info_lock); + *vmi = vmap_info_cache; + } while (read_seqretry(&vmap_info_lock, seq)); + + return; + } + + calc_vmalloc_info(vmi); + + /* + * If are racing with a new vmalloc() then we might write + * the old generation counter here - and the next call to + * get_vmalloc_info() will fix things up: + */ + write_seqlock(&vmap_info_lock); + vmap_info_cache = *vmi; + WRITE_ONCE(vmap_info_cache_gen, gen); + write_sequnlock(&vmap_info_lock); +} + +#endif /* CONFIG_PROC_FS */ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753443AbbHWUxe (ORCPT ); Sun, 23 Aug 2015 16:53:34 -0400 Received: from mail-la0-f44.google.com ([209.85.215.44]:35848 "EHLO mail-la0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751612AbbHWUxd (ORCPT ); Sun, 23 Aug 2015 16:53:33 -0400 From: Rasmus Villemoes To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Organization: D03 References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> X-Hashcash: 1:20:150823:linux@horizon.com::74OY0bWTF6JJdNH2:000000000000000000000000000000000000000000000cnm X-Hashcash: 1:20:150823:rientjes@google.com::1aB/7FoDMAScqiqM:0000000000000000000000000000000000000000001N2T X-Hashcash: 1:20:150823:linux-kernel@vger.kernel.org::UITJar0R4wJ1Oo7d:0000000000000000000000000000000002D9t X-Hashcash: 1:20:150823:mingo@kernel.org::ASqES/1Ij1Oo194A:02jW9 X-Hashcash: 1:20:150823:dave@sr71.net::7b+4snVzzzVTET+H:00003pz+ X-Hashcash: 1:20:150823:torvalds@linux-foundation.org::25VyRHnuaVkyXrqu:000000000000000000000000000000003J77 X-Hashcash: 1:20:150823:linux-mm@kvack.org::CE9pFa9YOHPKu7c+:00000000000000000000000000000000000000000006P4N X-Hashcash: 1:20:150823:riel@redhat.com::PuIN88VZFuJHPj1B:006RAj X-Hashcash: 1:20:150823:peterz@infradead.org::B6VYp4pjX7edC7hV:000000000000000000000000000000000000000007Wai Date: Sun, 23 Aug 2015 22:53:28 +0200 In-Reply-To: <20150823081750.GA28349@gmail.com> (Ingo Molnar's message of "Sun, 23 Aug 2015 10:17:51 +0200") Message-ID: <87lhd1wwtz.fsf@rasmusvillemoes.dk> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Aug 23 2015, Ingo Molnar wrote: > Ok, fair enough - so how about the attached approach instead, which > uses a 64-bit generation counter to track changes to the vmalloc > state. How does this invalidation approach compare to the jiffies approach? In other words, how often does the vmalloc info actually change (or rather, in this approximation, how often is vmap_area_lock taken)? In particular, does it also solve the problem with git's test suite and similar situations with lots of short-lived processes? > ==============================> > From f9fd770e75e2edb4143f32ced0b53d7a77969c94 Mon Sep 17 00:00:00 2001 > From: Ingo Molnar > Date: Sat, 22 Aug 2015 12:28:01 +0200 > Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info > > Linus reported that glibc (rather stupidly) reads /proc/meminfo > for every sysinfo() call, Not quite: It is done by the two functions get_{av,}phys_pages functions; and get_phys_pages is called (once per process) by glibc's qsort implementation. In fact, sysinfo() is (at least part of) the cure, not the disease. Whether qsort should care about the total amount of memory is another discussion. Rasmus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753616AbbHWV4b (ORCPT ); Sun, 23 Aug 2015 17:56:31 -0400 Received: from mail-la0-f42.google.com ([209.85.215.42]:36622 "EHLO mail-la0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751612AbbHWV43 (ORCPT ); Sun, 23 Aug 2015 17:56:29 -0400 From: Rasmus Villemoes To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Organization: D03 References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> X-Hashcash: 1:20:150823:linux-mm@kvack.org::XKvyXuInva6s2AfP:000000000000000000000000000000000000000000006UQ X-Hashcash: 1:20:150823:torvalds@linux-foundation.org::FK0Bwj+OMWmTgaPa:000000000000000000000000000000000rQ/ X-Hashcash: 1:20:150823:dave@sr71.net::mqLbwS9FrlTHImct:00002Idv X-Hashcash: 1:20:150823:linux-kernel@vger.kernel.org::AIhd70BLyh4lFgq7:0000000000000000000000000000000002X/B X-Hashcash: 1:20:150823:riel@redhat.com::HKO3f/RzMG/C5vkL:002xjJ X-Hashcash: 1:20:150823:linux@horizon.com::kpuGQ+N8xbj6jYAM:000000000000000000000000000000000000000000003rYd X-Hashcash: 1:20:150823:rientjes@google.com::KnBNBTyZWyHNPkJ5:0000000000000000000000000000000000000000005CQ0 X-Hashcash: 1:20:150823:peterz@infradead.org::BdhkU4CmGdfi3/jH:00000000000000000000000000000000000000000Aomc X-Hashcash: 1:20:150823:mingo@kernel.org::9IOGLQSqXsD3WzC3:0Bd/L Date: Sun, 23 Aug 2015 23:56:24 +0200 In-Reply-To: <20150823081750.GA28349@gmail.com> (Ingo Molnar's message of "Sun, 23 Aug 2015 10:17:51 +0200") Message-ID: <87h9npwtx3.fsf@rasmusvillemoes.dk> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org I was curious why these fields were ever added to /proc/meminfo, and dug up this: commit d262ee3ee6ba4f5f6125571d93d9d63191d2ef76 Author: Andrew Morton Date: Sat Apr 12 12:59:04 2003 -0700 [PATCH] vmalloc stats in /proc/meminfo From: Matt Porter There was a thread a while back on lkml where Dave Hansen proposed this simple vmalloc usage reporting patch. The thread pretty much died out as most people seemed focused on what VM loading type bugs it could solve. I had posted that this type of information was really valuable in debugging embedded Linux board ports. A common example is where people do arch specific setup that limits there vmalloc space and then they find modules won't load. ;) Having the Vmalloc* info readily available is real useful in helping folks to fix their kernel ports. That thread is at . [Maybe one could just remove the fields and see if anybody actually notices/cares any longer. Or, if they are only used by kernel developers, put them in their own file.] Rasmus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753684AbbHXBEH (ORCPT ); Sun, 23 Aug 2015 21:04:07 -0400 Received: from ns.horizon.com ([71.41.210.147]:62974 "HELO ns.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1751881AbbHXBEG (ORCPT ); Sun, 23 Aug 2015 21:04:06 -0400 Date: 23 Aug 2015 21:04:03 -0400 Message-ID: <20150824010403.27903.qmail@ns.horizon.com> From: "George Spelvin" To: linux@horizon.com, mingo@kernel.org Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org In-Reply-To: <20150823081750.GA28349@gmail.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org First, an actual, albeit minor, bug: initializing both vmap_info_gen and vmap_info_cache_gen to 0 marks the cache as valid, which it's not. vmap_info_gen should be initialized to 1 to force an initial cache update. Second, I don't see why you need a 64-bit counter. Seqlocks consider 32 bits (31 bits, actually, the lsbit means "update in progress") quite a strong enough guarantee. Third, it seems as though vmap_info_cache_gen is basically a duplicate of vmap_info_lock.sequence. It should be possible to make one variable serve both purposes. You just need a kludge to handle the case of multiple vamp_info updates between cache updates. There are two simple ones: 1) Avoid bumping vmap_info_gen unnecessarily. In vmap_unlock(), do vmap_info_gen = (vmap_info_lock.sequence | 1) + 1; 2) - Make vmap_info_gen a seqcount_t - In vmap_unlock(), do write_seqcount_barrier(&vmap_info_gen) - In get_vmalloc_info, inside the seqlock critical section, do vmap_info_lock.seqcount.sequence = vmap_info_gen.sequence - 1; (Using the vmap_info_gen.sequence read while validating the cache in the first place.) I should try to write an actual patch illustrating this. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753985AbbHXG6O (ORCPT ); Mon, 24 Aug 2015 02:58:14 -0400 Received: from mail-wi0-f176.google.com ([209.85.212.176]:37509 "EHLO mail-wi0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753943AbbHXG6N (ORCPT ); Mon, 24 Aug 2015 02:58:13 -0400 Date: Mon, 24 Aug 2015 08:58:09 +0200 From: Ingo Molnar To: Rasmus Villemoes Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150824065809.GA13082@gmail.com> References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> <87lhd1wwtz.fsf@rasmusvillemoes.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87lhd1wwtz.fsf@rasmusvillemoes.dk> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Rasmus Villemoes wrote: > On Sun, Aug 23 2015, Ingo Molnar wrote: > > > Ok, fair enough - so how about the attached approach instead, which > > uses a 64-bit generation counter to track changes to the vmalloc > > state. > > How does this invalidation approach compare to the jiffies approach? In > other words, how often does the vmalloc info actually change (or rather, > in this approximation, how often is vmap_area_lock taken)? In > particular, does it also solve the problem with git's test suite and > similar situations with lots of short-lived processes? The two approaches are pretty similar, and in a typical distro with typical workload vmalloc() is mostly a boot time affair. But vmalloc() can be used more often in certain corner cases - neither of the patches makes that in any way slower, just the optimization won't trigger that often. Since vmalloc() use is suboptimal for several reasons (it does not use large pages for kernel space allocations, etc.), this is all pretty OK IMHO. > > ==============================> > > From f9fd770e75e2edb4143f32ced0b53d7a77969c94 Mon Sep 17 00:00:00 2001 > > From: Ingo Molnar > > Date: Sat, 22 Aug 2015 12:28:01 +0200 > > Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info > > > > Linus reported that glibc (rather stupidly) reads /proc/meminfo > > for every sysinfo() call, > > Not quite: It is done by the two functions get_{av,}phys_pages > functions; and get_phys_pages is called (once per process) by glibc's > qsort implementation. In fact, sysinfo() is (at least part of) the cure, > not the disease. Whether qsort should care about the total amount of > memory is another discussion. > > Thanks, is the fixed up changelog below better? Ingo ===============> mm/vmalloc: Cache the vmalloc memory info Linus reported that for scripting-intense workloads such as the Git build, glibc's qsort will read /proc/meminfo for every process created (by way of get_phys_pages()), which causes the Git build to generate a surprising amount of kernel overhead. A fair chunk of the overhead is due to get_vmalloc_info() - which walks a potentially long list to do its statistics. Modify Linus's jiffies based patch to use generation counters to cache the vmalloc info: vmap_unlock() increases the generation counter, and the get_vmalloc_info() reads it and compares it against a cached generation counter. Also use a seqlock to make sure we always print a consistent set of vmalloc statistics. Reported-by: Linus Torvalds Cc: Andrew Morton Cc: Rik van Riel Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754033AbbHXHAI (ORCPT ); Mon, 24 Aug 2015 03:00:08 -0400 Received: from mail-wi0-f177.google.com ([209.85.212.177]:36956 "EHLO mail-wi0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753156AbbHXHAG (ORCPT ); Mon, 24 Aug 2015 03:00:06 -0400 Date: Mon, 24 Aug 2015 09:00:01 +0200 From: Ingo Molnar To: Rasmus Villemoes Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150824070001.GB13082@gmail.com> References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> <87h9npwtx3.fsf@rasmusvillemoes.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87h9npwtx3.fsf@rasmusvillemoes.dk> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Rasmus Villemoes wrote: > I was curious why these fields were ever added to /proc/meminfo, and dug > up this: > > commit d262ee3ee6ba4f5f6125571d93d9d63191d2ef76 > Author: Andrew Morton > Date: Sat Apr 12 12:59:04 2003 -0700 > > [PATCH] vmalloc stats in /proc/meminfo > > From: Matt Porter > > There was a thread a while back on lkml where Dave Hansen proposed this > simple vmalloc usage reporting patch. The thread pretty much died out as > most people seemed focused on what VM loading type bugs it could solve. I > had posted that this type of information was really valuable in debugging > embedded Linux board ports. A common example is where people do arch > specific setup that limits there vmalloc space and then they find modules > won't load. ;) Having the Vmalloc* info readily available is real useful in > helping folks to fix their kernel ports. > > That thread is at . > > [Maybe one could just remove the fields and see if anybody actually > notices/cares any longer. Or, if they are only used by kernel > developers, put them in their own file.] So instead of removing the fields (which I'm quite sure is an ABI breaker as it could break less robust /proc/meminfo parsers and scripts), we could just report '0' all the time - and have the real info somewhere else? Thanks, Ingo From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753958AbbHXHe2 (ORCPT ); Mon, 24 Aug 2015 03:34:28 -0400 Received: from mail-wi0-f172.google.com ([209.85.212.172]:36011 "EHLO mail-wi0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753396AbbHXHe1 (ORCPT ); Mon, 24 Aug 2015 03:34:27 -0400 Date: Mon, 24 Aug 2015 09:34:22 +0200 From: Ingo Molnar To: George Spelvin Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150824073422.GC13082@gmail.com> References: <20150823081750.GA28349@gmail.com> <20150824010403.27903.qmail@ns.horizon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150824010403.27903.qmail@ns.horizon.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * George Spelvin wrote: > First, an actual, albeit minor, bug: initializing both vmap_info_gen > and vmap_info_cache_gen to 0 marks the cache as valid, which it's not. Ha! :-) Fixed. > vmap_info_gen should be initialized to 1 to force an initial > cache update. Yeah. > Second, I don't see why you need a 64-bit counter. Seqlocks consider > 32 bits (31 bits, actually, the lsbit means "update in progress") quite > a strong enough guarantee. Just out of general paranoia - but you are right, and this would lower the overhead on 32-bit SMP platforms a bit, plus it avoids 64-bit word tearing artifacts on 32 bit platforms as well. I modified it to u32. > Third, it seems as though vmap_info_cache_gen is basically a duplicate > of vmap_info_lock.sequence. It should be possible to make one variable > serve both purposes. Correct, I alluded to that in my description: > > Note that there's an even simpler variant possible I think: we could use just > > the two generation counters and barriers to remove the seqlock. > You just need a kludge to handle the case of multiple vamp_info updates > between cache updates. > > There are two simple ones: > > 1) Avoid bumping vmap_info_gen unnecessarily. In vmap_unlock(), do > vmap_info_gen = (vmap_info_lock.sequence | 1) + 1; > 2) - Make vmap_info_gen a seqcount_t > - In vmap_unlock(), do write_seqcount_barrier(&vmap_info_gen) > - In get_vmalloc_info, inside the seqlock critical section, do > vmap_info_lock.seqcount.sequence = vmap_info_gen.sequence - 1; > (Using the vmap_info_gen.sequence read while validating the > cache in the first place.) > > I should try to write an actual patch illustrating this. So I think something like the patch below is even simpler than trying to kludge generation counter semantics into seqcounts. I used two generation counters and a spinlock. The fast path is completely lockless and lightweight on modern SMP platforms. (where smp_rmb() is a no-op or very cheap.) There's not even a seqlock retry loop, instead an invalid cache causes us to fall back to the old behavior - and the freshest result is guaranteed to end up in the cache. The linecount got a bit larger: but half of it is comments. Note that the generation counters are signed integers so that this comparison can be done: + if (gen-vmap_info_cache_gen > 0) { Thanks, Ingo ======================> >>From 1a4c168a22cc302282cbd1bf503ecfc4dc52b74f Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Sat, 22 Aug 2015 12:28:01 +0200 Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info Linus reported that for scripting-intense workloads such as the Git build, glibc's qsort will read /proc/meminfo for every process created (by way of get_phys_pages()), which causes the Git build to generate a surprising amount of kernel overhead. A fair chunk of the overhead is due to get_vmalloc_info() - which walks a potentially long list to do its statistics. Modify Linus's jiffies based patch to use generation counters to cache the vmalloc info: vmap_unlock() increases the generation counter, and the get_vmalloc_info() reads it and compares it against a cached generation counter. Also use a seqlock to make sure we always print a consistent set of vmalloc statistics. Reported-by: Linus Torvalds Cc: Andrew Morton Cc: Rik van Riel Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar --- mm/vmalloc.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 80 insertions(+), 3 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 605138083880..23df06ebb48a 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -276,7 +276,21 @@ EXPORT_SYMBOL(vmalloc_to_pfn); #define VM_LAZY_FREEING 0x02 #define VM_VM_AREA 0x04 -static DEFINE_SPINLOCK(vmap_area_lock); +static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock); + +#ifdef CONFIG_PROC_FS +/* + * A seqlock and two generation counters for a simple cache of the + * vmalloc allocation statistics info printed in /proc/meminfo. + * + * ( The assumption of the optimization is that it's read frequently, but + * modified infrequently. ) + */ +static DEFINE_SPINLOCK(vmap_info_lock); +static int vmap_info_gen = 1; +static int vmap_info_cache_gen; +static struct vmalloc_info vmap_info_cache; +#endif static inline void vmap_lock(void) { @@ -285,6 +299,9 @@ static inline void vmap_lock(void) static inline void vmap_unlock(void) { +#ifdef CONFIG_PROC_FS + WRITE_ONCE(vmap_info_gen, vmap_info_gen+1); +#endif spin_unlock(&vmap_area_lock); } @@ -2699,7 +2716,7 @@ static int __init proc_vmalloc_init(void) } module_init(proc_vmalloc_init); -void get_vmalloc_info(struct vmalloc_info *vmi) +static void calc_vmalloc_info(struct vmalloc_info *vmi) { struct vmap_area *va; unsigned long free_area_size; @@ -2746,5 +2763,65 @@ void get_vmalloc_info(struct vmalloc_info *vmi) out: rcu_read_unlock(); } -#endif +/* + * Return a consistent snapshot of the current vmalloc allocation + * statistics, for /proc/meminfo: + */ +void get_vmalloc_info(struct vmalloc_info *vmi) +{ + int gen = READ_ONCE(vmap_info_gen); + + /* + * If the generation counter of the cache matches that of + * the vmalloc generation counter then return the cache: + */ + if (READ_ONCE(vmap_info_cache_gen) == gen) { + int gen_after; + + /* + * The two read barriers make sure that we read + * 'gen', 'vmap_info_cache' and 'gen_after' in + * precisely that order: + */ + smp_rmb(); + *vmi = vmap_info_cache; + + smp_rmb(); + gen_after = READ_ONCE(vmap_info_gen); + + /* The cache is still valid: */ + if (gen == gen_after) + return; + + /* Ok, the cache got invalidated just now, regenerate it */ + gen = gen_after; + } + + /* Make sure 'gen' is read before the vmalloc info */ + smp_rmb(); + + calc_vmalloc_info(vmi); + + /* + * All updates to vmap_info_cache_gen go through this spinlock, + * so when the cache got invalidated, we'll only mark it valid + * again if we first fully write the new vmap_info_cache. + * + * This ensures that partial results won't be used. + */ + spin_lock(&vmap_info_lock); + if (gen-vmap_info_cache_gen > 0) { + vmap_info_cache = *vmi; + /* + * Make sure the new cached data is visible before + * the generation counter update: + */ + smp_wmb(); + + WRITE_ONCE(vmap_info_cache_gen, gen); + } + spin_unlock(&vmap_info_lock); +} + +#endif /* CONFIG_PROC_FS */ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754018AbbHXHrT (ORCPT ); Mon, 24 Aug 2015 03:47:19 -0400 Received: from mail-wi0-f179.google.com ([209.85.212.179]:38901 "EHLO mail-wi0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751696AbbHXHrS (ORCPT ); Mon, 24 Aug 2015 03:47:18 -0400 Date: Mon, 24 Aug 2015 09:47:14 +0200 From: Ingo Molnar To: George Spelvin Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: Re: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150824074714.GA20106@gmail.com> References: <20150823081750.GA28349@gmail.com> <20150824010403.27903.qmail@ns.horizon.com> <20150824073422.GC13082@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150824073422.GC13082@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Ingo Molnar wrote: > +/* > + * Return a consistent snapshot of the current vmalloc allocation > + * statistics, for /proc/meminfo: > + */ > +void get_vmalloc_info(struct vmalloc_info *vmi) > +{ > + int gen = READ_ONCE(vmap_info_gen); > + > + /* > + * If the generation counter of the cache matches that of > + * the vmalloc generation counter then return the cache: > + */ > + if (READ_ONCE(vmap_info_cache_gen) == gen) { > + int gen_after; > + > + /* > + * The two read barriers make sure that we read > + * 'gen', 'vmap_info_cache' and 'gen_after' in > + * precisely that order: > + */ > + smp_rmb(); > + *vmi = vmap_info_cache; > + > + smp_rmb(); > + gen_after = READ_ONCE(vmap_info_gen); > + > + /* The cache is still valid: */ > + if (gen == gen_after) > + return; > + > + /* Ok, the cache got invalidated just now, regenerate it */ > + gen = gen_after; > + } One more detail: I just realized that with the read barriers, the READ_ONCE() accesses are not needed anymore - the barriers and the control dependencies are enough. This will further simplify the code. Thanks, Ingo From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932323AbbHXHuY (ORCPT ); Mon, 24 Aug 2015 03:50:24 -0400 Received: from mail-wi0-f177.google.com ([209.85.212.177]:35766 "EHLO mail-wi0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752077AbbHXHuX (ORCPT ); Mon, 24 Aug 2015 03:50:23 -0400 Date: Mon, 24 Aug 2015 09:50:18 +0200 From: Ingo Molnar To: George Spelvin Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: [PATCH 3/3 v5] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150824075018.GB20106@gmail.com> References: <20150823081750.GA28349@gmail.com> <20150824010403.27903.qmail@ns.horizon.com> <20150824073422.GC13082@gmail.com> <20150824074714.GA20106@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150824074714.GA20106@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Ingo Molnar wrote: > One more detail: I just realized that with the read barriers, the READ_ONCE() > accesses are not needed anymore - the barriers and the control dependencies are > enough. > > This will further simplify the code. I.e. something like the updated patch below. (We still need the WRITE_ONCE() for vmap_info_gen update.) Thanks, Ingo ========================> >>From 46a0507e0a395a7bc2fe4b46a4766e7457ac0140 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Sat, 22 Aug 2015 12:28:01 +0200 Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info Linus reported that for scripting-intense workloads such as the Git build, glibc's qsort will read /proc/meminfo for every process created (by way of get_phys_pages()), which causes the Git build to generate a surprising amount of kernel overhead. A fair chunk of the overhead is due to get_vmalloc_info() - which walks a potentially long list to do its statistics. Modify Linus's jiffies based patch to use generation counters to cache the vmalloc info: vmap_unlock() increases the generation counter, and the get_vmalloc_info() reads it and compares it against a cached generation counter. Also use a seqlock to make sure we always print a consistent set of vmalloc statistics. Reported-by: Linus Torvalds Cc: Andrew Morton Cc: Rik van Riel Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar --- mm/vmalloc.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 79 insertions(+), 3 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 605138083880..2f8d9660e007 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -276,7 +276,21 @@ EXPORT_SYMBOL(vmalloc_to_pfn); #define VM_LAZY_FREEING 0x02 #define VM_VM_AREA 0x04 -static DEFINE_SPINLOCK(vmap_area_lock); +static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock); + +#ifdef CONFIG_PROC_FS +/* + * A seqlock and two generation counters for a simple cache of the + * vmalloc allocation statistics info printed in /proc/meminfo. + * + * ( The assumption of the optimization is that it's read frequently, but + * modified infrequently. ) + */ +static DEFINE_SPINLOCK(vmap_info_lock); +static int vmap_info_gen = 1; +static int vmap_info_cache_gen; +static struct vmalloc_info vmap_info_cache; +#endif static inline void vmap_lock(void) { @@ -285,6 +299,9 @@ static inline void vmap_lock(void) static inline void vmap_unlock(void) { +#ifdef CONFIG_PROC_FS + WRITE_ONCE(vmap_info_gen, vmap_info_gen+1); +#endif spin_unlock(&vmap_area_lock); } @@ -2699,7 +2716,7 @@ static int __init proc_vmalloc_init(void) } module_init(proc_vmalloc_init); -void get_vmalloc_info(struct vmalloc_info *vmi) +static void calc_vmalloc_info(struct vmalloc_info *vmi) { struct vmap_area *va; unsigned long free_area_size; @@ -2746,5 +2763,64 @@ void get_vmalloc_info(struct vmalloc_info *vmi) out: rcu_read_unlock(); } -#endif +/* + * Return a consistent snapshot of the current vmalloc allocation + * statistics, for /proc/meminfo: + */ +void get_vmalloc_info(struct vmalloc_info *vmi) +{ + int gen = vmap_info_gen; + + /* + * If the generation counter of the cache matches that of + * the vmalloc generation counter then return the cache: + */ + if (vmap_info_cache_gen == gen) { + int gen_after; + + /* + * The two read barriers make sure that we read + * 'gen', 'vmap_info_cache' and 'gen_after' in + * precisely that order: + */ + smp_rmb(); + *vmi = vmap_info_cache; + + smp_rmb(); + gen_after = vmap_info_gen; + + /* The cache is still valid: */ + if (gen == gen_after) + return; + + /* Ok, the cache got invalidated just now, regenerate it */ + gen = gen_after; + } + + /* Make sure 'gen' is read before the vmalloc info */ + smp_rmb(); + + calc_vmalloc_info(vmi); + + /* + * All updates to vmap_info_cache_gen go through this spinlock, + * so when the cache got invalidated, we'll only mark it valid + * again if we first fully write the new vmap_info_cache. + * + * This ensures that partial results won't be used. + */ + spin_lock(&vmap_info_lock); + if (gen-vmap_info_cache_gen > 0) { + vmap_info_cache = *vmi; + /* + * Make sure the new cached data is visible before + * the generation counter update: + */ + smp_wmb(); + vmap_info_cache_gen = gen; + } + spin_unlock(&vmap_info_lock); +} + +#endif /* CONFIG_PROC_FS */ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932533AbbHXIjX (ORCPT ); Mon, 24 Aug 2015 04:39:23 -0400 Received: from mail-la0-f43.google.com ([209.85.215.43]:33828 "EHLO mail-la0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751980AbbHXIjV (ORCPT ); Mon, 24 Aug 2015 04:39:21 -0400 From: Rasmus Villemoes To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Organization: D03 References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> <87lhd1wwtz.fsf@rasmusvillemoes.dk> <20150824065809.GA13082@gmail.com> X-Hashcash: 1:20:150824:riel@redhat.com::TKoazI/uvXXkwUkH:000c7C X-Hashcash: 1:20:150824:mingo@kernel.org::xM/5nAsuQjqe+tIY:013eg X-Hashcash: 1:20:150824:peterz@infradead.org::vv83/oywDDIU6k8o:000000000000000000000000000000000000000000wF8 X-Hashcash: 1:20:150824:dave@sr71.net::C7ZvR+ED7fEuIjd8:000020qL X-Hashcash: 1:20:150824:linux-kernel@vger.kernel.org::vGdlX7U27rFH885r:00000000000000000000000000000000025cd X-Hashcash: 1:20:150824:linux@horizon.com::LS0/2A6MzGwku5x7:000000000000000000000000000000000000000000002WSq X-Hashcash: 1:20:150824:torvalds@linux-foundation.org::5cY2e9YobH/srMFo:000000000000000000000000000000003Lzd X-Hashcash: 1:20:150824:rientjes@google.com::wIusgJZJN9iOx3iF:0000000000000000000000000000000000000000004Xrl X-Hashcash: 1:20:150824:linux-mm@kvack.org::dbAWf+YL5as86eDj:0000000000000000000000000000000000000000000IBhO Date: Mon, 24 Aug 2015 10:39:17 +0200 In-Reply-To: <20150824065809.GA13082@gmail.com> (Ingo Molnar's message of "Mon, 24 Aug 2015 08:58:09 +0200") Message-ID: <877fol5bd6.fsf@rasmusvillemoes.dk> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Aug 24 2015, Ingo Molnar wrote: > > Thanks, is the fixed up changelog below better? > Yes, though Linus specifically referred to "make test" (but I guess one could/should consider that part of the build process). Rasmus > mm/vmalloc: Cache the vmalloc memory info > > Linus reported that for scripting-intense workloads such as the > Git build, glibc's qsort will read /proc/meminfo for every process > created (by way of get_phys_pages()), which causes the Git build > to generate a surprising amount of kernel overhead. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933009AbbHXMyH (ORCPT ); Mon, 24 Aug 2015 08:54:07 -0400 Received: from ns.horizon.com ([71.41.210.147]:22050 "HELO ns.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S932935AbbHXMyE (ORCPT ); Mon, 24 Aug 2015 08:54:04 -0400 Date: 24 Aug 2015 08:54:02 -0400 Message-ID: <20150824125402.28806.qmail@ns.horizon.com> From: "George Spelvin" To: linux@horizon.com, mingo@kernel.org Subject: Re: [PATCH 3/3 v5] mm/vmalloc: Cache the vmalloc memory info Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org In-Reply-To: <20150824075018.GB20106@gmail.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (I hope I'm not annoying you by bikeshedding this too much, although I think this is improving.) You've sort of re-invented spinlocks, but after thinking a bit, it all works. Rather than using a single word, which is incremented to an odd number at the start of an update and an even number at the end, there are two. An update is in progress when they're unequal. vmap_info_gen is incremented early, when the cache needs updating, and read late (after the cache is copied). vmap_info_cache_gen is incremented after the cache is updated, and read early (before the cache is copied). This is logically equivalent to my complicated scheme with atomic updates to various bits in a single generation word, but greatly simplified by having two separate words. In particular, there's no longer a need to distinguish "vmap has updated list" from "calc_vmalloc_info in progress". I particularly like the "gen - vmap_info_cache_gen > 0" test. You *must* test for inequality to prevent tearing of a valid cache (...grr...English heteronyms...), and given that, might as well require it be fresher. Anyway, suggested changes for v6 (sigh...): First: you do a second read of vmap_info_gen to optimize out the copy of vmalloc_info if it's easily seen as pointless, but given how small vmalloc_info is (two words!), i'd be inclined to omit that optimization. Copy always, *then* see if it's worth keeping. Smaller code, faster fast path, and is barely noticeable on the slow path. Second, and this is up to you, I'd be inclined to go fully non-blocking and only spin_trylock(). If that fails, just skip the cache update. Third, ANSI C rules allow a compiler to assume that signed integer overflow does not occur. That means that gcc is allowed to optimize "if (x - y > 0)" to "if (x > y)". Given that gcc has annoyed us by using this optimization in other contexts, It might be safer to make them unsigned (which is required to wrap properly) and cast to integer after subtraction. Basically, the following (untested, but pretty damn simple): +/* + * Return a consistent snapshot of the current vmalloc allocation + * statistics, for /proc/meminfo: + */ +void get_vmalloc_info(struct vmalloc_info *vmi) +{ + unsigned gen, cache_gen = READ_ONCE(vmap_info_cache_gen); + + /* + * The two read barriers make sure that we read + * 'cache_gen', 'vmap_info_cache' and 'gen' in + * precisely that order: + */ + smp_rmb(); + *vmi = vmap_info_cache; + + smp_rmb(); + gen = READ_ONCE(vmap_info_gen); + + /* + * If the generation counter of the cache matches that of + * the vmalloc generation counter then return the cache: + */ + if (gen == cache_gen) + return; + + /* Make sure 'gen' is read before the vmalloc info */ + smp_rmb(); + calc_vmalloc_info(vmi); + + /* + * All updates to vmap_info_cache_gen go through this spinlock, + * so when the cache got invalidated, we'll only mark it valid + * again if we first fully write the new vmap_info_cache. + * + * This ensures that partial results won't be used. + */ + if (spin_trylock(&vmap_info_lock)) { + if ((int)(gen - vmap_info_cache_gen) > 0) { + vmap_info_cache = *vmi; + /* + * Make sure the new cached data is visible before + * the generation counter update: + */ + smp_wmb(); + WRITE_ONCE(vmap_info_cache_gen, gen); + } + spin_unlock(&vmap_info_lock); + } +} + +#endif /* CONFIG_PROC_FS */ The only remaining *very small* nit is that this function is a mix of "return early" and "wrap it in an if()" style. If you want to make that "if (!spin_trylock(...)) return;", I leave that you your esthetic judgement. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754577AbbHXNpk (ORCPT ); Mon, 24 Aug 2015 09:45:40 -0400 Received: from mailrelay.lanline.com ([216.187.10.16]:33391 "EHLO mailrelay.lanline.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754540AbbHXNpi (ORCPT ); Mon, 24 Aug 2015 09:45:38 -0400 X-Greylist: delayed 2032 seconds by postgrey-1.27 at vger.kernel.org; Mon, 24 Aug 2015 09:45:38 EDT MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <21979.6150.929309.800457@quad.stoffel.home> Date: Mon, 24 Aug 2015 09:11:34 -0400 From: "John Stoffel" To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: Re: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info In-Reply-To: <20150824073422.GC13082@gmail.com> References: <20150823081750.GA28349@gmail.com> <20150824010403.27903.qmail@ns.horizon.com> <20150824073422.GC13082@gmail.com> X-Mailer: VM 8.2.0b under 23.4.1 (x86_64-pc-linux-gnu) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org >>>>> "Ingo" == Ingo Molnar writes: Ingo> * George Spelvin wrote: >> First, an actual, albeit minor, bug: initializing both vmap_info_gen >> and vmap_info_cache_gen to 0 marks the cache as valid, which it's not. Ingo> Ha! :-) Fixed. >> vmap_info_gen should be initialized to 1 to force an initial >> cache update. Blech, it should be initialized with a proper #define VMAP_CACHE_NEEDS_UPDATE 1, instead of more magic numbers. Ingo> + */ Ingo> +static DEFINE_SPINLOCK(vmap_info_lock); Ingo> +static int vmap_info_gen = 1; static int vmap_info_gen = VMAP_CACHE_NEEDS_UPDATE; Ingo> +static int vmap_info_cache_gen; Ingo> +static struct vmalloc_info vmap_info_cache; Ingo> +#endif This will help keep bugs like this out in the future... I hope! From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932388AbbHXPLS (ORCPT ); Mon, 24 Aug 2015 11:11:18 -0400 Received: from ns.horizon.com ([71.41.210.147]:53486 "HELO ns.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1753727AbbHXPLQ (ORCPT ); Mon, 24 Aug 2015 11:11:16 -0400 Date: 24 Aug 2015 11:11:14 -0400 Message-ID: <20150824151114.18743.qmail@ns.horizon.com> From: "George Spelvin" To: john@stoffel.org, mingo@kernel.org Subject: Re: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@horizon.com, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org In-Reply-To: <21979.6150.929309.800457@quad.stoffel.home> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org John Stoffel wrote: >> vmap_info_gen should be initialized to 1 to force an initial >> cache update. > Blech, it should be initialized with a proper #define > VMAP_CACHE_NEEDS_UPDATE 1, instead of more magic numbers. Er... this is a joke, right? First, this number is used exactly once, and it's not part of a collection of similar numbers. And the definition would be adjacent to the use. We have easier ways of accomplishing that, called "comments". Second, your proposed name is misleading. "needs update" is defined as vmap_info_gen != vmap_info_cache_gen. There is no particular value of either that has this meaning. For example, initializing vmap_info_cache_gen to -1 would do just as well. (I actually considered that before deciding that +1 was "simpler" than -1.) For some versions of the code, an *arbitrary* difference is okay. You could set one ot 0xDEADBEEF and the other to 0xFEEDFACE. For other versions, the magnitude matters, but not *too* much. Initializing it to 42 would be perfectly correct, but waste time doing 42 cache updates before settling down. Singling out the value 1 as VMAP_CACHE_NEEDS_UPDATE is actively misleading. > This will help keep bugs like this out in the future... I hope! And this is the punchline, right? The problem was not realizing that non-default initialization was required; what we *call* the non-default value is irrelevant. I doubt it would ever have been a real (i.e. noticeable) bug, actually; the first bit of vmap activity in very early boot would have invalidated the cache. (John, my apologies if I went over the top and am contributing to LKML's reputation for flaming. I *did* actually laugh, and *do* think it's a dumb idea, but my annoyance is really directed at unpleasant memories of mindless application of coding style guidelines. In this case, I suspect you just posted before reading carefully enough to see the subtle logic.) From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754838AbbHXPzu (ORCPT ); Mon, 24 Aug 2015 11:55:50 -0400 Received: from mailrelay.lanline.com ([216.187.10.16]:58286 "EHLO mailrelay.lanline.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750941AbbHXPzt (ORCPT ); Mon, 24 Aug 2015 11:55:49 -0400 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <21979.15999.82965.295320@quad.stoffel.home> Date: Mon, 24 Aug 2015 11:55:43 -0400 From: "John Stoffel" To: "George Spelvin" Cc: john@stoffel.org, mingo@kernel.org, dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: Re: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info In-Reply-To: <20150824151114.18743.qmail@ns.horizon.com> References: <21979.6150.929309.800457@quad.stoffel.home> <20150824151114.18743.qmail@ns.horizon.com> X-Mailer: VM 8.2.0b under 23.4.1 (x86_64-pc-linux-gnu) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org George> John Stoffel wrote: >>> vmap_info_gen should be initialized to 1 to force an initial >>> cache update. >> Blech, it should be initialized with a proper #define >> VMAP_CACHE_NEEDS_UPDATE 1, instead of more magic numbers. George> Er... this is a joke, right? Not really. The comment made before was that by setting this variable to zero, it wasn't properly initialized. Which implies that either the API is wrong... or we should be documenting it better. I just went in the direction of the #define instead of a comment. George> First, this number is used exactly once, and it's not part of George> a collection of similar numbers. And the definition would be George> adjacent to the use. George> We have easier ways of accomplishing that, called "comments". Sure, that would be the better solution in this case. George> Second, your proposed name is misleading. "needs update" is defined George> as vmap_info_gen != vmap_info_cache_gen. There is no particular value George> of either that has this meaning. George> For example, initializing vmap_info_cache_gen to -1 would do just as well. George> (I actually considered that before deciding that +1 was "simpler" than -1.) See, I just threw out a dumb suggestion without reading the patch properly. My fault. George> (John, my apologies if I went over the top and am contributing to LKML's George> reputation for flaming. I *did* actually laugh, and *do* think it's a George> dumb idea, but my annoyance is really directed at unpleasant memories of George> mindless application of coding style guidelines. In this case, I suspect George> you just posted before reading carefully enough to see the subtle logic.) Nope, I'm in the wrong here. And your comment here is wonderful, I really do appreciate how you handled my ham fisted attempt to contribute. But I've got thick skin and I'll keep trying in my free time to comment on patches when I can. John From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754950AbbHYJ4p (ORCPT ); Tue, 25 Aug 2015 05:56:45 -0400 Received: from mail-wi0-f171.google.com ([209.85.212.171]:37467 "EHLO mail-wi0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751087AbbHYJ4n (ORCPT ); Tue, 25 Aug 2015 05:56:43 -0400 Date: Tue, 25 Aug 2015 11:56:38 +0200 From: Ingo Molnar To: George Spelvin Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150825095638.GA24750@gmail.com> References: <20150824075018.GB20106@gmail.com> <20150824125402.28806.qmail@ns.horizon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150824125402.28806.qmail@ns.horizon.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * George Spelvin wrote: > (I hope I'm not annoying you by bikeshedding this too much, although I > think this is improving.) [ I don't mind, although I wish other, more critical parts of the kernel got this much attention as well ;-) ] > Anyway, suggested changes for v6 (sigh...): > > First: you do a second read of vmap_info_gen to optimize out the copy > of vmalloc_info if it's easily seen as pointless, but given how small > vmalloc_info is (two words!), i'd be inclined to omit that optimization. > > Copy always, *then* see if it's worth keeping. Smaller code, faster > fast path, and is barely noticeable on the slow path. Ok, done. > Second, and this is up to you, I'd be inclined to go fully non-blocking and > only spin_trylock(). If that fails, just skip the cache update. So I'm not sure about this one: we have no guarantee of the order every updater reaches the spinlock, and we want the 'freshest' updater to do the update. The trylock might cause us to drop the 'freshest' update erroneously - so this change would introduce a 'stale data' bug I think. > Third, ANSI C rules allow a compiler to assume that signed integer > overflow does not occur. That means that gcc is allowed to optimize > "if (x - y > 0)" to "if (x > y)". That's annoying ... > Given that gcc has annoyed us by using this optimization in other > contexts, It might be safer to make them unsigned (which is required to > wrap properly) and cast to integer after subtraction. Ok, done. > Basically, the following (untested, but pretty damn simple): I've attached v6 which applies your first and last suggestion, but not the trylock one. I also removed _ONCE() accesses from the places that didn't need them. I added your Reviewed-by optimistically, saving a v7 submission hopefully ;-) Lightly tested. Thanks, Ingo ==============================> >>From 8364822f9cff9da9f9858f0ca1f1ddc5bd3ad3a1 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Sat, 22 Aug 2015 12:28:01 +0200 Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info Linus reported that for certain workloads such as 'make test' in the Git build, glibc's qsort will read /proc/meminfo for every process created (by way of get_phys_pages()), which causes the Git build to generate a surprising amount of kernel overhead. A fair chunk of the overhead is due to get_vmalloc_info() - which walks a potentially long list to do its statistics. Modify Linus's jiffies based patch to use generation counters to cache the vmalloc info: vmap_unlock() increases the generation counter, and the get_vmalloc_info() reads it and compares it against a cached generation counter. Also use a spinlock to make sure we always print a consistent set of vmalloc statistics, FWIIW. Reported-by: Linus Torvalds Reviewed-by: George Spelvin Cc: Andrew Morton Cc: Rik van Riel Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar --- mm/vmalloc.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 74 insertions(+), 3 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 605138083880..a0a4274a7be9 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -276,7 +276,21 @@ EXPORT_SYMBOL(vmalloc_to_pfn); #define VM_LAZY_FREEING 0x02 #define VM_VM_AREA 0x04 -static DEFINE_SPINLOCK(vmap_area_lock); +static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock); + +#ifdef CONFIG_PROC_FS +/* + * A seqlock and two generation counters for a simple cache of the + * vmalloc allocation statistics info printed in /proc/meminfo. + * + * ( The assumption of the optimization is that it's read frequently, but + * modified infrequently. ) + */ +static DEFINE_SPINLOCK(vmap_info_lock); +static unsigned int vmap_info_gen = 1; +static unsigned int vmap_info_cache_gen; +static struct vmalloc_info vmap_info_cache; +#endif static inline void vmap_lock(void) { @@ -285,6 +299,9 @@ static inline void vmap_lock(void) static inline void vmap_unlock(void) { +#ifdef CONFIG_PROC_FS + WRITE_ONCE(vmap_info_gen, vmap_info_gen+1); +#endif spin_unlock(&vmap_area_lock); } @@ -2699,7 +2716,7 @@ static int __init proc_vmalloc_init(void) } module_init(proc_vmalloc_init); -void get_vmalloc_info(struct vmalloc_info *vmi) +static void calc_vmalloc_info(struct vmalloc_info *vmi) { struct vmap_area *va; unsigned long free_area_size; @@ -2746,5 +2763,59 @@ void get_vmalloc_info(struct vmalloc_info *vmi) out: rcu_read_unlock(); } -#endif +/* + * Return a consistent snapshot of the current vmalloc allocation + * statistics, for /proc/meminfo: + */ +void get_vmalloc_info(struct vmalloc_info *vmi) +{ + unsigned int cache_gen, gen; + + /* + * The common case is that the cache is valid, so first + * read it, then check its validity. + * + * The two read barriers make sure that we read + * 'cache_gen', 'vmap_info_cache' and 'gen' in + * precisely that order: + */ + cache_gen = vmap_info_cache_gen; + smp_rmb(); + *vmi = vmap_info_cache; + smp_rmb(); + gen = vmap_info_gen; + + /* + * If the generation counter of the cache matches that of + * the vmalloc generation counter then return the cache: + */ + if (cache_gen == gen) + return; + + /* Make sure 'gen' is read before the vmalloc info: */ + smp_rmb(); + calc_vmalloc_info(vmi); + + /* + * All updates to vmap_info_cache_gen go through this spinlock, + * so when the cache got invalidated, we'll only mark it valid + * again if we first fully write the new vmap_info_cache. + * + * This ensures that partial results won't be used and that the + * vmalloc info belonging to the freshest update is used: + */ + spin_lock(&vmap_info_lock); + if ((int)(gen-vmap_info_cache_gen) > 0) { + vmap_info_cache = *vmi; + /* + * Make sure the new cached data is visible before + * the generation counter update: + */ + smp_wmb(); + vmap_info_cache_gen = gen; + } + spin_unlock(&vmap_info_lock); +} + +#endif /* CONFIG_PROC_FS */ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755450AbbHYKge (ORCPT ); Tue, 25 Aug 2015 06:36:34 -0400 Received: from ns.horizon.com ([71.41.210.147]:39852 "HELO ns.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1751087AbbHYKgd (ORCPT ); Tue, 25 Aug 2015 06:36:33 -0400 Date: 25 Aug 2015 06:36:30 -0400 Message-ID: <20150825103630.26398.qmail@ns.horizon.com> From: "George Spelvin" To: linux@horizon.com, mingo@kernel.org Subject: Re: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org In-Reply-To: <20150825095638.GA24750@gmail.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org >> Second, and this is up to you, I'd be inclined to go fully non-blocking >> and only spin_trylock(). If that fails, just skip the cache update. > So I'm not sure about this one: we have no guarantee of the order every > updater reaches the spinlock, and we want the 'freshest' updater to do > the update. The trylock might cause us to drop the 'freshest' update > erroneously - so this change would introduce a 'stale data' bug I think. Er, no it wouldn't. If someone leaves the cache stale, they'll leave vmap_info_cache_gen != vmap_info_gen and the next reader to come along will see the cache is stale and refresh it. If there's lock contention, there's a risk of more work, because the callers fall back to calc_vmalloc_info rather than waiting. But it's not a big risk. With the blocking code, if two readers arrive simultaneously and see a stale cache, they'll both call calc_vmalloc_info() and then line up to update the cache. The second will get the lock but then *not* update the cache. With trylock(), the second will just skip the update faster. Same number of calc_vmalloc_info() calls. The only inefficient case is if two readers arrive far enough apart that vmap_info_gen is updated between them, yet close enough together than the second arrives at the lock while the first is updating the cache. In that case, the second reader will not update the cache and (assuming no more changes to vmap_info_gen) some future third reader will have to duplicate the effort of calling calc_vmalloc_info(). But that's such a tiny window I give preference to my fondness for non-blocking code. > I added your Reviewed-by optimistically, saving a v7 submission hopefully ;-) You did right. As I said, the non-blocking part is your preference. I've done so much nasty stuff in interrupt handlers ($DAY_JOB more than kernel) that I go for the non-blocking algorithm whenever possible. Reviewed-by: George Spelvin From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755453AbbHYMrG (ORCPT ); Tue, 25 Aug 2015 08:47:06 -0400 Received: from casper.infradead.org ([85.118.1.10]:34016 "EHLO casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752387AbbHYMrE (ORCPT ); Tue, 25 Aug 2015 08:47:04 -0400 Date: Tue, 25 Aug 2015 14:46:55 +0200 From: Peter Zijlstra To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150825124655.GQ16853@twins.programming.kicks-ass.net> References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150823081750.GA28349@gmail.com> User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Aug 23, 2015 at 10:17:51AM +0200, Ingo Molnar wrote: > +static u64 vmap_info_gen; > +static u64 vmap_info_cache_gen; > +void get_vmalloc_info(struct vmalloc_info *vmi) > +{ > + u64 gen = READ_ONCE(vmap_info_gen); > + > + /* > + * If the generation counter of the cache matches that of > + * the vmalloc generation counter then return the cache: > + */ > + if (READ_ONCE(vmap_info_cache_gen) == gen) { Why are those things u64? It has the obvious down-side that you still get split loads on 32bit machines. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755597AbbHYNAD (ORCPT ); Tue, 25 Aug 2015 09:00:03 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:46005 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754548AbbHYNAB (ORCPT ); Tue, 25 Aug 2015 09:00:01 -0400 Date: Tue, 25 Aug 2015 14:59:51 +0200 From: Peter Zijlstra To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: Re: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info Message-ID: <20150825125951.GR16853@twins.programming.kicks-ass.net> References: <20150824075018.GB20106@gmail.com> <20150824125402.28806.qmail@ns.horizon.com> <20150825095638.GA24750@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150825095638.GA24750@gmail.com> User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 25, 2015 at 11:56:38AM +0200, Ingo Molnar wrote: > +void get_vmalloc_info(struct vmalloc_info *vmi) > +{ > + unsigned int cache_gen, gen; I see you dropped the u64 thing, good, ignore that previous email. > + > + /* > + * The common case is that the cache is valid, so first > + * read it, then check its validity. > + * > + * The two read barriers make sure that we read > + * 'cache_gen', 'vmap_info_cache' and 'gen' in > + * precisely that order: > + */ > + cache_gen = vmap_info_cache_gen; > + smp_rmb(); > + *vmi = vmap_info_cache; > + smp_rmb(); > + gen = vmap_info_gen; > + > + /* > + * If the generation counter of the cache matches that of > + * the vmalloc generation counter then return the cache: > + */ > + if (cache_gen == gen) > + return; There is one aspect of READ_ONCE() that is not replaced with smp_rmb(), and that is that READ_ONCE() should avoid split loads. Without READ_ONCE() the compiler is free to turn the loads into separate and out of order byte loads just because its insane, thereby also making the WRITE_ONCE() pointless. Now I'm fairly sure it all doesn't matter much, the info can change the moment we've copied it, so the read is inherently racy. And by that same argument I feel this is all somewhat over engineered. > + > + /* Make sure 'gen' is read before the vmalloc info: */ > + smp_rmb(); > + calc_vmalloc_info(vmi); > + > + /* > + * All updates to vmap_info_cache_gen go through this spinlock, > + * so when the cache got invalidated, we'll only mark it valid > + * again if we first fully write the new vmap_info_cache. > + * > + * This ensures that partial results won't be used and that the > + * vmalloc info belonging to the freshest update is used: > + */ > + spin_lock(&vmap_info_lock); > + if ((int)(gen-vmap_info_cache_gen) > 0) { > + vmap_info_cache = *vmi; > + /* > + * Make sure the new cached data is visible before > + * the generation counter update: > + */ > + smp_wmb(); > + vmap_info_cache_gen = gen; > + } > + spin_unlock(&vmap_info_lock); > +} > + > +#endif /* CONFIG_PROC_FS */ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755700AbbHYOUG (ORCPT ); Tue, 25 Aug 2015 10:20:06 -0400 Received: from mail-la0-f41.google.com ([209.85.215.41]:34458 "EHLO mail-la0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751564AbbHYOUE (ORCPT ); Tue, 25 Aug 2015 10:20:04 -0400 From: Rasmus Villemoes To: Ingo Molnar Cc: George Spelvin , dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org Subject: Re: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info Organization: D03 References: <20150824075018.GB20106@gmail.com> <20150824125402.28806.qmail@ns.horizon.com> <20150825095638.GA24750@gmail.com> X-Hashcash: 1:20:150825:linux-kernel@vger.kernel.org::TgXrknFwg3rWnv8Y:0000000000000000000000000000000000SGA X-Hashcash: 1:20:150825:linux@horizon.com::eQpD7QEBBUao5ayz:000000000000000000000000000000000000000000000cxV X-Hashcash: 1:20:150825:linux-mm@kvack.org::0CBgAngtbueFWNxx:00000000000000000000000000000000000000000000+sS X-Hashcash: 1:20:150825:torvalds@linux-foundation.org::y3rMukhXJQM79B07:000000000000000000000000000000001Cx1 X-Hashcash: 1:20:150825:mingo@kernel.org::Xzi9/vzQOSvF6/zN:02Cor X-Hashcash: 1:20:150825:dave@sr71.net::46u/xdAL9+senwcF:00002p6R X-Hashcash: 1:20:150825:rientjes@google.com::ADOtplNpL4NMzSis:0000000000000000000000000000000000000000004711 X-Hashcash: 1:20:150825:peterz@infradead.org::QZPXzItQlJhhzcAQ:000000000000000000000000000000000000000003r9J X-Hashcash: 1:20:150825:riel@redhat.com::tK49klVxHCNAWfPY:00DaVy Date: Tue, 25 Aug 2015 16:19:59 +0200 In-Reply-To: <20150825095638.GA24750@gmail.com> (Ingo Molnar's message of "Tue, 25 Aug 2015 11:56:38 +0200") Message-ID: <87io83wiuo.fsf@rasmusvillemoes.dk> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 25 2015, Ingo Molnar wrote: > * George Spelvin wrote: > >> (I hope I'm not annoying you by bikeshedding this too much, although I >> think this is improving.) > > [ I don't mind, although I wish other, more critical parts of the kernel got this > much attention as well ;-) ] > Since we're beating dead horses, let me point out one possibly unintentional side-effect of initializing just one of vmap_info{,_cache}_gen: $ nm -n vmlinux | grep -E 'vmap_info(_cache)?_gen' ffffffff81e4e5e0 d vmap_info_gen ffffffff820d5700 b vmap_info_cache_gen [Up-thread, you wrote "I also moved the function-static cache next to the flag and seqlock - this should further compress the cache footprint."] One should probably ensure that they end up in the same cacheline if one wants the fast-path to be as fast as possible - the easiest way to ensure that is to put them in a small struct, and that might as well contain the spinlock and the cache itself as well. It's been fun seeing this evolve, but overall, I tend to agree with Peter: It's a lot of complexity for little gain. If we're not going to just kill the Vmalloc* fields (which is probably too controversial) I'd prefer Linus' simpler version. Rasmus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755491AbbHYPL7 (ORCPT ); Tue, 25 Aug 2015 11:11:59 -0400 Received: from ns.horizon.com ([71.41.210.147]:18349 "HELO ns.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1751859AbbHYPL6 (ORCPT ); Tue, 25 Aug 2015 11:11:58 -0400 Date: 25 Aug 2015 11:11:54 -0400 Message-ID: <20150825151154.19516.qmail@ns.horizon.com> From: "George Spelvin" To: linux@rasmusvillemoes.dk, mingo@kernel.org Subject: Re: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info Cc: dave@sr71.net, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@horizon.com, peterz@infradead.org, riel@redhat.com, rientjes@google.com, torvalds@linux-foundation.org In-Reply-To: <87io83wiuo.fsf@rasmusvillemoes.dk> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org >>> (I hope I'm not annoying you by bikeshedding this too much, although I >>> think this is improving.) >> >> [ I don't mind, although I wish other, more critical parts of the kernel got this >> much attention as well ;-) ] That's the problem with small, understandable problems: people *aren't* scared to mess with them. > It's been fun seeing this evolve, but overall, I tend to agree with > Peter: It's a lot of complexity for little gain. If we're not going to > just kill the Vmalloc* fields (which is probably too controversial) > I'd prefer Linus' simpler version. Are you sure you're not being affected by the number of iterations? The final version is not actually a lot of code (although yes, more than Linus's), and offers the advantage of peace of mind: there's not some nasty-smelling code you can't entirely trust left behind. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932471AbbHYQjw (ORCPT ); Tue, 25 Aug 2015 12:39:52 -0400 Received: from mail-ig0-f180.google.com ([209.85.213.180]:36269 "EHLO mail-ig0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755291AbbHYQjv (ORCPT ); Tue, 25 Aug 2015 12:39:51 -0400 MIME-Version: 1.0 In-Reply-To: <87h9npwtx3.fsf@rasmusvillemoes.dk> References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> <87h9npwtx3.fsf@rasmusvillemoes.dk> Date: Tue, 25 Aug 2015 09:39:50 -0700 X-Google-Sender-Auth: TOzYzUnB8dyuc7ZUgr4d9BH6a40 Message-ID: Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info From: Linus Torvalds To: Rasmus Villemoes Cc: Ingo Molnar , George Spelvin , Dave Hansen , Linux Kernel Mailing List , linux-mm , Peter Zijlstra , Rik van Riel , David Rientjes Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Aug 23, 2015 at 2:56 PM, Rasmus Villemoes wrote: > > [Maybe one could just remove the fields and see if anybody actually > notices/cares any longer. Or, if they are only used by kernel > developers, put them in their own file.] I'm actually inclined to try exactly that for 4.3, and then take Ingo's patch as a fallback in case somebody actually notices. I'm not convinced anybody actually uses those values, and they are getting *less* relevant rather than more (on 64-bit, those values really don't matter, since the vmalloc space isn't really a limitation), so let's try removing them and seeing what happens. And then we know what we can do if somebody does actually notice. Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755537AbbHYRDX (ORCPT ); Tue, 25 Aug 2015 13:03:23 -0400 Received: from mail-ig0-f176.google.com ([209.85.213.176]:38473 "EHLO mail-ig0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751315AbbHYRDW (ORCPT ); Tue, 25 Aug 2015 13:03:22 -0400 MIME-Version: 1.0 In-Reply-To: References: <20150823060443.GA9882@gmail.com> <20150823064603.14050.qmail@ns.horizon.com> <20150823081750.GA28349@gmail.com> <87h9npwtx3.fsf@rasmusvillemoes.dk> Date: Tue, 25 Aug 2015 10:03:21 -0700 X-Google-Sender-Auth: IOaZPeXgSECbp8r_Epd9I-7fkbI Message-ID: Subject: Re: [PATCH 3/3 v3] mm/vmalloc: Cache the vmalloc memory info From: Linus Torvalds To: Rasmus Villemoes Cc: Ingo Molnar , George Spelvin , Dave Hansen , Linux Kernel Mailing List , linux-mm , Peter Zijlstra , Rik van Riel , David Rientjes Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 25, 2015 at 9:39 AM, Linus Torvalds wrote: > > I'm not convinced anybody actually uses those values, and they are > getting *less* relevant rather than more (on 64-bit, those values > really don't matter, since the vmalloc space isn't really a > limitation) Side note: the people who actually care about "my vmalloc area is too full, what's up?" would use /proc/vmallocinfo anyway, since that's what shows things like fragmentation etc. So I'm just talking about removing the /proc/meminfo part. First try to remove it *all*, and if there is some script that hollers because it wants to parse them, print out the values as zero. Linus