[git pull] vfs.git bits and pieces

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [git pull] vfs.git bits and pieces
@ 2013-11-20 17:42 Al Viro
  2013-11-20 17:47 ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2013-11-20 17:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

[Apologies for delay; I'd spent the last day hunting down something that
turned out to be a VM leak completely unrelated to this stuff - it's
present in mainline, for starters.  Unreliable reproducers make for fun
bisects ;-/  Anyway, by now I'm absolutely sure that this is a VM bug and
not something I had somehow managed to break, so...]

Assorted bits that got missed in the first pull request + fixes for
a couple of coredump regressions.  Please, pull from the usual place -
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

Shortlog:
Al Viro (7):
      dump_align(): fix the dumb braino
      dump_emit(): use __kernel_write(), not vfs_write()
      gfs2: endianness misannotations
      consolidate simple ->d_delete() instances
      take read_seqbegin_or_lock() and friends to seqlock.h
      dcache.c: get rid of pointless macros
      fold try_to_ascend() into the sole remaining caller

Diffstat:
 arch/ia64/kernel/perfmon.c |    8 +----
 fs/9p/vfs_dentry.c         |   19 +---------
 fs/configfs/dir.c          |   12 +------
 fs/coredump.c              |    6 ++--
 fs/dcache.c                |   84 ++++++++++---------------------------------
 fs/efivarfs/super.c        |   11 +-----
 fs/gfs2/lock_dlm.c         |    8 ++--
 fs/gfs2/quota.c            |   23 +++++-------
 fs/gfs2/rgrp.c             |    4 +-
 fs/hostfs/hostfs_kern.c    |   11 +-----
 fs/libfs.c                 |   12 ++++---
 fs/proc/generic.c          |   18 +---------
 fs/proc/namespaces.c       |    8 +----
 include/linux/fs.h         |    2 +
 include/linux/seqlock.h    |   29 +++++++++++++++
 kernel/cgroup.c            |    7 +---
 net/sunrpc/rpc_pipe.c      |   11 +-----
 17 files changed, 86 insertions(+), 187 deletions(-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [git pull] vfs.git bits and pieces
  2013-11-20 17:42 [git pull] vfs.git bits and pieces Al Viro
@ 2013-11-20 17:47 ` Al Viro
  2013-11-20 22:16   ` Al Viro
  2013-11-20 22:33   ` Linus Torvalds
  0 siblings, 2 replies; 8+ messages in thread
From: Al Viro @ 2013-11-20 17:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

On Wed, Nov 20, 2013 at 05:42:11PM +0000, Al Viro wrote:
> [Apologies for delay; I'd spent the last day hunting down something that
> turned out to be a VM leak completely unrelated to this stuff - it's
> present in mainline, for starters.  Unreliable reproducers make for fun
> bisects ;-/  Anyway, by now I'm absolutely sure that this is a VM bug and
> not something I had somehow managed to break, so...]

BTW, something odd happened to mm/memory.c - either a mangled patch
or a lost followup.  Take a look at the last commit in there:
commit ea1e7ed33708c7a760419ff9ded0a6cb90586a50
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Thu Nov 14 14:31:53 2013 -0800

    mm: create a separate slab for page->ptl allocation
    
    If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64
    is 72 bytes.  For page->ptl they will be allocated from kmalloc-96 slab,
    so we loose 24 on each.  An average system can easily allocate few tens
    thousands of page->ptl and overhead is significant.
    
    Let's create a separate slab for page->ptl allocation to solve this.
    
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Fair enough, and yes, it does create that separate slab.  The problem is,
it's still using kmalloc/kfree for those beasts - page_ptl_cachep isn't
used at all...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [git pull] vfs.git bits and pieces
  2013-11-20 17:47 ` Al Viro
@ 2013-11-20 22:16   ` Al Viro
  2013-11-20 22:24     ` Joe Perches
  2013-11-20 22:33   ` Linus Torvalds
  1 sibling, 1 reply; 8+ messages in thread
From: Al Viro @ 2013-11-20 22:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel, Kirill A. Shutemov

On Wed, Nov 20, 2013 at 05:47:12PM +0000, Al Viro wrote:
> On Wed, Nov 20, 2013 at 05:42:11PM +0000, Al Viro wrote:
> > [Apologies for delay; I'd spent the last day hunting down something that
> > turned out to be a VM leak completely unrelated to this stuff - it's
> > present in mainline, for starters.  Unreliable reproducers make for fun
> > bisects ;-/  Anyway, by now I'm absolutely sure that this is a VM bug and
> > not something I had somehow managed to break, so...]
> 
> BTW, something odd happened to mm/memory.c - either a mangled patch
> or a lost followup.  Take a look at the last commit in there:
> commit ea1e7ed33708c7a760419ff9ded0a6cb90586a50
> Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Date:   Thu Nov 14 14:31:53 2013 -0800
> 
>     mm: create a separate slab for page->ptl allocation
>     
>     If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64
>     is 72 bytes.  For page->ptl they will be allocated from kmalloc-96 slab,
>     so we loose 24 on each.  An average system can easily allocate few tens
>     thousands of page->ptl and overhead is significant.
>     
>     Let's create a separate slab for page->ptl allocation to solve this.
>     
>     Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Cc: Ingo Molnar <mingo@elte.hu>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> 
> Fair enough, and yes, it does create that separate slab.  The problem is,
> it's still using kmalloc/kfree for those beasts - page_ptl_cachep isn't
> used at all...

While digging in the same area:

Wrong page freed on preallocate_pmds() failure exit

Note that pmds[i] is simply uninitialized at that point...
Granted, it's very hard to hit (you need split page locks
*and* kmalloc(sizeof(spinlock_t), GFP_KERNEL) failing),
but the code is obviously bogus.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index a7cccb6d..36aa999 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -209,7 +209,7 @@ static int preallocate_pmds(pmd_t *pmds[])
 		if (!pmd)
 			failed = true;
 		if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
-			free_page((unsigned long)pmds[i]);
+			free_page((unsigned long)pmd);
 			pmd = NULL;
 			failed = true;
 		}

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [git pull] vfs.git bits and pieces
  2013-11-20 22:16   ` Al Viro
@ 2013-11-20 22:24     ` Joe Perches
  0 siblings, 0 replies; 8+ messages in thread
From: Joe Perches @ 2013-11-20 22:24 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel, Kirill A. Shutemov

On Wed, 2013-11-20 at 22:16 +0000, Al Viro wrote:
> While digging in the same area:
> 
> Wrong page freed on preallocate_pmds() failure exit
> 
> Note that pmds[i] is simply uninitialized at that point...
> Granted, it's very hard to hit (you need split page locks
> *and* kmalloc(sizeof(spinlock_t), GFP_KERNEL) failing),
> but the code is obviously bogus.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index a7cccb6d..36aa999 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -209,7 +209,7 @@ static int preallocate_pmds(pmd_t *pmds[])
>  		if (!pmd)
>  			failed = true;
>  		if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
> -			free_page((unsigned long)pmds[i]);
> +			free_page((unsigned long)pmd);
>  			pmd = NULL;
>  			failed = true;
>  		}

trivia: It'd probably read better with an else too

		if (!pmd) {
			failed = true;
		} else if (!pgtable_pmd_page_ctor(virt_to_page(pmd))) {
			etc...

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [git pull] vfs.git bits and pieces
  2013-11-20 17:47 ` Al Viro
  2013-11-20 22:16   ` Al Viro
@ 2013-11-20 22:33   ` Linus Torvalds
  2013-11-20 22:40     ` Andrew Morton
  2013-11-20 22:42     ` Damien Wyart
  1 sibling, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2013-11-20 22:33 UTC (permalink / raw)
  To: Al Viro, Kirill A. Shutemov, Andrew Morton
  Cc: Linux Kernel Mailing List, linux-fsdevel

On Wed, Nov 20, 2013 at 9:47 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> BTW, something odd happened to mm/memory.c - either a mangled patch
> or a lost followup:
>
>     commit ea1e7ed33708
>     mm: create a separate slab for page->ptl allocation
>
> Fair enough, and yes, it does create that separate slab.  The problem is,
> it's still using kmalloc/kfree for those beasts - page_ptl_cachep isn't
> used at all...

Ok, it looks straightforward enough to just replace the kmalloc/kfree
with using a slab allocation using the page_ptl_cachep pointer. I'd do
it myself, but I would like to know how it got lost? Also, much
testing to make sure the cachep is initialized early enough.

Or should we just revert the commit that added the pointless/unused
slab pointer?

Andrew, Kirill, comments?

Also note the other issue Al found: see commit 2a46eed54a28 ("Wrong
page freed on preallocate_pmds() failure exit") that I just pushed
out.

                Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [git pull] vfs.git bits and pieces
  2013-11-20 22:33   ` Linus Torvalds
@ 2013-11-20 22:40     ` Andrew Morton
  2013-11-21 11:19       ` Kirill A. Shutemov
  2013-11-20 22:42     ` Damien Wyart
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2013-11-20 22:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Kirill A. Shutemov, Linux Kernel Mailing List,
	linux-fsdevel

On Wed, 20 Nov 2013 14:33:35 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Nov 20, 2013 at 9:47 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > BTW, something odd happened to mm/memory.c - either a mangled patch
> > or a lost followup:
> >
> >     commit ea1e7ed33708
> >     mm: create a separate slab for page->ptl allocation
> >
> > Fair enough, and yes, it does create that separate slab.  The problem is,
> > it's still using kmalloc/kfree for those beasts - page_ptl_cachep isn't
> > used at all...
> 
> Ok, it looks straightforward enough to just replace the kmalloc/kfree
> with using a slab allocation using the page_ptl_cachep pointer. I'd do
> it myself, but I would like to know how it got lost? Also, much
> testing to make sure the cachep is initialized early enough.

agh, I went through hell keeping that patch alive and it appears I lost
some of it.

> Or should we just revert the commit that added the pointless/unused
> slab pointer?
> 
> Andrew, Kirill, comments?

Let's just kill it please.  We can try again for 3.14.

> Also note the other issue Al found: see commit 2a46eed54a28 ("Wrong
> page freed on preallocate_pmds() failure exit") that I just pushed
> out.

lgtm, thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [git pull] vfs.git bits and pieces
  2013-11-20 22:33   ` Linus Torvalds
  2013-11-20 22:40     ` Andrew Morton
@ 2013-11-20 22:42     ` Damien Wyart
  1 sibling, 0 replies; 8+ messages in thread
From: Damien Wyart @ 2013-11-20 22:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Kirill A. Shutemov, Andrew Morton,
	Linux Kernel Mailing List, linux-fsdevel

> > BTW, something odd happened to mm/memory.c - either a mangled patch
> > or a lost followup:

> >     commit ea1e7ed33708
> >     mm: create a separate slab for page->ptl allocation

> > Fair enough, and yes, it does create that separate slab. The problem
> > is, it's still using kmalloc/kfree for those beasts -
> > page_ptl_cachep isn't used at all...

* Linus Torvalds <torvalds@linux-foundation.org> [2013-11-20 14:33]:
> Ok, it looks straightforward enough to just replace the kmalloc/kfree
> with using a slab allocation using the page_ptl_cachep pointer. I'd do
> it myself, but I would like to know how it got lost? Also, much
> testing to make sure the cachep is initialized early enough.

The initial sending had the proper hunks at the end, so something really
got lost afterwards...
https://lkml.org/lkml/2013/10/22/129

-- 
Damien Wyart

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [git pull] vfs.git bits and pieces
  2013-11-20 22:40     ` Andrew Morton
@ 2013-11-21 11:19       ` Kirill A. Shutemov
  0 siblings, 0 replies; 8+ messages in thread
From: Kirill A. Shutemov @ 2013-11-21 11:19 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: Al Viro, Kirill A. Shutemov, Linux Kernel Mailing List,
	linux-fsdevel, Ingo Molnar, Peter Zijlstra, linux-mm

Andrew Morton wrote:
> On Wed, 20 Nov 2013 14:33:35 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Wed, Nov 20, 2013 at 9:47 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > BTW, something odd happened to mm/memory.c - either a mangled patch
> > > or a lost followup:
> > >
> > >     commit ea1e7ed33708
> > >     mm: create a separate slab for page->ptl allocation
> > >
> > > Fair enough, and yes, it does create that separate slab.  The problem is,
> > > it's still using kmalloc/kfree for those beasts - page_ptl_cachep isn't
> > > used at all...
> > 
> > Ok, it looks straightforward enough to just replace the kmalloc/kfree
> > with using a slab allocation using the page_ptl_cachep pointer. I'd do
> > it myself, but I would like to know how it got lost? Also, much
> > testing to make sure the cachep is initialized early enough.
> 
> agh, I went through hell keeping that patch alive and it appears I lost
> some of it.

Actually, I've lost it while adding BLOATED_SPINLOCKS :(

> > Or should we just revert the commit that added the pointless/unused
> > slab pointer?
> > 
> > Andrew, Kirill, comments?
> 
> Let's just kill it please.  We can try again for 3.14.

I'm okay with that.
Only side note: it's useful not only for debug case, but also for
PREEMPT_RT where spinlock_t is always bloated.

Fixed patch:

>From e624075b47caa2a15998225df7cec953d271b9ac Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Thu, 14 Nov 2013 14:31:53 -0800
Subject: [PATCH] mm: create a separate slab for page->ptl allocation, try two

If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64
is 72 bytes.  For page->ptl they will be allocated from kmalloc-96 slab,
so we loose 24 on each.  An average system can easily allocate few tens
thousands of page->ptl and overhead is significant.

Let's create a separate slab for page->ptl allocation to solve this.

To make sure that it really works this time, some numbers from my test
machine (just booted, no load):

Before:
  # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
  kmalloc-96         31987  32190    128   30    1 : tunables  120   60    8 : slabdata   1073   1073     92
After:
  # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
  page->ptl          27516  28143     72   53    1 : tunables  120   60    8 : slabdata    531    531      9
  kmalloc-96          3853   5280    128   30    1 : tunables  120   60    8 : slabdata    176    176      0

Note that the patch is useful not only for debug case, but also for
PREEMPT_RT, where spinlock_t is always bloated.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h |  9 +++++++++
 init/main.c        |  2 +-
 mm/memory.c        | 11 +++++++++--
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1cedd000cf29..0548eb201e05 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1318,6 +1318,7 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
 
 #if USE_SPLIT_PTE_PTLOCKS
 #if BLOATED_SPINLOCKS
+void __init ptlock_cache_init(void);
 extern bool ptlock_alloc(struct page *page);
 extern void ptlock_free(struct page *page);
 
@@ -1326,6 +1327,7 @@ static inline spinlock_t *ptlock_ptr(struct page *page)
 	return page->ptl;
 }
 #else /* BLOATED_SPINLOCKS */
+static inline void ptlock_cache_init(void) {}
 static inline bool ptlock_alloc(struct page *page)
 {
 	return true;
@@ -1378,10 +1380,17 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
 	return &mm->page_table_lock;
 }
+static inline void ptlock_cache_init(void) {}
 static inline bool ptlock_init(struct page *page) { return true; }
 static inline void pte_lock_deinit(struct page *page) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
+static inline void pgtable_init(void)
+{
+	ptlock_cache_init();
+	pgtable_cache_init();
+}
+
 static inline bool pgtable_page_ctor(struct page *page)
 {
 	inc_zone_page_state(page, NR_PAGETABLE);
diff --git a/init/main.c b/init/main.c
index febc511e078a..01573fdfa186 100644
--- a/init/main.c
+++ b/init/main.c
@@ -476,7 +476,7 @@ static void __init mm_init(void)
 	mem_init();
 	kmem_cache_init();
 	percpu_init_late();
-	pgtable_cache_init();
+	pgtable_init();
 	vmalloc_init();
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 5d9025f3b3e1..cf6098c10084 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4272,11 +4272,18 @@ void copy_user_huge_page(struct page *dst, struct page *src,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
 #if USE_SPLIT_PTE_PTLOCKS && BLOATED_SPINLOCKS
+static struct kmem_cache *page_ptl_cachep;
+void __init ptlock_cache_init(void)
+{
+	page_ptl_cachep = kmem_cache_create("page->ptl", sizeof(spinlock_t), 0,
+			SLAB_PANIC, NULL);
+}
+
 bool ptlock_alloc(struct page *page)
 {
 	spinlock_t *ptl;
 
-	ptl = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
+	ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
 	if (!ptl)
 		return false;
 	page->ptl = ptl;
@@ -4285,6 +4292,6 @@ bool ptlock_alloc(struct page *page)
 
 void ptlock_free(struct page *page)
 {
-	kfree(page->ptl);
+	kmem_cache_free(page_ptl_cachep, page->ptl);
 }
 #endif
-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2013-11-21 11:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-20 17:42 [git pull] vfs.git bits and pieces Al Viro
2013-11-20 17:47 ` Al Viro
2013-11-20 22:16   ` Al Viro
2013-11-20 22:24     ` Joe Perches
2013-11-20 22:33   ` Linus Torvalds
2013-11-20 22:40     ` Andrew Morton
2013-11-21 11:19       ` Kirill A. Shutemov
2013-11-20 22:42     ` Damien Wyart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).