linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* WARNING in shmem_evict_inode
@ 2015-11-09  8:55 Dmitry Vyukov
  2015-11-23  8:30 ` Dmitry Vyukov
  0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Vyukov @ 2015-11-09  8:55 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton, linux-mm@kvack.org, LKML,
	Sasha Levin
  Cc: syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet

Hello,

The following program:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include <syscall.h>
#include <string.h>
#include <stdint.h>
#include <pthread.h>

#define SYS_memfd_create 319

long fd;

void *thr(void *p)
{
        syscall(SYS_ftruncate, fd, 0x8ul, 0, 0, 0, 0);
        return 0;
}

int main()
{
        pthread_t th;

        syscall(SYS_mmap, 0x20000000ul, 0x10000ul, 0x3ul, 0x32ul,
0xfffffffffffffffful, 0x0ul);
        memcpy((void*)0x20000f96, "\x23\x65\x6d\x31\x07\x2b\x27\x29\x00", 9);
        fd = syscall(SYS_memfd_create, 0x20000f96ul, 0x2ul, 0, 0, 0, 0);
        syscall(SYS_fallocate, fd, 0x0ul, 0x31d89288ul, 0x4ul, 0, 0);
        syscall(SYS_mmap, 0x20061000ul, 0xc00000ul,
0x1a9d91e04768640bul, 0x11ul, fd, 0x0ul);
        pthread_create(&th, 0, thr, 0);
        syscall(SYS_fstat, fd, 0x20550fcful, 0, 0, 0, 0);
        pthread_join(th, 0);
        return 0;
}


triggers WARNING in shmem_evict_inode:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 10442 at mm/shmem.c:625 shmem_evict_inode+0x335/0x480()
Modules linked in:
CPU: 1 PID: 8944 Comm: executor Not tainted 4.3.0+ #39
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 00000000ffffffff ffff88006c6afab8 ffffffff81aad406 0000000000000000
 ffff88006e39ac80 ffffffff83091660 ffff88006c6afaf8 ffffffff81100829
 ffffffff814192e5 ffffffff83091660 0000000000000271 ffff88003d075aa8
Call Trace:
 [<ffffffff81100a59>] warn_slowpath_null+0x29/0x30 kernel/panic.c:480
 [<ffffffff814192e5>] shmem_evict_inode+0x335/0x480 mm/shmem.c:625
 [<ffffffff8151560e>] evict+0x26e/0x580 fs/inode.c:542
 [<     inline     >] iput_final fs/inode.c:1477
 [<ffffffff81515f30>] iput+0x4a0/0x790 fs/inode.c:1504
 [<     inline     >] dentry_iput fs/dcache.c:358
 [<ffffffff8150667e>] __dentry_kill+0x4fe/0x700 fs/dcache.c:543
 [<     inline     >] dentry_kill fs/dcache.c:587
 [<ffffffff8150be7b>] dput+0x6ab/0x7a0 fs/dcache.c:796
 [<ffffffff814c499b>] __fput+0x3fb/0x6e0 fs/file_table.c:226
 [<ffffffff814c4d05>] ____fput+0x15/0x20 fs/file_table.c:244
 [<ffffffff8115ab23>] task_work_run+0x163/0x1f0 kernel/task_work.c:115
 [<     inline     >] exit_task_work include/linux/task_work.h:21
 [<ffffffff81105049>] do_exit+0x7f9/0x2b80 kernel/exit.c:748
 [<ffffffff8110b268>] do_group_exit+0x108/0x320 kernel/exit.c:878
 [<     inline     >] SYSC_exit_group kernel/exit.c:889
 [<ffffffff8110b49d>] SyS_exit_group+0x1d/0x20 kernel/exit.c:887
---[ end trace 43da88a03e29c2a5 ]---


Run the program in a loop, as the WARNING seems to be triggered by a race.

On commit d1e41ff11941784f469f17795a4d9425c2eb4b7a (Nov 5).
But I was also able to reproduce it on a 3.11-based kernel.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: WARNING in shmem_evict_inode
  2015-11-09  8:55 WARNING in shmem_evict_inode Dmitry Vyukov
@ 2015-11-23  8:30 ` Dmitry Vyukov
  2015-12-02  9:29   ` Hugh Dickins
  0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Vyukov @ 2015-11-23  8:30 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton, linux-mm@kvack.org, LKML,
	Sasha Levin
  Cc: syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet

On Mon, Nov 9, 2015 at 9:55 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> The following program:
>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #include <syscall.h>
> #include <string.h>
> #include <stdint.h>
> #include <pthread.h>
>
> #define SYS_memfd_create 319
>
> long fd;
>
> void *thr(void *p)
> {
>         syscall(SYS_ftruncate, fd, 0x8ul, 0, 0, 0, 0);
>         return 0;
> }
>
> int main()
> {
>         pthread_t th;
>
>         syscall(SYS_mmap, 0x20000000ul, 0x10000ul, 0x3ul, 0x32ul,
> 0xfffffffffffffffful, 0x0ul);
>         memcpy((void*)0x20000f96, "\x23\x65\x6d\x31\x07\x2b\x27\x29\x00", 9);
>         fd = syscall(SYS_memfd_create, 0x20000f96ul, 0x2ul, 0, 0, 0, 0);
>         syscall(SYS_fallocate, fd, 0x0ul, 0x31d89288ul, 0x4ul, 0, 0);
>         syscall(SYS_mmap, 0x20061000ul, 0xc00000ul,
> 0x1a9d91e04768640bul, 0x11ul, fd, 0x0ul);
>         pthread_create(&th, 0, thr, 0);
>         syscall(SYS_fstat, fd, 0x20550fcful, 0, 0, 0, 0);
>         pthread_join(th, 0);
>         return 0;
> }
>
>
> triggers WARNING in shmem_evict_inode:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 10442 at mm/shmem.c:625 shmem_evict_inode+0x335/0x480()
> Modules linked in:
> CPU: 1 PID: 8944 Comm: executor Not tainted 4.3.0+ #39
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  00000000ffffffff ffff88006c6afab8 ffffffff81aad406 0000000000000000
>  ffff88006e39ac80 ffffffff83091660 ffff88006c6afaf8 ffffffff81100829
>  ffffffff814192e5 ffffffff83091660 0000000000000271 ffff88003d075aa8
> Call Trace:
>  [<ffffffff81100a59>] warn_slowpath_null+0x29/0x30 kernel/panic.c:480
>  [<ffffffff814192e5>] shmem_evict_inode+0x335/0x480 mm/shmem.c:625
>  [<ffffffff8151560e>] evict+0x26e/0x580 fs/inode.c:542
>  [<     inline     >] iput_final fs/inode.c:1477
>  [<ffffffff81515f30>] iput+0x4a0/0x790 fs/inode.c:1504
>  [<     inline     >] dentry_iput fs/dcache.c:358
>  [<ffffffff8150667e>] __dentry_kill+0x4fe/0x700 fs/dcache.c:543
>  [<     inline     >] dentry_kill fs/dcache.c:587
>  [<ffffffff8150be7b>] dput+0x6ab/0x7a0 fs/dcache.c:796
>  [<ffffffff814c499b>] __fput+0x3fb/0x6e0 fs/file_table.c:226
>  [<ffffffff814c4d05>] ____fput+0x15/0x20 fs/file_table.c:244
>  [<ffffffff8115ab23>] task_work_run+0x163/0x1f0 kernel/task_work.c:115
>  [<     inline     >] exit_task_work include/linux/task_work.h:21
>  [<ffffffff81105049>] do_exit+0x7f9/0x2b80 kernel/exit.c:748
>  [<ffffffff8110b268>] do_group_exit+0x108/0x320 kernel/exit.c:878
>  [<     inline     >] SYSC_exit_group kernel/exit.c:889
>  [<ffffffff8110b49d>] SyS_exit_group+0x1d/0x20 kernel/exit.c:887
> ---[ end trace 43da88a03e29c2a5 ]---
>
>
> Run the program in a loop, as the WARNING seems to be triggered by a race.
>
> On commit d1e41ff11941784f469f17795a4d9425c2eb4b7a (Nov 5).
> But I was also able to reproduce it on a 3.11-based kernel.


Hello,

This is still happening periodically for me. Is anybody looking at this?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: WARNING in shmem_evict_inode
  2015-11-23  8:30 ` Dmitry Vyukov
@ 2015-12-02  9:29   ` Hugh Dickins
  2015-12-16 19:23     ` Holger Hoffstätte
  0 siblings, 1 reply; 4+ messages in thread
From: Hugh Dickins @ 2015-12-02  9:29 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Hugh Dickins, Andrew Morton, linux-mm@kvack.org, LKML,
	Sasha Levin, syzkaller, Kostya Serebryany, Alexander Potapenko,
	Eric Dumazet, Greg Thelen

On Mon, 23 Nov 2015, Dmitry Vyukov wrote:
> On Mon, Nov 9, 2015 at 9:55 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> > Hello,
> >
> > The following program:
> >
> > // autogenerated by syzkaller (http://github.com/google/syzkaller)
> > #include <syscall.h>
> > #include <string.h>
> > #include <stdint.h>
> > #include <pthread.h>
> >
> > #define SYS_memfd_create 319
> >
> > long fd;
> >
> > void *thr(void *p)
> > {
> >         syscall(SYS_ftruncate, fd, 0x8ul, 0, 0, 0, 0);
> >         return 0;
> > }
> >
> > int main()
> > {
> >         pthread_t th;
> >
> >         syscall(SYS_mmap, 0x20000000ul, 0x10000ul, 0x3ul, 0x32ul,
> > 0xfffffffffffffffful, 0x0ul);
> >         memcpy((void*)0x20000f96, "\x23\x65\x6d\x31\x07\x2b\x27\x29\x00", 9);
> >         fd = syscall(SYS_memfd_create, 0x20000f96ul, 0x2ul, 0, 0, 0, 0);
> >         syscall(SYS_fallocate, fd, 0x0ul, 0x31d89288ul, 0x4ul, 0, 0);
> >         syscall(SYS_mmap, 0x20061000ul, 0xc00000ul,
> > 0x1a9d91e04768640bul, 0x11ul, fd, 0x0ul);
> >         pthread_create(&th, 0, thr, 0);
> >         syscall(SYS_fstat, fd, 0x20550fcful, 0, 0, 0, 0);
> >         pthread_join(th, 0);
> >         return 0;
> > }
> >
> >
> > triggers WARNING in shmem_evict_inode:
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 10442 at mm/shmem.c:625 shmem_evict_inode+0x335/0x480()
> > Modules linked in:
> > CPU: 1 PID: 8944 Comm: executor Not tainted 4.3.0+ #39
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >  00000000ffffffff ffff88006c6afab8 ffffffff81aad406 0000000000000000
> >  ffff88006e39ac80 ffffffff83091660 ffff88006c6afaf8 ffffffff81100829
> >  ffffffff814192e5 ffffffff83091660 0000000000000271 ffff88003d075aa8
> > Call Trace:
> >  [<ffffffff81100a59>] warn_slowpath_null+0x29/0x30 kernel/panic.c:480
> >  [<ffffffff814192e5>] shmem_evict_inode+0x335/0x480 mm/shmem.c:625
> >  [<ffffffff8151560e>] evict+0x26e/0x580 fs/inode.c:542
> >  [<     inline     >] iput_final fs/inode.c:1477
> >  [<ffffffff81515f30>] iput+0x4a0/0x790 fs/inode.c:1504
> >  [<     inline     >] dentry_iput fs/dcache.c:358
> >  [<ffffffff8150667e>] __dentry_kill+0x4fe/0x700 fs/dcache.c:543
> >  [<     inline     >] dentry_kill fs/dcache.c:587
> >  [<ffffffff8150be7b>] dput+0x6ab/0x7a0 fs/dcache.c:796
> >  [<ffffffff814c499b>] __fput+0x3fb/0x6e0 fs/file_table.c:226
> >  [<ffffffff814c4d05>] ____fput+0x15/0x20 fs/file_table.c:244
> >  [<ffffffff8115ab23>] task_work_run+0x163/0x1f0 kernel/task_work.c:115
> >  [<     inline     >] exit_task_work include/linux/task_work.h:21
> >  [<ffffffff81105049>] do_exit+0x7f9/0x2b80 kernel/exit.c:748
> >  [<ffffffff8110b268>] do_group_exit+0x108/0x320 kernel/exit.c:878
> >  [<     inline     >] SYSC_exit_group kernel/exit.c:889
> >  [<ffffffff8110b49d>] SyS_exit_group+0x1d/0x20 kernel/exit.c:887
> > ---[ end trace 43da88a03e29c2a5 ]---
> >
> >
> > Run the program in a loop, as the WARNING seems to be triggered by a race.
> >
> > On commit d1e41ff11941784f469f17795a4d9425c2eb4b7a (Nov 5).
> > But I was also able to reproduce it on a 3.11-based kernel.
> 
> 
> Hello,
> 
> This is still happening periodically for me. Is anybody looking at this?

It was more interesting than I expected, thanks.
I believe you will find that this fixes it.

[PATCH] tmpfs: fix shmem_evict_inode warnings on i_blocks

Dmitry Vyukov provides a little program, autogenerated by syzkaller,
which races a fault on a mapping of a sparse memfd object, against
truncation of that object below the fault address: run repeatedly
for a few minutes, it reliably generates shmem_evict_inode()'s
WARN_ON(inode->i_blocks).

(But there's nothing specific to memfd here, nor to the fstat which it
happened to use to generate the fault: though that looked suspicious,
since a shmem_recalc_inode() had been added there recently.  The same
problem can be reproduced with open+unlink in place of memfd_create,
and with fstatfs in place of fstat.)

v3.7 commit 0f3c42f522dc ("tmpfs: change final i_blocks BUG to WARNING")
explains one cause of such a warning (a race with shmem_writepage to swap),
and possible solutions; but we never took it further, and this syzkaller
incident turns out to have a different cause.

shmem_getpage_gfp()'s error recovery, when a freshly allocated page is
then found to be beyond eof, looks plausible - decrementing the alloced
count that was just before incremented - but in fact can go wrong, if a
racing thread (the truncator, for example) gets its shmem_recalc_inode()
in just after our delete_from_page_cache().  delete_from_page_cache()
decrements nrpages, that shmem_recalc_inode() will balance the books
by decrementing alloced itself, then our decrement of alloced take it
one too low: leading to the WARNING when the object is finally evicted.

Once the new page has been exposed in the page cache, shmem_getpage_gfp()
must leave it to shmem_recalc_inode() itself to get the accounting right
in all cases (and not fall through from "trunc:" to "decused:").  Adjust
that error recovery block; and the reinitialization of info and sbinfo
can be removed too.

While we're here, fix shmem_writepage() to avoid the original issue:
it will be safe against a racing shmem_recalc_inode(), if it merely
increments swapped before the shmem_delete_from_page_cache() which
decrements nrpages (but it must then do its own shmem_recalc_inode()
before that, while still in balance, instead of after).  (Aside: why
do we shmem_recalc_inode() here in the swap path?  Because its raison
d'etre is to cope with clean sparse shmem pages being reclaimed behind
our back: so here when swapping is a good place to look for that case.)
But I've not now managed to reproduce this bug, even without the patch.

I don't see why I didn't do that earlier: perhaps inhibited by the
preference to eliminate shmem_recalc_inode() altogether.  Driven by this
incident, I do now have a patch to do so at last; but still want to sit
on it for a bit, there's a couple of questions yet to be resolved.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
---
Cc stable?  I don't think that's necessary, but might be proved wrong:
along with the warning, the bug does allow one page beyond the limit
to be allocated from a size-limited tmpfs mount.

 mm/shmem.c |   34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)

--- 4.4-rc3/mm/shmem.c	2015-11-15 21:06:56.513752469 -0800
+++ linux/mm/shmem.c	2015-11-30 17:38:42.337790242 -0800
@@ -843,14 +843,14 @@ static int shmem_writepage(struct page *
 		list_add_tail(&info->swaplist, &shmem_swaplist);
 
 	if (add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
-		swap_shmem_alloc(swap);
-		shmem_delete_from_page_cache(page, swp_to_radix_entry(swap));
-
 		spin_lock(&info->lock);
-		info->swapped++;
 		shmem_recalc_inode(inode);
+		info->swapped++;
 		spin_unlock(&info->lock);
 
+		swap_shmem_alloc(swap);
+		shmem_delete_from_page_cache(page, swp_to_radix_entry(swap));
+
 		mutex_unlock(&shmem_swaplist_mutex);
 		BUG_ON(page_mapped(page));
 		swap_writepage(page, wbc);
@@ -1078,7 +1078,7 @@ repeat:
 	if (sgp != SGP_WRITE && sgp != SGP_FALLOC &&
 	    ((loff_t)index << PAGE_CACHE_SHIFT) >= i_size_read(inode)) {
 		error = -EINVAL;
-		goto failed;
+		goto unlock;
 	}
 
 	if (page && sgp == SGP_WRITE)
@@ -1246,11 +1246,15 @@ clear:
 	/* Perhaps the file has been truncated since we checked */
 	if (sgp != SGP_WRITE && sgp != SGP_FALLOC &&
 	    ((loff_t)index << PAGE_CACHE_SHIFT) >= i_size_read(inode)) {
+		if (alloced) {
+			ClearPageDirty(page);
+			delete_from_page_cache(page);
+			spin_lock(&info->lock);
+			shmem_recalc_inode(inode);
+			spin_unlock(&info->lock);
+		}
 		error = -EINVAL;
-		if (alloced)
-			goto trunc;
-		else
-			goto failed;
+		goto unlock;
 	}
 	*pagep = page;
 	return 0;
@@ -1258,23 +1262,13 @@ clear:
 	/*
 	 * Error recovery.
 	 */
-trunc:
-	info = SHMEM_I(inode);
-	ClearPageDirty(page);
-	delete_from_page_cache(page);
-	spin_lock(&info->lock);
-	info->alloced--;
-	inode->i_blocks -= BLOCKS_PER_PAGE;
-	spin_unlock(&info->lock);
 decused:
-	sbinfo = SHMEM_SB(inode->i_sb);
 	if (sbinfo->max_blocks)
 		percpu_counter_add(&sbinfo->used_blocks, -1);
 unacct:
 	shmem_unacct_blocks(info->flags, 1);
 failed:
-	if (swap.val && error != -EINVAL &&
-	    !shmem_confirm_swap(mapping, index, swap))
+	if (swap.val && !shmem_confirm_swap(mapping, index, swap))
 		error = -EEXIST;
 unlock:
 	if (page) {

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: WARNING in shmem_evict_inode
  2015-12-02  9:29   ` Hugh Dickins
@ 2015-12-16 19:23     ` Holger Hoffstätte
  0 siblings, 0 replies; 4+ messages in thread
From: Holger Hoffstätte @ 2015-12-16 19:23 UTC (permalink / raw)
  To: Hugh Dickins, Dmitry Vyukov
  Cc: Andrew Morton, linux-mm@kvack.org, LKML, Sasha Levin, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet, Greg Thelen

On 12/02/15 10:29, Hugh Dickins wrote:
> On Mon, 23 Nov 2015, Dmitry Vyukov wrote:
>> On Mon, Nov 9, 2015 at 9:55 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
[snip]
>>> triggers WARNING in shmem_evict_inode:
>>>
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 10442 at mm/shmem.c:625 shmem_evict_inode+0x335/0x480()
>>> Modules linked in:
>>> CPU: 1 PID: 8944 Comm: executor Not tainted 4.3.0+ #39
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>>  00000000ffffffff ffff88006c6afab8 ffffffff81aad406 0000000000000000
>>>  ffff88006e39ac80 ffffffff83091660 ffff88006c6afaf8 ffffffff81100829
>>>  ffffffff814192e5 ffffffff83091660 0000000000000271 ffff88003d075aa8
>>> Call Trace:
>>>  [<ffffffff81100a59>] warn_slowpath_null+0x29/0x30 kernel/panic.c:480
>>>  [<ffffffff814192e5>] shmem_evict_inode+0x335/0x480 mm/shmem.c:625
>>>  [<ffffffff8151560e>] evict+0x26e/0x580 fs/inode.c:542
>>>  [<     inline     >] iput_final fs/inode.c:1477
[snip]
> It was more interesting than I expected, thanks.
> I believe you will find that this fixes it.
> 
> [PATCH] tmpfs: fix shmem_evict_inode warnings on i_blocks

Since I just saw this in Linus' tree, here's another retrospective bug
report and Thank You for fixing it. :-)

The problem is quite real, even though I'm probably the only other person
to ever report it, see: http://www.spinics.net/lists/linux-fsdevel/msg83567.html

> Cc stable?  I don't think that's necessary, but might be proved wrong:
> along with the warning, the bug does allow one page beyond the limit
> to be allocated from a size-limited tmpfs mount.

It applies and works fine, so it probably wouldn't hurt. I'm using it in my
4.1++ tree as we speak, no problems.

-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-12-16 19:23 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-09  8:55 WARNING in shmem_evict_inode Dmitry Vyukov
2015-11-23  8:30 ` Dmitry Vyukov
2015-12-02  9:29   ` Hugh Dickins
2015-12-16 19:23     ` Holger Hoffstätte

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).