Re: [BUG] Slab corruption during XFS writeback under memory pressure

From: Dave Chinner <david@fromorbit.com>
To: Calvin Owens <calvinowens@fb.com>
Cc: xfs@oss.sgi.com, linux-kernel@vger.kernel.org,
	kernel-team@fb.com, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org
Subject: Re: [BUG] Slab corruption during XFS writeback under memory pressure
Date: Sun, 17 Jul 2016 10:00:03 +1000	[thread overview]
Message-ID: <20160717000003.GW1922@dastard> (raw)
In-Reply-To: <28f77d74-5ab4-d913-2921-df90da53f393@fb.com>

On Fri, Jul 15, 2016 at 05:18:02PM -0700, Calvin Owens wrote:
> Hello all,
> 
> I've found a nasty source of slab corruption. Based on seeing similar symptoms
> on boxes at Facebook, I suspect it's been around since at least 3.10.
> 
> It only reproduces under memory pressure so far as I can tell: the issue seems
> to be that XFS reclaims pages from buffers that are still in use by
> scsi/block. I'm not sure which side the bug lies on, but I've only observed it
> with XFS.
> 
> [67203.776421] ==================================================================
> [67203.792521] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3bf/0x4c0 at addr ffff8804cf466288
> [67203.812036] Read of size 8 by task python2.7/22913
> [67203.822713] =============================================================================
> [67203.840917] BUG buffer_head (Not tainted): kasan: bad access detected
> [67203.855253] -----------------------------------------------------------------------------
> [67203.855253]
> [67203.876727] Disabling lock debugging due to kernel taint
> [67203.888575] INFO: Allocated in 0xffff8804cf465d40 age=18437180719206552994 cpu=2191548261 pid=-1
> [67203.908139] 	alloc_buffer_head+0x22/0xd0
> [67203.916903] 	___slab_alloc+0x4e0/0x520
> [67203.925286] 	__slab_alloc+0x43/0x70
> [67203.933087] 	kmem_cache_alloc+0x228/0x2c0
> [67203.942042] 	alloc_buffer_head+0x22/0xd0
> [67203.950782] 	alloc_page_buffers+0xa9/0x1f0
> [67203.959936] 	create_empty_buffers+0x30/0x420
> [67203.969495] 	create_page_buffers+0x120/0x1b0
> [67203.979029] 	__block_write_begin+0x16b/0x1010
> [67203.988756] 	xfs_vm_write_begin+0x55/0x1b0
> [67203.997884] 	generic_perform_write+0x288/0x510
> [67204.007771] 	xfs_file_buffered_aio_write+0x316/0x780
> [67204.018811] 	xfs_file_write_iter+0x26f/0x6c0
> [67204.028313] 	__vfs_write+0x2a0/0x620
> [67204.036276] 	vfs_write+0x159/0x4c0
> [67204.043855] 	SyS_write+0xd2/0x1b0
> [67204.051245] INFO: Freed in 0x103fc80ec age=18446651500051355200 cpu=2165122683 pid=-1
> [67204.068634] 	free_buffer_head+0x41/0x90
> [67204.077175] 	__slab_free+0x1ed/0x340
> [67204.085138] 	kmem_cache_free+0x270/0x300
> [67204.093867] 	free_buffer_head+0x41/0x90
> [67204.102422] 	try_to_free_buffers+0x171/0x240
> [67204.111925] 	xfs_vm_releasepage+0xcb/0x3b0
> [67204.121101] 	try_to_release_page+0x106/0x190
> [67204.130602] 	shrink_page_list+0x118e/0x1a10
> [67204.139910] 	shrink_inactive_list+0x42c/0xdf0
> [67204.149600] 	shrink_zone_memcg+0xa09/0xfa0
> [67204.158715] 	shrink_zone+0x2c3/0xbc0
> [67204.166679] 	do_try_to_free_pages+0x42a/0x12f0
> [67204.176562] 	try_to_free_pages+0x1a3/0x5d0
> [67204.185709] 	__alloc_pages_nodemask+0xbeb/0x20d0
> [67204.195979] 	alloc_pages_vma+0x11b/0x5e0
> [67204.204709] 	handle_mm_fault+0x2c27/0x47d0
> [67204.213823] INFO: Slab 0xffffea00133d1900 objects=37 used=14 fp=0xffff8804cf464530 flags=0x2000000000004080
> [67204.235439] INFO: Object 0xffff8804cf466260 @offset=8800 fp=0xbbbbbbbbbbbbbbbb

Hmmm - can only get there with a clean page, so this should be fine.

> [67204.455817] CPU: 1 PID: 22913 Comm: python2.7 Tainted: G    B           4.7.0-rc7-calvinowens-1468357363-00001-gcaa3dc6 #1
> [67204.480313] Hardware name: Wiwynn   HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016
> [67204.497509]  ffff88075e99f480 ffff88075ec87a30 ffffffff81e8b8e4 ffff8804cf464000
> [67204.514224]  ffff8804cf466260 ffff88075ec87a60 ffffffff8153a995 ffff88075e99f480
> [67204.530924]  ffffea00133d1900 ffff8804cf466260 dffffc0000000000 ffff88075ec87a88
> [67204.547624] Call Trace:
> [67204.553086]  <IRQ>  [<ffffffff81e8b8e4>] dump_stack+0x68/0x94
> [67204.565946]  [<ffffffff8153a995>] print_trailer+0x115/0x1a0
> [67204.578334]  [<ffffffff81541174>] object_err+0x34/0x40
> [67204.589762]  [<ffffffff815436e7>] kasan_report_error+0x217/0x530
> [67204.616847]  [<ffffffff81543b33>] __asan_report_load8_noabort+0x43/0x50
> [67204.645085]  [<ffffffff819d651f>] xfs_destroy_ioend+0x3bf/0x4c0
> [67204.658243]  [<ffffffff819d69d4>] xfs_end_bio+0x154/0x220
> [67204.685362]  [<ffffffff81de0c58>] bio_endio+0x158/0x1b0
> [67204.696983]  [<ffffffff81dff61b>] blk_update_request+0x18b/0xb80
> [67204.710334]  [<ffffffff821baf57>] scsi_end_request+0x97/0x5a0
> [67204.723108]  [<ffffffff821c5558>] scsi_io_completion+0x438/0x1690
> [67204.807293]  [<ffffffff821a8d95>] scsi_finish_command+0x375/0x4e0
> [67204.820838]  [<ffffffff821c3940>] scsi_softirq_done+0x280/0x340
> [67204.848884]  [<ffffffff81e1e13f>] blk_done_softirq+0x1ff/0x360
> [67204.875074]  [<ffffffff82a08cad>] __do_softirq+0x22d/0x8d7
> [67204.887270]  [<ffffffff8113d42c>] irq_exit+0x15c/0x190
> [67204.898697]  [<ffffffff82a085e3>] smp_apic_timer_interrupt+0x83/0xa0
> [67204.912815]  [<ffffffff82a06b29>] apic_timer_interrupt+0x89/0x90
> [67205.029113] ==================================================================

But this indicates that the page is under writeback at this point,
so that tends to indicate that the above freeing was incorrect.

Hmmm - it's clear we've got direct reclaim involved here, and the
suspicion of a dirty page that has had it's bufferheads cleared.
Are there any other warnings in the log from XFS prior to kasan
throwing the error?

> Is there anything else I can send that might be helpful?

full console/dmesg output from a crashed machine, plus:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

> --
> /*
>  * Run as "./repro outfile 1000", where "outfile" sits on an XFS filesystem.
>  */
> 
> #include <stdlib.h>
> #include <stdio.h>
> #include <fcntl.h>
> #include <signal.h>
> #include <unistd.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <sys/mman.h>
> 
> #define CHUNK (32768)
> 
> static const char crap[CHUNK];
> 
> int main(int argc, char **argv)
> {
> 	int r, fd, i;
> 	size_t allocsize, count;
> 	void *p;
> 
> 	if (argc != 3) {
> 		printf("Usage: %s filename count\n", argv[0]);
> 		return 1;
> 	}
> 
> 	fd = open(argv[1], O_RDWR|O_CREAT, 0644);
> 	if (fd == -1) {
> 		perror("Can't open");
> 		return 1;
> 	}
> 
> 	if (!fork()) {
> 		count = atol(argv[2]);
> 
> 		while (1) {
> 			for (i = 0; i < count; i++)
> 				if (write(fd, crap, CHUNK) != CHUNK)
> 					perror("Eh?");
> 
> 			fsync(fd);
> 			ftruncate(fd, 0);
> 		}

Hmmmm. Truncate is used, but only after fsync. If the truncate
is removed, does the problem go away?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com