Date: Thu, 25 Jul 2019 12:00:27 -0400
From: Brian Foster
Subject: Re: xfs: garbage file data inclusion bug under memory pressure
Message-ID: <20190725160027.GB5221@bfoster>
References: <20190725105350.GA5221@bfoster>
 <0f507f3d-eed0-f6b8-48fe-acc9fd872d6b@i-love.sakura.ne.jp>
In-Reply-To: <0f507f3d-eed0-f6b8-48fe-acc9fd872d6b@i-love.sakura.ne.jp>
List-Id: xfs
To: Tetsuo Handa
Cc: linux-xfs@vger.kernel.org

On Thu, Jul 25, 2019 at 09:30:01PM +0900, Tetsuo Handa wrote:
> On 2019/07/25 19:53, Brian Foster wrote:
> > This is a known problem. XFS delayed allocation has a window between
> > delalloc-to-real-block conversion and writeback completion where stale
> > data exposure is possible if the writeback doesn't complete (e.g., due
> > to a crash, an I/O error, etc.). See fstests generic/536 for another
> > reference. We've batted around potential solutions, such as using
> > unwritten extents for delalloc allocations, but IIRC we haven't been
> > able to come up with anything with suitable performance so far.
> >
> > I'm curious why your OOM test results in writeback errors in the first
> > place. Is that generally expected? Does dmesg show any other XFS-related
> > events, such as a filesystem shutdown, for example? I gave it a quick
> > try on a 4GB swapless VM and it doesn't trigger OOM. What's your memory
> > configuration, and what does the /tmp filesystem look like ('xfs_info
> > /tmp')?
>
> Writeback errors should not happen from just a close-to-OOM situation,
> and there are no other XFS-related events.
>

Indeed, that is strange.

...

> Kernel config is http://I-love.SAKURA.ne.jp/tmp/config-5.3-rc1 .
>
> The result below is from a different VM that shows the same problem.
>
> # xfs_info /tmp
> meta-data=/dev/sda1              isize=256    agcount=4, agsize=16383936 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=0        finobt=0 spinodes=0
> data     =                       bsize=4096   blocks=65535744, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
> log      =internal               bsize=4096   blocks=31999, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>

I ran your oom-torture.c tool again (without the fs fill step) after
dropping the VM's RAM to 3GB, and I still had to invoke some usemem
instances (from fstests) to consume memory before OOM triggered. I
eventually reproduced oom-torture OOM kills, but did not reproduce
writeback errors. I've only run it once, but this is against a virtio
vdisk backing lvm+XFS in the guest. What is your target device here?
Is it failing independently, by chance?

Brian
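
A minimal sketch of the write-then-verify idea behind this class of
test (illustrative only -- this is not the reporter's oom-torture.c;
the file path, sizes, and pattern byte are assumptions, and a real
reproduction would run the verify pass after a crash or remount so
the reads come from disk rather than the page cache):

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  #define PATTERN 0x58 /* 'X' */

  int main(void)
  {
          char buf[4096];
          ssize_t got;
          int fd, i;

          fd = open("/tmp/stale-check", O_CREAT | O_TRUNC | O_RDWR, 0600);
          if (fd < 0) {
                  perror("open");
                  return 1;
          }

          /* Fill 4MB with a known pattern; short writes ignored for brevity. */
          memset(buf, PATTERN, sizeof(buf));
          for (i = 0; i < 1024; i++)
                  if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
                          perror("write");

          if (fsync(fd))
                  perror("fsync"); /* a failed writeback would surface here */

          /* Re-read and verify: any non-pattern byte is exposed stale data. */
          if (lseek(fd, 0, SEEK_SET) < 0) {
                  perror("lseek");
                  return 1;
          }
          while ((got = read(fd, buf, sizeof(buf))) > 0)
                  for (i = 0; i < got; i++)
                          if (buf[i] != PATTERN) {
                                  fprintf(stderr, "garbage byte found\n");
                                  return 1;
                          }

          close(fd);
          return 0;
  }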
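
The memory-pressure side can be approximated with a trivial
anonymous-memory hog in the spirit of fstests' usemem (this is not
that program -- see src/usemem.c in fstests for the real one; the
argument handling and default size here are arbitrary assumptions):

  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
          long mb = argc > 1 ? atol(argv[1]) : 1024; /* MB to pin */
          size_t sz = (size_t)mb << 20;
          char *p = malloc(sz);

          if (!p)
                  return 1;
          memset(p, 0xaa, sz); /* fault every page in */
          pause();             /* hold the memory until killed */
          return 0;
  }

Running a few of these in the background until the OOM killer fires,
then kicking off the file writer above, mimics the reproduction steps
described in this thread.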