From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from userp2120.oracle.com ([156.151.31.85]:36766 "EHLO
	userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1726547AbfGYR2i (ORCPT);
	Thu, 25 Jul 2019 13:28:38 -0400
Date: Thu, 25 Jul 2019 10:28:30 -0700
From: "Darrick J. Wong"
Subject: Re: xfs: garbage file data inclusion bug under memory pressure
Message-ID: <20190725172830.GE1561054@magnolia>
References: <20190725113231.GV7689@dread.disaster.area>
 <804d24cb-5b7c-4620-5a5f-4ec039472086@i-love.sakura.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <804d24cb-5b7c-4620-5a5f-4ec039472086@i-love.sakura.ne.jp>
Sender: linux-xfs-owner@vger.kernel.org
List-Id: xfs
To: Tetsuo Handa
Cc: Dave Chinner, linux-xfs@vger.kernel.org

On Thu, Jul 25, 2019 at 09:44:35PM +0900, Tetsuo Handa wrote:
> On 2019/07/25 20:32, Dave Chinner wrote:
> > You've had writeback errors. This is somewhat expected behaviour for
> > most filesystems when there are write errors - space has been
> > allocated, but whatever was to be written into that allocated space
> > failed for some reason so it remains in an uninitialised state....
>
> This is bad from a security perspective. The data I found included,
> e.g., a random source file, /var/log/secure, and an SQL database
> server's access log containing secret values...

Woot.  That's bad.  By any chance, does this pair of patches:

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?id=bd012b434a56d9fac3cbc33062b8e2cd6e1ad0a0
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?id=adcf7c0c87191fd3616813c8ce9790f89a9a8eba

fix this problem?  I wrote them a while ago, but I never got around to
quantifying how much of a performance impact they'd have.

> > For XFS and sequential writes, the on-disk file size is not extended
> > on an IO error, hence it should not expose stale data. However,
> > your test code is not checking for errors - that's a bug in your
> > test code - and that's why writeback errors are resulting in stale
> > data exposure. i.e. by ignoring the fsync() error,
> > the test continues writing at the next offset and the fsync() for
> > that new data write exposes the region of stale data in the
> > file where the previous data write failed by extending the on-disk
> > EOF past it....
> >
> > So in this case stale data exposure is a side effect of not
> > handling writeback errors appropriately in the application.
>
> But blaming users for not handling writeback errors is pointless when
> thinking from a security perspective. A bad guy might be trying to
> steal data from inaccessible files.

My thoughts exactly.  I'm not sure what data is supposed to be read()
from a file after a write error, but I'm pretty sure that "someone
else's discarded junk" is /not/ in that set.

> > But I have to ask: what is causing the IO to fail? OOM conditions
> > should not cause writeback errors - XFS will retry memory
> > allocations until they succeed, and the block layer is supposed to
> > be resilient against memory shortages, too. Hence I'd be interested
> > to know what is actually failing here...
>
> Yeah. It is strange that this problem occurs when close to OOM,
> but there are no failure messages at all (except OOM killer messages
> and writeback error messages).

That /is/ strange.  I wonder if your SCSI driver is trying to allocate
memory, failing, and the block layer is squishing that failure into
EIO?
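To make Dave's point concrete, here's roughly what the write loop in a
test like this should look like: check every write() and fsync()
return value, and stop extending the file as soon as either one fails,
so that a later successful fsync() can't move the on-disk EOF past the
failed (and possibly uninitialised) region.  This is just an untested
sketch; the file path and chunk size are made up, not taken from the
actual reproducer:

/*
 * Untested sketch of a write loop that handles writeback errors.
 * The path and sizes below are illustrative only.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	static char buf[65536];
	ssize_t ret;
	int fd, i;

	memset(buf, 'x', sizeof(buf));

	fd = open("/mnt/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	for (i = 0; i < 1024; i++) {
		ret = write(fd, buf, sizeof(buf));
		if (ret != (ssize_t)sizeof(buf)) {
			/*
			 * Failed or short write: bail out instead of
			 * continuing at the next offset.
			 */
			fprintf(stderr, "write: %s\n",
				ret < 0 ? strerror(errno) : "short write");
			break;
		}

		/*
		 * fsync() is where delayed writeback errors get
		 * reported.  If it fails, the data may never have
		 * reached disk, so stop extending the file; a later
		 * successful fsync() would otherwise push the on-disk
		 * EOF past the stale region.
		 */
		if (fsync(fd) != 0) {
			perror("fsync");
			break;
		}
	}

	if (close(fd) != 0)
		perror("close");
	return 0;
}

The key point is simply that the loop gives up at the first error
rather than writing at the next offset, which is exactly the failure
mode Dave describes above.

--D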