From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id AB69F6B0085 for ; Wed, 10 Nov 2010 18:38:10 -0500 (EST) Date: Wed, 10 Nov 2010 15:37:29 -0800 From: Andrew Morton Subject: Re: [PATCH 3/5] writeback: stop background/kupdate works from livelocking other works Message-Id: <20101110153729.81ae6b19.akpm@linux-foundation.org> In-Reply-To: <20101109235632.GD11214@quack.suse.cz> References: <20101108230916.826791396@intel.com> <20101108231726.993880740@intel.com> <20101109131310.f442d210.akpm@linux-foundation.org> <20101109222827.GJ4936@quack.suse.cz> <20101109150006.05892241.akpm@linux-foundation.org> <20101109235632.GD11214@quack.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Jan Kara Cc: Wu Fengguang , linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Johannes Weiner , Christoph Hellwig , Jan Engelhardt , LKML List-ID: On Wed, 10 Nov 2010 00:56:32 +0100 Jan Kara wrote: > On Tue 09-11-10 15:00:06, Andrew Morton wrote: > > On Tue, 9 Nov 2010 23:28:27 +0100 > > Jan Kara wrote: > > > New description which should address above questions: > > > Background writeback is easily livelockable in a loop in wb_writeback() by > > > a process continuously re-dirtying pages (or continuously appending to a > > > file). This is in fact intended as the target of background writeback is to > > > write dirty pages it can find as long as we are over > > > dirty_background_threshold. > > > > Well. The objective of the kupdate function is utterly different. > > > > > But the above behavior gets inconvenient at times because no other work > > > queued in the flusher thread's queue gets processed. In particular, > > > since e.g. sync(1) relies on flusher thread to do all the IO for it, > > > > That's fixable by doing the work synchronously within sync_inodes_sb(), > > rather than twiddling thumbs wasting a thread resource while waiting > > for kernel threads to do it. As an added bonus, this even makes cpu > > time accounting more accurate ;) > > > > Please remind me why we decided to hand the sync_inodes_sb() work off > > to other threads? > Because when sync(1) does IO on it's own, it competes for the device with > the flusher thread running in parallel thus resulting in more seeks. Skeptical. Has that effect been demonstrated? Has it been shown to be a significant problem? A worse problem than livelocking the machine? ;) If this _is_ a problem then it's also a problem for fsync/msync. But see below. > > > sync(1) can hang forever waiting for flusher thread to do the work. > > > > > > Generally, when a flusher thread has some work queued, someone submitted > > > the work to achieve a goal more specific than what background writeback > > > does. Moreover by working on the specific work, we also reduce amount of > > > dirty pages which is exactly the target of background writeout. So it makes > > > sense to give specific work a priority over a generic page cleaning. > > > > > > Thus we interrupt background writeback if there is some other work to do. We > > > return to the background writeback after completing all the queued work. > > > > ... > > > > So... what prevents higher priority works (eg, sync(1)) from > > > > livelocking or seriously retarding background or kudate writeout? > > > If other work than background or kupdate writeout livelocks, it's a bug > > > which should be fixed (either by setting sensible nr_to_write or by tagging > > > like we do it for WB_SYNC_ALL writeback). Of course, higher priority work > > > can be running when background or kupdate writeout would need to run as > > > well. But the idea here is that the purpose of background/kupdate types of > > > writeout is to get rid of dirty data and any type of writeout does this so > > > working on it we also work on background/kupdate writeout only possibly > > > less efficiently. > > > > The kupdate function is a data-integrity/quality-of-service sort of > > thing. > > > > And what I'm asking is whether this change enables scenarios in which > > these threads can be kept so busy that the kupdate function gets > > interrupted so frequently that we can have dirty memory not being > > written back for arbitrarily long periods of time? > So let me compare: > What kupdate writeback does: > queue inodes older than dirty_expire_centisecs > while some inode in the queue > write MAX_WRITEBACK_PAGES from each inode queued > break if nr_to_write <= 0 > > What any other WB_SYNC_NONE writeback (let me call it "normal WB_SYNC_NONE > writeback") does: > queue all dirty inodes > while some inode in the queue > write MAX_WRITEBACK_PAGES from each inode queued > break if nr_to_write <= 0 > > > There only one kind of WB_SYNC_ALL writeback - the one which writes > everything. fsync() uses WB_SYNC_ALL and it doesn't write "everything". > So after WB_SYNC_ALL writeback (provided all livelocks are fixed ;) > obviously no old data should be unwritten in memory. Normal WB_SYNC_NONE > writeback differs from a kupdate one *only* in the fact that we queue all > inodes instead of only the old ones. whoa. That's only true for sync(), or sync_filesystem(). It isn't true for, say, fsync() and msync(). Now when someone comes along and changes fsync/msync to use flusher threads ("resulting in less seeks") then we could get into a situation where fsync-serving flusher threads never serve kupdate? > We start writing old inodes first and OT, but: your faith in those time-ordered inode lists is touching ;) Put a debug function in there which checks that the lists _are_ time-ordered, and call that function from every site in the kernel which modifies the lists. I bet there are still gremlins. > go inode by inode writing MAX_WRITEBACK_PAGES from each. Now because the > queue can be longer for normal WB_SYNC_NONE writeback, it can take longer > before we return to the old inodes. So if normal writeback interrupts > kupdate one, it can take longer before all data of old inodes get to disk. > But we always get the old data to disk - essentially at the same time at > which kupdate writeback would get them to disk if dirty_expire_centisecs > was 0. > > Is this enough? Do you want any of this in the changelog? > > Thanks for the inquiry btw. It made me cleanup my thoughts on the subject ;) > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: email@kvack.org