Subject: Re: [RFC][PATCH] try not to let dirty inodes fester
From: Dave Hansen
To: Dave Chinner
Cc: linux-kernel@vger.kernel.org, hch@infradead.org,
 lnxninja@linux.vnet.ibm.com, axboe@kernel.dk, pbadari@us.ibm.com,
 Yuri L Volobuev
In-Reply-To: <20101002113238.GF4681@dastard>
References: <20101001191449.0AA0E233@kernel.beaverton.ibm.com>
 <20101002113238.GF4681@dastard>
Date: Tue, 05 Oct 2010 08:25:45 -0700
Message-ID: <1286292345.9970.4231.camel@nimitz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, 2010-10-02 at 21:32 +1000, Dave Chinner wrote:
> On Fri, Oct 01, 2010 at 12:14:49PM -0700, Dave Hansen wrote:
> >
> > I've got a bug that I've been investigating. The inode cache for a
> > certain fs grows and grows, despite running
> >
> > 	echo 2 > /proc/sys/vm/drop_caches
> >
> > all the time. Not that running drop_caches is a good idea, but it
> > _should_ force things to stay under control. That is, unless the
> > inodes are dirty.
>
> What's the filesystem, and what's the test case?

It's GPFS, which is a binary blob to me, unfortunately. I've seen some
of the same behavior with ext3, but only after changing some of the
dirty writeout tunables to absurd values.

I think the complication with GPFS in particular is that it doesn't use
Linux's buffer cache. We don't trigger any of the page-based dirty
watermarks since no _pages_ are being dirtied.

I've seen it happen when creating or touching large numbers of empty
files.
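
The "creating or touching large numbers of empty files" case can be
sketched in user space. This is only an illustration, not the test that
was actually run: the function names, the file count, and the use of a
temporary directory are all assumptions, and it would need to be pointed
at a directory on the filesystem under test to mean anything. It reads
the system-wide inode counts from /proc/sys/fs/inode-nr before and after.

```python
import os
import tempfile


def read_inode_nr(path="/proc/sys/fs/inode-nr"):
    """Return (nr_inodes, nr_free_inodes) from procfs, or None if unreadable."""
    try:
        with open(path) as f:
            fields = f.read().split()
        return int(fields[0]), int(fields[1])
    except (OSError, ValueError, IndexError):
        return None


def create_empty_files(directory, count):
    """Create `count` empty files in `directory` and return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"empty-{i:06d}")
        # Create the file and close it without writing any data, so only
        # metadata changes (no file data is written).
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
        os.close(fd)
        paths.append(path)
    return paths


def touch_files(paths):
    """Update the timestamps of existing files, changing inode metadata again."""
    for path in paths:
        os.utime(path, None)


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for a directory on the
    # filesystem being investigated. 10000 is an arbitrary count.
    with tempfile.TemporaryDirectory() as target:
        before = read_inode_nr()
        paths = create_empty_files(target, 10000)
        touch_files(paths)
        after = read_inode_nr()
        print("inode-nr before:", before)
        print("inode-nr after: ", after)
```

Running drop_caches between the two reads, as in the original report,
would show whether the inode count actually comes back down.
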
Yuri (cc'd) has seen it happen when mmap()'ing files but not modifying
them, since noatime is not set. The original case that we were seeing
was an NFS server serving up a GPFS filesystem.

> > I think I'm seeing a case where the inode's dentry goes away, it
> > hits iput_final(). It is dirty, so it stays off the inode_unused
> > list waiting around for writeback.
>
> Right - it should be on the bdi->wb->b_dirty list waiting to be
> expired and written back or already on the expired writeback queues
> and waiting to be written again.
>
> > Then, the periodic writeback happens, and we end up in
> > wb_writeback(). One of the first things we do in the loop (before
> > writing out inodes) is this:
> >
> > 	if (work->for_background && !over_bground_thresh())
> > 		break;
>
> Sure, but the periodic ->for_kupdate flushing should be writing
> any inode older than 30s and should be running every 5s. Hence the
> background writeback aborting should not be affecting the cleaning
> of dirty inodes. Hence I don't think this is the problem you are
> looking for.

Yeah, I think you're right. I missed that call site when I was going
through it.

> Without knowing what filesystem or what you are doing to grow the
> inode cache, it's pretty hard to say much more than this....

Thanks for looking at it. I'm trying to see if I can reproduce any of
this with any of the in-tree fs's.

-- Dave