Steve Dickson wrote:
>
> It was brought to my attention that the following series of events
> would cause an infinite loop in the 2.4 nfs kernels.
>
> 1) Mount the filesystem with acregmin=1,acregmax=1 from two clients.
> 2) On client 1, create a process that continuously writes to a file.
> 3) On client 2, remove the file that is being written.
> 4) On client 1, interrupt the writing process (which is failing
>    with ESTALEs) and type sync.
>

Here is an updated patch for this problem. My original patch does avoid
the infinite loop, but it didn't address the actual cause of the loop.
The attached patch does, and here is what is happening:

A process is continuously writing to a broken (i.e. ESTALE) fd, which
queues up pages to be sent out. A getattr happens (due to a cache
timeout) and fails with ESTALE, so __nfs_revalidate_inode() removes the
inode from the hash list:

    if (status == -ESTALE) {
        NFS_FLAGS(inode) |= NFS_INO_STALE;
        if (inode != inode->i_sb->s_root->d_inode)
            remove_inode_hash(inode);
    }

Now when __sync_one() comes along and sees the dirty pages, the inode
is added to the locked inode list, the data is synced out, and then
__refile_inode() is called:

    <>
    list_add(&inode->i_list, &inode->i_sb->s_locked_inodes);
    <>
    inode->i_state |= I_LOCK;
    /* write out data */
    inode->i_state &= ~I_LOCK;
    if (!(inode->i_state & I_FREEING))
        __refile_inode(inode);

Here is the problem! Since the inode has already been removed from the
i_hash list, it is never refiled, because __refile_inode() returns
early:

    if (inode->i_state & I_FREEING)
        return;
    if (list_empty(&inode->i_hash))
        return;

This causes the infinite loop, because the inode is never removed from
the locked inode list. My original patch avoided this loop because
__nfs_revalidate_inode() saw that the inode was stale before it removed
the inode from the hash list.
The attached patch still "breaks" the inode earlier (since that stops a
bunch of unnecessary I/O), but it also removes the call to
remove_inode_hash() in __nfs_revalidate_inode(), which is the real
cause of the problem.

So the code in question is:

    if (inode != inode->i_sb->s_root->d_inode)
        remove_inode_hash(inode);

and I'm hoping someone can shed some light on why the inode is removed
from the i_hash list on an ESTALE failure. Does it make sense to remove
an inode from the i_hash list when it still has dirty pages?

steved.