From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from zeniv.linux.org.uk ([195.92.253.2]:37839 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750961AbcBMRro (ORCPT ); Sat, 13 Feb 2016 12:47:44 -0500
Date: Sat, 13 Feb 2016 17:47:38 +0000
From: Al Viro
To: Mike Marshall
Cc: Martin Brandenburg , Linus Torvalds , linux-fsdevel , Stephen Rothwell
Subject: Re: Orangefs ABI documentation
Message-ID: <20160213174738.GR17997@ZenIV.linux.org.uk>
References: <20160209221623.GI17997@ZenIV.linux.org.uk> <20160209224050.GJ17997@ZenIV.linux.org.uk> <20160209231328.GK17997@ZenIV.linux.org.uk> <20160211004432.GM17997@ZenIV.linux.org.uk> <20160212042757.GP17997@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: 

On Sat, Feb 13, 2016 at 12:18:12PM -0500, Mike Marshall wrote:
> I added the patches, and ran a bunch of tests.
>
> Stuff works fine when left unbothered, and also
> when wrenches are thrown into the works.
>
> I had multiple userspace things going on at the
> same time, dbench, ls -R, find... kill -9 or control-C on
> any of them is handled well. When I killed both
> the client-core and its restarter, the kernel
> dealt with swarm of ops that had nowhere
> to go... the WARN_ON in service_operation
> was hit.
>
> Feb 12 16:19:12 be1 kernel: [ 3658.167544] orangefs: please confirm
> that pvfs2-client daemon is running.
> Feb 12 16:19:12 be1 kernel: [ 3658.167547] fs/orangefs/dir.c line 264:
> orangefs_readdir: orangefs_readdir_index_get() failure (-5)

I.e. bufmap is gone.

> Feb 12 16:19:12 be1 kernel: [ 3658.170741] ------------[ cut here ]------------
> Feb 12 16:19:12 be1 kernel: [ 3658.170746] WARNING: CPU: 0 PID: 1667
> at fs/orangefs/waitqueue.c:203 service_operation+0x4f6/0x7f0()

...
and we are in wait_for_direct_io(), holding an r/w slot and finding ourselves with bufmap already gone, despite not having freed that slot yet. Bloody wonderful - we still have bufmap refcounting buggered somewhere.

Which tree had that been? Could you push that tree (having checked that you don't have any uncommitted changes) in some branch?
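
For illustration only, here is roughly the invariant that appears to be violated: anybody holding a slot should be pinning the bufmap, so teardown on daemon exit can unpublish it but must not free it until the last slot is released. This is a minimal userspace sketch, not the fs/orangefs code; struct bufmap, bufmap_install(), bufmap_teardown(), slot_get() and slot_put() are all hypothetical names, the "freed" flag stands in for the actual free, and there is no locking at all (a real version would need a spinlock or similar around the pointer and the counts).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared buffer map; NOT the orangefs struct. */
struct bufmap {
	int refcnt;		/* 1 for the installer + 1 per slot in use */
	int slots_in_use;
	bool freed;		/* stands in for the real free, so it can be checked */
};

/* NULL once the daemon has gone away (assumption: single-threaded sketch). */
static struct bufmap *current_map;

/* Drop one reference; the map is only "freed" when the last one goes. */
static void bufmap_put(struct bufmap *m)
{
	assert(m->refcnt > 0);
	if (--m->refcnt == 0)
		m->freed = true;
}

/* Daemon side: publish a map, holding the initial reference. */
static void bufmap_install(struct bufmap *m)
{
	m->refcnt = 1;
	m->slots_in_use = 0;
	m->freed = false;
	current_map = m;
}

/* Daemon side: unpublish the map and drop only the installer's reference. */
static void bufmap_teardown(void)
{
	struct bufmap *m = current_map;

	current_map = NULL;
	if (m)
		bufmap_put(m);
}

/* I/O side: taking a slot pins the map; fails if it is already gone. */
static struct bufmap *slot_get(void)
{
	struct bufmap *m = current_map;

	if (!m)
		return NULL;
	m->refcnt++;
	m->slots_in_use++;
	return m;
}

/* I/O side: releasing the slot drops the pin taken in slot_get(). */
static void slot_put(struct bufmap *m)
{
	assert(m->slots_in_use > 0);
	m->slots_in_use--;
	bufmap_put(m);
}
```

With that scheme, a caller that got its slot before teardown keeps a valid map until its own slot_put(), and anyone arriving after teardown gets NULL from slot_get() instead of a slot on a dead map. Seeing a held slot together with a missing bufmap means some path drops a reference it does not own, or takes a slot without taking the reference.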