From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from zeniv.linux.org.uk ([195.92.253.2]:34938 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1030237AbcBQULY (ORCPT );
	Wed, 17 Feb 2016 15:11:24 -0500
Date: Wed, 17 Feb 2016 20:11:20 +0000
From: Al Viro
To: Mike Marshall
Cc: Martin Brandenburg, Linus Torvalds, linux-fsdevel, Stephen Rothwell
Subject: Re: Orangefs ABI documentation
Message-ID: <20160217201120.GN17997@ZenIV.linux.org.uk>
References: <20160214234312.GX17997@ZenIV.linux.org.uk>
	<20160215184554.GY17997@ZenIV.linux.org.uk>
	<20160215230434.GZ17997@ZenIV.linux.org.uk>
	<20160216233609.GE17997@ZenIV.linux.org.uk>
	<20160216235441.GF17997@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Wed, Feb 17, 2016 at 02:24:34PM -0500, Mike Marshall wrote:
> It is still busted, I've been trying to find clues as to why...

With reinit_completion() added?

> Maybe this is relevant:
>
> Alloced OP ffff880015698000 <- doomed op for orangefs_create MAILBOX2.CPT
> service_operation: orangefs_create op ffff880015698000
> ffff880015698000 got past is_daemon_in_service
>
> ... lots of stuff ...
>
> w_f_m_d returned -11 for ffff880015698000 <- first op to get EAGAIN
>
> first client core is NOT in service
> second op to get EAGAIN
> ...
> last client core is NOT in service
>
> ... lots of stuff ...
>
> service_operation returns to orangef_create with handle 0 fsid 0 ret 0
> for MAILBOX2.CPT
>
> I'm guessing you want me to wait to do the switching of my branch
> until we fix this (last?) thing, let me know...

What I'd like to check is the value of op->waitq.done at retry_servicing.
If we get there with a non-zero value, we've a problem.  BTW, do you
hit any of gossip_err() in orangefs_clean_up_interrupted_operation()?
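
To illustrate why a non-zero op->waitq.done at retry_servicing would be trouble, here is a minimal userspace sketch. It assumes op->waitq is a struct completion and models only the counter semantics (complete() increments done, a wait consumes one count, reinit_completion() resets done to 0). The names model_completion, model_wait_would_return and retry_sees_stale_completion are hypothetical, and the replayed sequence is a guess at the failure, not a trace of the actual orangefs code.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: hypothetical names, not the kernel's struct completion. */
struct model_completion {
    unsigned int done;  /* complete() calls that no waiter has consumed yet */
};

static void model_init(struct model_completion *c)     { c->done = 0; }
static void model_reinit(struct model_completion *c)   { c->done = 0; }
static void model_complete(struct model_completion *c) { c->done++; }

/*
 * Models the fast path of a wait: returns true if the wait would return
 * immediately (consuming one count), false if the waiter would sleep.
 */
static bool model_wait_would_return(struct model_completion *c)
{
    if (c->done == 0)
        return false;
    c->done--;
    return true;
}

/*
 * Assumed sequence: the op gets completed, but its waiter has already
 * given up and never consumes the count; the op is then sent around
 * again for a retry with no new reply pending.
 * Returns true if the retry's wait would return without a fresh reply.
 */
static bool retry_sees_stale_completion(bool reinit_before_retry)
{
    struct model_completion waitq;

    model_init(&waitq);
    model_complete(&waitq);         /* completion nobody waited for */
    assert(waitq.done == 1);        /* the count is left behind */

    if (reinit_before_retry)
        model_reinit(&waitq);       /* what reinit_completion() would do */

    return model_wait_would_return(&waitq);
}
```

In this model, retry_sees_stale_completion(false) reports that the retry's wait returns at once on the leftover count, which would look like a reply that never arrived; with the reset in place the waiter sleeps until a real complete().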