From: Andrew Cooper
Subject: Re: [PATCH] oxenstored: fix short-write issue
Date: Mon, 2 Nov 2015 14:27:05 +0000
Message-ID: <563772B9.4090406@citrix.com>
In-Reply-To: <1446474270.3088.51.camel@citrix.com>
References: <1445965809-5144-1-git-send-email-wei.liu2@citrix.com> <1446471883.3088.40.camel@citrix.com> <56376EDF.8030802@citrix.com> <1446474270.3088.51.camel@citrix.com>
To: Ian Campbell, Wei Liu, Xen-devel
Cc: Samuel Thibault, Ian Jackson, David Scott
List-Id: xen-devel@lists.xenproject.org

On 02/11/15 14:24, Ian Campbell wrote:
> On Mon, 2015-11-02 at 14:10 +0000, Andrew Cooper wrote:
>> On 02/11/15 13:44, Ian Campbell wrote:
>>> On Tue, 2015-10-27 at 17:10 +0000, Wei Liu wrote:
>>>> When oxenstored wrote to the ring, it wrote a chunk of contiguous
>>>> data.
>>>> Originally when it tried to write across ring boundary, it returned a
>>>> short-write when there is still room. That led to stalling mini-os's
>>>> xenstore thread at times.
>>> What is a "short-write" in this context?
>>>
>>> Given data bytes 0..M I assumed it is only writing bytes 0..N and not
>>> N+1..M because the ring boundary is at N. But what is it writing to the
>>> ->prod ring pointer N or M?
>> Prod gets incremented by N in this case.
>>
>>> AIUI writing N should be allowed by the ring protocol, the client
>>> should
>>> keep looking for more data until it has a complete request.
>>>
>>> Writing M would be a server error.
>> Correct, and this is what is happening.
> The first or second?
> Your first comment suggests the first, but your second binds most closely
> to the second.

Oops yes. Writing M would be an error. Prod currently gets incremented by N.

>
>> The server (believes) that the ring is full, when it is not. It waits
>> for the client to make more space in the ring, while the client is
>> waiting for the server to complete its message in the ring, thus
>> stalling.
> That makes sense, thanks.
>
> I think this needs to be spelled out more fully in the commit message, in
> particular "server thinks ring is full when it is not" is the most relevant
> thing, the short write is just how we arrived there.

This patch here is buggy, and superseded by one of mine which attempts to
fix the C stubs. I need to post a v2.

~Andrew
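
For reference, the difference between a short write and a wrapping write
can be sketched roughly as below. This is only an illustration under stated
assumptions, not the oxenstored C stubs: the names (struct ring, RING_SIZE,
MASK, ring_write_short, ring_write_full) and the 1024-byte size are
hypothetical, the indices are assumed to be free-running, and the memory
barriers and event-channel notification a real shared ring needs are left
out. The short write on its own is fine by the protocol as long as prod only
advances by the N bytes actually copied; the stall discussed above comes
from the writer then treating the ring as full.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 1024u                      /* hypothetical; power of two */
#define MASK(idx) ((idx) & (RING_SIZE - 1))  /* index -> buffer offset */

struct ring {
    uint8_t  buf[RING_SIZE];
    uint32_t cons;   /* free-running consumer index */
    uint32_t prod;   /* free-running producer index */
};

/* Short write: copy only the contiguous chunk up to the ring boundary.
 * Returns the number of bytes written (N), which may be less than len
 * even though free space remains at the start of the buffer. */
static uint32_t ring_write_short(struct ring *r, const uint8_t *data,
                                 uint32_t len)
{
    assert(r->prod - r->cons <= RING_SIZE);
    uint32_t space  = RING_SIZE - (r->prod - r->cons);
    uint32_t off    = MASK(r->prod);
    uint32_t contig = RING_SIZE - off;       /* bytes before the boundary */
    uint32_t n = len;

    if (n > space)
        n = space;
    if (n > contig)
        n = contig;
    memcpy(r->buf + off, data, n);
    r->prod += n;                            /* advance by N, never by M */
    return n;
}

/* Wrapping write: fill up to the boundary, then continue from offset 0,
 * so the only limit is the free space actually left in the ring. */
static uint32_t ring_write_full(struct ring *r, const uint8_t *data,
                                uint32_t len)
{
    assert(r->prod - r->cons <= RING_SIZE);
    uint32_t space = RING_SIZE - (r->prod - r->cons);
    uint32_t n     = len < space ? len : space;
    uint32_t off   = MASK(r->prod);
    uint32_t first = RING_SIZE - off;

    if (first > n)
        first = n;
    memcpy(r->buf + off, data, first);       /* part before the boundary */
    memcpy(r->buf, data + first, n - first); /* remainder after wrapping */
    r->prod += n;
    return n;
}
```

With prod == cons == 1020 and a 16-byte message, ring_write_short() copies
4 bytes and leaves 12 for a later call, while ring_write_full() copies all
16 in one go. Either is acceptable to a client that keeps reading until it
has a complete message, provided the writer comes back for the rest instead
of waiting for space it already has.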