Date: Sat, 7 Jan 2017 16:10:45 +0100
From: Greg Kurz
To: Al Viro
Cc: Tuomas Tynkkynen, linux-fsdevel@vger.kernel.org,
 v9fs-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org
Subject: Re: [V9fs-developer] 9pfs hangs since 4.7
Message-ID: <20170107161045.742893b1@bahia.lan>
In-Reply-To: <20170107062647.GB12074@ZenIV.linux.org.uk>
References: <20161124215023.02deb03c@duuni>
 <20170102102035.7d1cf903@duuni>
 <20170102162309.GZ1555@ZenIV.linux.org.uk>
 <20170104013355.4a8923b6@duuni>
 <20170104014753.GE1555@ZenIV.linux.org.uk>
 <20170104220447.74f2265d@duuni>
 <20170104230101.GG1555@ZenIV.linux.org.uk>
 <20170106145235.51630baf@bahia.lan>
 <20170107062647.GB12074@ZenIV.linux.org.uk>

On Sat, 7 Jan 2017 06:26:47 +0000
Al Viro wrote:

> On Fri, Jan 06, 2017 at 02:52:35PM +0100, Greg Kurz wrote:
>
> > Looking at the tag numbers, I think we're hitting the hardcoded limit of 128
> > simultaneous requests in QEMU (which doesn't produce any error; new requests
> > are silently dropped).
> >
> > Tuomas, can you change MAX_REQ to some higher value (< 65535, since the tag
> > is 2 bytes and 0xffff is reserved) to confirm?
>
> Huh?
>
> Just how is a client supposed to cope with that behaviour? 9P is not
> SunRPC - there's a reason why it doesn't live on top of UDP. Sure, it's
> datagram-oriented, but it really wants a reliable transport...
>
> Setting the ring size at MAX_REQ is fine; that'll give you ENOSPC on an
> attempt to put a request there, and p9_virtio_request() will wait for
> things to clear, but if you've accepted a request, that's bloody it -
> you really should go and handle it.
>

Yes, you're right, and "dropped" in my previous mail actually meant "not
accepted" (virtqueue_pop() not called)... sorry for the confusion. :-\

> How does it happen, anyway? qemu-side, I mean... Does it move the buffer
> to the used ring as soon as it has fetched the request? AFAICS, it doesn't -
> virtqueue_push() is called just before pdu_free(); we might get complications
> in case of TFLUSH handling (queue with MAX_REQ-1 requests submitted, TFLUSH
> arrives, cancel_pdu is found and ->cancelled is set on it, then v9fs_flush()
> waits for it to complete. Once the damn thing is done, the buffer is released
> by virtqueue_push(), but pdu freeing is delayed until v9fs_flush() gets woken
> up. In the meanwhile, another request arrives into the slot freed by that
> virtqueue_push() and we are out of pdus).
>

Indeed. Even if this doesn't seem to be the problem here, I guess it should
be fixed.

> So it could happen, and things might get unpleasant to some extent, but...
> no TFLUSH had been present in all that traffic. And none of the stuck
> processes had been spinning in p9_virtio_request(), so they *did* find
> ring slots...

So we're back to your previous proposal of checking whether virtqueue_kick()
returned false...
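
Concretely, that check would go something like this in p9_virtio_request() -
a condensed sketch against net/9p/trans_virtio.c, with the sg setup and some
error paths trimmed, and the exact errno being my guess:

static int p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
{
	struct virtio_chan *chan = client->trans;
	struct scatterlist *sgs[2];
	unsigned long flags;
	int err;

req_retry:
	spin_lock_irqsave(&chan->lock, flags);

	/* ... build sgs[] from the request and reply buffers ... */

	err = virtqueue_add_sgs(chan->vq, sgs, 1, 1, req, GFP_ATOMIC);
	if (err == -ENOSPC) {
		/* ring full: wait for a completion to free a slot, then
		 * retry -- this is what the current code already does */
		chan->ring_bufs_avail = 0;
		spin_unlock_irqrestore(&chan->lock, flags);
		err = wait_event_killable(chan->vc_wq, chan->ring_bufs_avail);
		if (err == -ERESTARTSYS)
			return err;
		goto req_retry;
	}

	/* the proposed change: don't ignore the return value */
	if (!virtqueue_kick(chan->vq)) {
		/* device is broken; fail fast instead of waiting forever
		 * for a reply that will never arrive */
		spin_unlock_irqrestore(&chan->lock, flags);
		return -EIO;	/* errno is a guess */
	}

	spin_unlock_irqrestore(&chan->lock, flags);
	return 0;
}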
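
To make the TFLUSH ordering you describe concrete, the QEMU completion path
is roughly shaped like this -- paraphrased from hw/9pfs, names approximate,
and the wakeup helper below is hypothetical:

static void pdu_complete(V9fsPDU *pdu, ssize_t len)
{
    /* ... marshal the reply into the guest buffers ... */

    virtqueue_push(vq, &pdu->elem, len);    /* slot back to the guest */
    virtio_notify(vdev, vq);

    if (pdu->cancelled) {
        /* a TFLUSH is waiting on this pdu: wake v9fs_flush(), which
         * calls pdu_free() only once it gets scheduled again.  Between
         * the push above and that deferred free, the guest may submit
         * a new request into the just-released slot while all MAX_REQ
         * pdus are still accounted as busy -> out of pdus. */
        wake_flush_waiter(pdu);             /* hypothetical helper */
        return;
    }

    pdu_free(pdu);    /* normal path: slot and pdu released together */
}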
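
And for completeness, the limit I was suggesting Tuomas raise for the
experiment -- IIRC the define lives in hw/9pfs/virtio-9p.h, though the exact
location may vary across QEMU versions:

/* hw/9pfs/virtio-9p.h */
#define MAX_REQ         128     /* hardcoded number of simultaneous requests;
                                 * any experimental bump must stay below
                                 * 65535, since 9P tags are 16-bit and
                                 * 0xffff (P9_NOTAG) is reserved */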