From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.mpynet.fi ([82.197.21.85]:51285 "EHLO mx2.mpynet.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757098AbdADUEu (ORCPT ); Wed, 4 Jan 2017 15:04:50 -0500 Date: Wed, 4 Jan 2017 22:04:47 +0200 From: Tuomas Tynkkynen To: Al Viro CC: , , Subject: Re: 9pfs hangs since 4.7 Message-ID: <20170104220447.74f2265d@duuni> In-Reply-To: <20170104014753.GE1555@ZenIV.linux.org.uk> References: <20161124215023.02deb03c@duuni> <20170102102035.7d1cf903@duuni> <20170102162309.GZ1555@ZenIV.linux.org.uk> <20170104013355.4a8923b6@duuni> <20170104014753.GE1555@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Wed, 4 Jan 2017 01:47:53 +0000 Al Viro wrote: > On Wed, Jan 04, 2017 at 01:34:50AM +0200, Tuomas Tynkkynen wrote: > > > I got these logs from the server & client with 9p tracepoints enabled: > > > > https://gist.githubusercontent.com/dezgeg/02447100b3182167403099fe7de4d941/raw/3772e408ddf586fb662ac9148fc10943529a6b99/dmesg%2520with%25209p%2520trace > > https://gist.githubusercontent.com/dezgeg/e1e0c7f354042e1d9bdf7e9135934a65/raw/3a0e3b4f7a5229fd0be032c6839b578d47a21ce4/qemu.log > > > > Lovely... > > 27349:out 8 0001 TSTATFS, tag 1 > 27350:out 12 0001 TLOPEN, tag 1 > 27351:complete 9 0001 RSTATFS, tag 1 > 27352:complete 13 0001 RLOPEN, tag 1 > > 27677:out 8 0001 TSTATFS, tag 1 > 27678:out 12 0001 TLOPEN, tag 1 > 27679:complete 9 0001 RSTATFS, tag 1 > 27680:complete 13 0001 RLOPEN, tag 1 > > 29667:out 8 0001 TSTATFS, tag 1 > 29668:out 110 0001 TWALK, tag 1 > 29669:complete 9 0001 RSTATFS, tag 1 > 29670:complete 7 0001 RLERROR, tag 1 > > 42317:out 110 0001 TWALK, tag 1 > 42318:out 8 0001 TSTATFS, tag 1 > 42319:complete 9 0001 RSTATFS, tag 1 > 42320:complete 7 0001 RERROR, tag 1 > > Those are outright protocol violations: tag can be reused only after either > a reply bearing the same tag has arrived *or* TFLUSH for that tag had been > issued and successful (type == RFLUSH) reply bearing the tag of that TFLUSH > has arrived. Your log doesn't contain any TFLUSH (108) at all, so it should've > been plain and simple "no reuse until server sends a reply with matching tag". > Argh, I had completely forgotten there is another 9p mount involved, so the log is mixed between the two mounts. Apologies, there is now a new trace attached with the V9fsState pointer dumped in the mix as well. > Otherwise the the dump looks sane. In the end all issued requests had been > served, so it's not as if the client had been waiting for a free tag or for > a response to be produced by the server. > > Interesting... dmesg doesn't seem to contain the beginning of the 9P trace - > only 89858 out of 108818 in the dump, even though it claims to have lost > only 4445 events. [...] Here's logs that should be complete this time: https://gist.githubusercontent.com/dezgeg/08629d4c8ca79da794bc087e5951e518/raw/a1a82b9bc24e5282c82beb43a9dc91974ffcf75a/9p.qemu.log https://gist.githubusercontent.com/dezgeg/1d5f1cc0647e336c59989fc62780eb2e/raw/d7623755a895b0441c608ddba366d20bbf47ab0b/9p.dmesg.log > > Interesting... Which version of qemu/what arguments are you using there? This is QEMU 2.7.0, with these on the server side: -virtfs local,path=/nix/store,security_model=none,mount_tag=store -virtfs local,path=$TMPDIR/xchg,security_model=none,mount_tag=xchg and on the client side: store on /nix/store type 9p (rw,relatime,dirsync,trans=virtio,version=9p2000.L,cache=loose) xchg on /tmp/xchg type 9p (rw,relatime,dirsync,trans=virtio,version=9p2000.L,cache=loose) 'store' is the one receiving the hammering, 'xchg' I think is mostly sitting idle.