From: Christian Schoenebeck
To: qemu-devel@nongnu.org
Cc: Greg Kurz
Subject: Re: [PATCH v4 10/11] 9pfs: T_readdir latency optimization
Date: Wed, 11 Mar 2020 02:18:04 +0100
Message-ID: <2408892.h2sFNKg3ST@silver>
In-Reply-To: <20200310193336.37029536@bahia.home>
References: <1651612.bE1MAdl4rA@silver> <20200310193336.37029536@bahia.home>
On Tuesday, 10 March 2020 19:33:36 CET Greg Kurz wrote:
> > This patch is also too big for my preference, but I don't see a viable way
> > to split it further into separate patches. I already separated all the
> > patches I could. If you have suggestions, very much appreciated!
>
> Well, the patch could be split in two or three parts at least:
>
> (1) introduce the new function that reads multiple entries in codir.c
>
> (2) use it from 9p.c
>
> (3) remove unused stuff if anything remains
>
> This doesn't seem to change much but the smaller diffstats
> for each individual patch make them less scary :) and with
> (1) applied only it is easier to compare what the old code
> in 9p.c and the new one in codir.c do.
>
> > The reason for this is that in order to fix these issues with the current
> > T_readdir implementation, it requires separating what's currently one task
> > (i.e. one function) into two separate tasks (i.e. two functions). There is
> > no sensible way to do that.
>
> Yeah, I won't ask for finer grain.

I'm confused. Does that mean your split suggestion was just theoretical, or do
you need it?

> > But don't be too scared about this patch; if you look just at the diff
> > directly then yes, the diff is not very human readable. However if you
> > apply the patch and look at the resulting code, you'll notice that there
> > are actually only 2 functions (v9fs_do_readdir() in 9p.c and
> > do_readdir_lowlat() in codir.c) which require careful reviewing and that
> > their resulting code is actually very, very straightforward, and not
> > long either.
>
> These are personal opinions. Careful reviewing can take a lot of time :)

Well, that depends on what precisely you mean by opinion. :) That this patch
actually reduces code complexity is a fact (fewer branches, fewer locks, fewer
dispatches, fewer side effects, fewer error/cleanup handlers). These are
objectively measurable quantities.

But yes, reviewing it nevertheless costs time.

> > Current code on master is much more tricky and error prone due to the huge
> > amount of potential branches, individual error/cleanup handlers, high
> > amount of thread dispatching and much more. In the patched version all
> > these code complexities and error sources are removed.
>
> Come on :) The 9pfs code has been a can of worms from the beginning.
> It produced more than the average amount of security-related bugs,
> and most sadly, due to the overall lack of interest, it bitrotted
> and missed a lot of cool improvements like an appropriate support of
> unlinked files, QOM-ification of fsdev, conversion of proxy fsdev to
> vhost-user, hot plug/unplug support, live migration support and
> "much more"... The performance aspect of things is a full time job

No, the performance issues are actually quite manageable in the case of 9pfs.
I have already addressed readdir with this patch (by far the worst performance
issue), and after that there would be just one more severe performance issue
left: walkdir.
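To put a rough number on why the per-entry dispatching dominates readdir
latency, here is a simplified back-of-the-envelope model. The per-hop cost
below is an assumed placeholder, not a measured QEMU value; only the
linear-vs-constant scaling matters:

/* Toy model: per-entry worker dispatch vs. one batched dispatch.
 * The round-trip cost is a made-up figure purely for illustration. */
#include <stdio.h>

int main(void)
{
    const double round_trip_ms = 0.3; /* assumed main<->worker thread hop */
    const int nentries = 2000;        /* entries delivered by one T_readdir */

    /* old scheme: one worker dispatch (plus fs driver call) per dir entry */
    printf("per-entry dispatch: ~%.0f ms\n", nentries * round_trip_ms);

    /* batched scheme: all entries fetched by a single dispatch */
    printf("single dispatch:    ~%.1f ms\n", round_trip_ms);
    return 0;
}

Even if the real per-hop cost is smaller, the old scheme scales linearly with
the directory size, while the batched one stays essentially constant.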
My intention is not to squeeze out the last fractional percent of performance
for 9pfs, but you certainly agree that a simple "ls" blocking for more than 1
second is something that should be fixed, and fortunately the amount of
changes involved is far less than I originally feared it would be.

> I never had the opportunity to even consider. So yes, your changes
> are likely beneficial but the code base is still extremely fragile
> and sensible to huge changes... not breaking things that happen
> to work, even in a sub-optimal way, is essentially what I care for
> these days.

And I think I'm also being very careful not to break anything. I carefully
consider what to touch and what not. I wrote test cases and I am actively
testing my changes with real installations and snapshots as well.

I think the cause of the disagreements we have is solely our use cases of
9pfs: your personal use case does not seem to require any performance
considerations or multi-user aspects, whereas my use case requires at least
some minimum performance grade for utilizing 9pfs for server applications.

> > > Oh, so we'd still have the current implementation being used, even
> > > with this patch applied... This would be okay for an RFC patch but
> > > in the end I'd really like to merge something that also converts
> > > v9fs_do_readdir_with_stat().
> >
> > Yes, I know, but I would not recommend mixing these things at this point,
> > because it would be an entire effort on its own.
> >
> > v9fs_do_readdir_with_stat() is used for 9P2000.u, while v9fs_do_readdir()
> > is used for 9P2000.L. They behave very differently, so it would not
> > only require me to update v9fs_do_readdir_with_stat() and v9fs_read(), I
> > would also need to write their own test cases (plural, since there are
> > none at all yet) and benchmarks, and of course somebody would need to
> > review all that additional amount of code, and who would actually test
> > it? I am actively testing my 9P2000.L changes, but I am not actually
> > using 9P2000.u, so I could not guarantee the same quality.
> >
> > Considering the current commitment situation regarding 9pfs, we can only
> > bring things forward with compromises. Hence I suggest I concentrate on
> > fixing the worst performance issues on the 9P2000.L side first, and later
> > on, if there are really people interested in seeing these improvements on
> > the 9P2000.u side and willing to at least help out with testing, then I
> > can definitely also adjust the 9P2000.u side accordingly as well.
> >
> > But to also make this clear: this patch 10 is not introducing any
> > behaviour change on the 9P2000.u side.
>
> Ok, fair enough to leave 9p2000.U behind for now but I had to ask :)
> /me is even wondering if we should start deprecating 9p2000.U since
> most clients can use 9p2000.L now...

I probably wouldn't. On macOS for instance there's only 9p2000.u. In the end
9p2000.L is Linux specific. Deprecating 9p2000.u would mean a shift towards
supporting only Linux guests.

> > Mmm... maybe later on, as a cleanup patch or something. The current
> > version is tested. I would like to avoid making things more complicated
> > than they already are (i.e. accidental introduction of some bug just due
> > to this).
>
> It seems we might agree on not breaking things that work ;-)

Of course! All the things I work on in 9pfs are just fixes. But that's
probably where we start to disagree. :)
> > > > > +static int do_readdir_lowlat(V9fsPDU *pdu, V9fsFidState *fidp,
> > > > > +                             struct V9fsDirEnt **entries,
> > > > > +                             int32_t maxsize, bool dostat)
> > > > > +{
> > > > > +    V9fsState *s = pdu->s;
> > > > > +    V9fsString name;
> > > > > +    int len, err = 0;
> > > > > +    int32_t size = 0;
> > > > > +    off_t saved_dir_pos;
> > > > > +    struct dirent *dent;
> > > > > +    struct V9fsDirEnt *e = NULL;
> > > > > +    V9fsPath path;
> > > > > +    struct stat stbuf;
> > > > > +
> > > > > +    *entries = NULL;
> > > > > +    v9fs_path_init(&path);
> > > > > +
> > > > > +    /*
> > > > > +     * TODO: Here should be a warn_report_once() if lock failed.
> > > > > +     *
> > > > > +     * With a good 9p client we should not get into concurrency
> > > > > +     * here, because a good client would not use the same fid for
> > > > > +     * concurrent requests. We do the lock here for safety reasons
> > > > > +     * though. However the client would then suffer performance
> > > > > +     * issues, so better log that issue here.
> > > > > +     */
> > >
> > > This should probably be done with qemu_mutex_trylock() in a loop
> > > directly in v9fs_readdir_lock().
> >
> > No, definitely not in the loop. That's intentionally *one* lock *outside*
> > of
>
> Not sure we're talking about the same loop here...
>
> I'm just suggesting that v9fs_readdir_lock() could be something like:
>
> static inline void v9fs_readdir_lock(V9fsDir *dir)
> {
>     while (qemu_mutex_trylock(&dir->readdir_mutex)) {
>         warn_report_once("blah");
>     }
> }

Ah yeah, in fact I got you wrong on this one. I thought you meant to move the
lock inside do_readdir_lowlat(). About your actual suggestion above ...

> > the loop. The normal case is not requiring a lock at all, like the comment
> > describes. Doing multiple locks inside the loop unnecessarily reduces
> > performance for no reason.
> >
> > About *_trylock() instead of v9fs_readdir_lock(): might be a candidate to
>
> Hmm... are you suggesting that do_readdir_lowlat() should return if it
> can't take the lock ?

... yep, that's what I had in mind for it later on. I would not mind running
trylock in a loop like you suggested; if it fails, log a warning and return
(a rough sketch of this idea follows at the end of this mail). Because if the
lock fails, then the client is buggy. The user can read the warning in the
logs and report the bug so the client gets fixed. It's not worth handling that
case in any fancy way on the server side.

> > > > > + * Retrieves the requested (max. amount of) directory entries from
> > > > > + * the fs driver. This function must only be called by the main IO
> > > > > + * thread (top half). Internally this function call will be
> > > > > + * dispatched to a background IO thread (bottom half) where it is
> > > > > + * eventually executed by the fs driver.
> > > > > + *
> > > > > + * The old readdir implementation above just always retrieves one
> > > > > + * dir entry per call. The problem of that implementation is that
> > > > > + * latency is added for (the retrieval of) each directory entry,
> > > > > + * which in practice leads to latencies of several hundred ms for a
> > > > > + * readdir of only one directory.
> > > > > + *
> > > > > + * This is avoided in this function by letting the fs driver
> > > > > + * retrieve all required directory entries with only one call of
> > > > > + * this function and hence with only a single fs driver request.
> > >
> > > True but these kind of considerations rather belong to the changelog...
> > > otherwise, we'll have to not forget to kill these lines when the "old
> > > readdir implementation" is no more.
> >
> > Mmm... I do think this overall latency issue should be clarified
> > somewhere as
>
> The issue itself is best described in a changelog, but...
>
> > a comment. Where exactly is not that important to me. For instance it
> > could also be described on the v9fs_co_run_in_worker() macro definition
> > in coth.h instead, as a general rule of thumb when dispatching stuff.
>
> ... if you have useful recommendations that don't involve referring to a
> piece of code that might go away, a comment in coth.h is certainly a good
> idea.

Ok, I'll move it to coth.h then.

> > The thing is: for >10 years several developers obviously did not realize
> > the severe negative performance impact of frivolously dispatching tasks
> > to
>
> I won't dare to comment on some people I don't know not doing the obvious
> right things at the time... what I do know is that for >10 years, nobody
> cared for this code. Period. You're lucky it is still functional ;-)

This was not about judging individuals, just about expressing that the
latency impact of dispatching does not seem to be obvious to people, and
about trying to avoid similar mistakes by placing appropriate comment(s).

> > > > > +                                  struct V9fsDirEnt **entries,
> > > > > +                                  int32_t maxsize, bool dostat)
> > >
> > > s/maxsize/max_count since this is the wording used in the caller.
> >
> > Wouldn't do that. This is the driver side, not the 9p protocol request
> > handler. As you might have realized before, "max_count" is semantically
> > completely wrong. This variable does not count integral entries, it is
> > rather a maximum size (in bytes) of the destination buffer being filled.
> >
> > On the 9p request handler side it is ok to use "max_count" instead,
> > because the protocol definition uses exactly this term for the request
> > parameter, but on the driver side it's IMO inappropriate.
>
> I agree the max_count wording sucks and I don't care that much to
> name variables according to protocol definitions, but I do care
> to use the same name in caller/callee when it makes sense, for a
> better grep or cscope experience.

So you insist on that max_count change?

Best regards,
Christian Schoenebeck
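P.S. A rough sketch of the "trylock once, warn, and bail out" variant
mentioned above. This is illustrative only, not part of the patch; the
function name, header list, warning text and error value are placeholders,
and only qemu_mutex_trylock()/warn_report_once() are taken from the
suggestion quoted earlier:

/* Illustrative sketch only (not patch code). Instead of blocking on the
 * fid's readdir mutex, give up immediately and let the caller abort the
 * request; a well-behaved client never sends concurrent requests on the
 * same fid, so hitting this path indicates a client bug. */

#include "qemu/osdep.h"        /* pulls in errno definitions */
#include "qemu/error-report.h" /* warn_report_once() */
#include "hw/9pfs/9p.h"        /* V9fsDir with its readdir_mutex */

static inline int v9fs_readdir_trylock(V9fsDir *dir)
{
    if (qemu_mutex_trylock(&dir->readdir_mutex)) {
        /* lock is already held -> client reused the fid concurrently */
        warn_report_once("9p: client used the same fid for concurrent "
                         "readdir requests");
        return -EAGAIN; /* placeholder; the caller would simply bail out */
    }
    return 0;
}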