From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F51CC43219 for ; Tue, 18 Jan 2022 20:58:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349427AbiARU6K (ORCPT ); Tue, 18 Jan 2022 15:58:10 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:24230 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245609AbiARU6J (ORCPT ); Tue, 18 Jan 2022 15:58:09 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1642539488; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=b9WqescKjIAzil7l6cr9HCVfXjlyRssVLY+HORpld6I=; b=P9DFD4AhQ0K4pJNR6xp/gxMmBN47oT4FQMkqt8dRyvC7sFcCdi4MTXtQx9KfegvtznSHj/ 8jAIbPvmRg7/OJCq0omnjTSc5pNmVwuu2XRFm0eLv/eOrxvbdJf8RrnWLGkPEo69CGTXNP 9Z391aBiJlEfBg7O/Ifj5lGxPvwZEbM= Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com [209.85.219.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-481-SXZaqlqpORaYOxRX536dqA-1; Tue, 18 Jan 2022 15:58:07 -0500 X-MC-Unique: SXZaqlqpORaYOxRX536dqA-1 Received: by mail-qv1-f72.google.com with SMTP id f7-20020a056214076700b0041c20941155so466469qvz.15 for ; Tue, 18 Jan 2022 12:58:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=b9WqescKjIAzil7l6cr9HCVfXjlyRssVLY+HORpld6I=; b=k1hhBfJIZ0F99pz+4xuEQtzCp6FXFO2C5tQsbP7QCmPvCAz7J4GwDP127ZdaDIfFDF pg8evOMUNhPPwArRsXnaZzZ0pyQsmKUqmeOqH6D643s8eXMB915Cd9bRoCEBnoT3LIBX i45laMDbiyAUs1WVsXGUeq6g3TVDSnjtO/rbEaVjburnMJWVktWczg9CzzTSJjgsuwps y9fwY2H138JBjC3iN2S/fkVaYjjkLFKVWkJHDc8Fhnnwo3ijaklZIBUKBXoRjEOseTKq 88fwZ/wrsqQxqeupPjOu5Isf9ss0mbiiDU8JN339PNi3nvVj1jkOlNI7836vkkn1iNp3 oYtg== X-Gm-Message-State: AOAM530wUilUyjHqq5K7bVGQT676bW8oYVJ6ts2aBu8iOs7A8DIoIGs/ PwNmwGGkMQXmkVLR4UeUm3vBxwXXYmV1owaPRKicResSDi7ZQV7RIaAFC7ZGWZ6K5aMwnNDV7XR ByGzAlzl22Ce9yycwGupm1QL+TQ== X-Received: by 2002:a05:620a:2448:: with SMTP id h8mr4958219qkn.231.1642539487282; Tue, 18 Jan 2022 12:58:07 -0800 (PST) X-Google-Smtp-Source: ABdhPJxq3Ndqgu/1eSPl8H1/GfW4nW+qGmp6SP+v18upZ87Y6BP23e5HdqaD5fVe/w9z/YJHEmEHtw== X-Received: by 2002:a05:620a:2448:: with SMTP id h8mr4958206qkn.231.1642539487045; Tue, 18 Jan 2022 12:58:07 -0800 (PST) Received: from bfoster (c-24-61-119-116.hsd1.ma.comcast.net. [24.61.119.116]) by smtp.gmail.com with ESMTPSA id az15sm5128276qkb.124.2022.01.18.12.58.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Jan 2022 12:58:06 -0800 (PST) Date: Tue, 18 Jan 2022 15:58:04 -0500 From: Brian Foster To: Al Viro Cc: Ian Kent , "Darrick J. Wong" , Christoph Hellwig , Miklos Szeredi , David Howells , Kernel Mailing List , linux-fsdevel , xfs , Linus Torvalds Subject: Re: [PATCH] vfs: check dentry is still valid in get_link() Message-ID: References: <275358741c4ee64b5e4e008d514876ed4ec1071c.camel@themaw.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Tue, Jan 18, 2022 at 07:20:35PM +0000, Al Viro wrote: > On Tue, Jan 18, 2022 at 01:25:15PM -0500, Brian Foster wrote: > > > If I go back to the inactive -> reclaimable grace period variant and > > also insert a start_poll_synchronize_rcu() and > > poll_state_synchronize_rcu() pair across the inactive processing > > sequence, I start seeing numbers closer to ~36k cycles. IOW, the > > xfs_inodegc_inactivate() helper looks something like this: > > > > if (poll_state_synchronize_rcu(ip->i_destroy_gp)) > > xfs_inodegc_set_reclaimable(ip); > > else > > call_rcu(&VFS_I(ip)->i_rcu, xfs_inodegc_set_reclaimable_callback); > > > > ... to skip the rcu grace period if one had already passed while the > > inode sat on the inactivation queue and was processed. > > > > However my box went haywire shortly after with rcu stall reports or > > something if I let that continue to run longer, so I'll probably have to > > look into that lockdep splat (complaining about &pag->pag_ici_lock in > > rcu context, perhaps needs to become irq safe?) or see if something else > > is busted.. > > Umm... Could you (or Dave) describe where does the mainline do the > RCU delay mentioned several times in these threads, in case of > * unlink() > * overwriting rename() > * open() + unlink() + close() (that one, obviously, for regular files) > If you're referring to the existing rcu delay in XFS, I suspect that refers to xfs_reclaim_inode(). The last bits of that function remove the inode from the perag tree and then calls __xfs_inode_free(), which frees the inode via rcu callback. BTW, I think the experiment above is either going to require an audit to make the various _set_reclaimable() locks irq safe or something a bit more ugly like putting the inode back on a workqueue after the rcu delay to make the state transition. Given the incremental improvement from using start_poll_synchronize_rcu(), I replaced the above destroy side code with a cond_synchronize_rcu(ip->i_destroy_gp) call on the iget/recycle side and see similar results (~36k cycles per 60s, but so far without any explosions). Brian > The thing is, if we already do have an RCU delay in there, there might be > a better solution - making sure it happens downstream of d_drop() (in case > when dentry has other references) or dentry going negative (in case we are > holding the sole reference to it). > > If we can do that, RCU'd dcache lookups won't run into inode reuse at all. > Sure, right now d_delete() comes last *and* we want the target inode to stay > around past the call of ->unlink(). But if you look at do_unlinkat() you'll > see an interesting-looking wart with ihold/iput around vfs_unlink(). Not > sure if you remember the story on that one; basically, it's "we'd rather > have possible IO on inode freeing to happen outside of the lock on parent". > > nfsd and mqueue do the same thing; ksmbd does not. OTOH, ksmbd appears to > force the "make victim go unhashed, sole reference or not". ecryptfs > definitely does that forcing (deliberately so). > > That area could use some rethinking, and if we can deal with the inode reuse > delay while we are at it... > > Overwriting rename() is also interesting in that respect, of course. > > I can go and try to RTFS my way through xfs iget-related code, but I'd > obviously prefer to do so with at least some overview of that thing > from the folks familiar with it. Seeing that it's a lockless search > structure, missing something subtle there is all too easy, especially > with the lookup-by-fhandle stuff in the mix... >