From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36FF2C4321E for ; Tue, 18 Jan 2022 20:58:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349422AbiARU6K (ORCPT ); Tue, 18 Jan 2022 15:58:10 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:25397 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245691AbiARU6J (ORCPT ); Tue, 18 Jan 2022 15:58:09 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1642539488; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=b9WqescKjIAzil7l6cr9HCVfXjlyRssVLY+HORpld6I=; b=P9DFD4AhQ0K4pJNR6xp/gxMmBN47oT4FQMkqt8dRyvC7sFcCdi4MTXtQx9KfegvtznSHj/ 8jAIbPvmRg7/OJCq0omnjTSc5pNmVwuu2XRFm0eLv/eOrxvbdJf8RrnWLGkPEo69CGTXNP 9Z391aBiJlEfBg7O/Ifj5lGxPvwZEbM= Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com [209.85.219.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-193-j65_gYlQNT2IuYjPSeyQjQ-1; Tue, 18 Jan 2022 15:58:07 -0500 X-MC-Unique: j65_gYlQNT2IuYjPSeyQjQ-1 Received: by mail-qv1-f72.google.com with SMTP id jn6-20020ad45de6000000b004146a2f1f97so446107qvb.19 for ; Tue, 18 Jan 2022 12:58:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=b9WqescKjIAzil7l6cr9HCVfXjlyRssVLY+HORpld6I=; b=eXeGATRPdOq+ux4K+DTtHoXBIodSTMBPsM75aXlor58KXDyeGneF46MxGzFn9mQ4bt gTY0zUFUbXwgzSOWN4Jr2vaiZHcGyKwkmpdAHoHRF2DISmWXfhNjG8hB0iFoRK63VqDO cjAeeMvcazPd9x7EkOQxTQYV6h1h6W0fmMstaQL1ieU2p4McXw/YQufEPXLgT+CUew+D JhnRQKv2OFCGOiqUpTntek8j4grSB21vDbNmbU5yS2SjuIJW3n6jUNMtzURGSnr2+u7a xgyYtGJdxeHXPiQ0K5R6qrzmDUqg77vww/LPSwYSM8pr18vvQ4ZNxlq7anIUoqemCLDl QbIQ== X-Gm-Message-State: AOAM532ItmbmL3phCQ4iezyX3xLiC6aldqDO6LSIt808/s7iBo+0V67s 2bc4uRaHtvpcVEXCsSgT6+F6ngzdgve/i3c/PionSA/kXWTpClZwiQKVzZvPPUx7oEo/mkL3a5n 2MR5psw+tSk2JE6KAWrwS X-Received: by 2002:a05:620a:2448:: with SMTP id h8mr4958214qkn.231.1642539487280; Tue, 18 Jan 2022 12:58:07 -0800 (PST) X-Google-Smtp-Source: ABdhPJxq3Ndqgu/1eSPl8H1/GfW4nW+qGmp6SP+v18upZ87Y6BP23e5HdqaD5fVe/w9z/YJHEmEHtw== X-Received: by 2002:a05:620a:2448:: with SMTP id h8mr4958206qkn.231.1642539487045; Tue, 18 Jan 2022 12:58:07 -0800 (PST) Received: from bfoster (c-24-61-119-116.hsd1.ma.comcast.net. [24.61.119.116]) by smtp.gmail.com with ESMTPSA id az15sm5128276qkb.124.2022.01.18.12.58.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Jan 2022 12:58:06 -0800 (PST) Date: Tue, 18 Jan 2022 15:58:04 -0500 From: Brian Foster To: Al Viro Cc: Ian Kent , "Darrick J. Wong" , Christoph Hellwig , Miklos Szeredi , David Howells , Kernel Mailing List , linux-fsdevel , xfs , Linus Torvalds Subject: Re: [PATCH] vfs: check dentry is still valid in get_link() Message-ID: References: <275358741c4ee64b5e4e008d514876ed4ec1071c.camel@themaw.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Tue, Jan 18, 2022 at 07:20:35PM +0000, Al Viro wrote: > On Tue, Jan 18, 2022 at 01:25:15PM -0500, Brian Foster wrote: > > > If I go back to the inactive -> reclaimable grace period variant and > > also insert a start_poll_synchronize_rcu() and > > poll_state_synchronize_rcu() pair across the inactive processing > > sequence, I start seeing numbers closer to ~36k cycles. IOW, the > > xfs_inodegc_inactivate() helper looks something like this: > > > > if (poll_state_synchronize_rcu(ip->i_destroy_gp)) > > xfs_inodegc_set_reclaimable(ip); > > else > > call_rcu(&VFS_I(ip)->i_rcu, xfs_inodegc_set_reclaimable_callback); > > > > ... to skip the rcu grace period if one had already passed while the > > inode sat on the inactivation queue and was processed. > > > > However my box went haywire shortly after with rcu stall reports or > > something if I let that continue to run longer, so I'll probably have to > > look into that lockdep splat (complaining about &pag->pag_ici_lock in > > rcu context, perhaps needs to become irq safe?) or see if something else > > is busted.. > > Umm... Could you (or Dave) describe where does the mainline do the > RCU delay mentioned several times in these threads, in case of > * unlink() > * overwriting rename() > * open() + unlink() + close() (that one, obviously, for regular files) > If you're referring to the existing rcu delay in XFS, I suspect that refers to xfs_reclaim_inode(). The last bits of that function remove the inode from the perag tree and then calls __xfs_inode_free(), which frees the inode via rcu callback. BTW, I think the experiment above is either going to require an audit to make the various _set_reclaimable() locks irq safe or something a bit more ugly like putting the inode back on a workqueue after the rcu delay to make the state transition. Given the incremental improvement from using start_poll_synchronize_rcu(), I replaced the above destroy side code with a cond_synchronize_rcu(ip->i_destroy_gp) call on the iget/recycle side and see similar results (~36k cycles per 60s, but so far without any explosions). Brian > The thing is, if we already do have an RCU delay in there, there might be > a better solution - making sure it happens downstream of d_drop() (in case > when dentry has other references) or dentry going negative (in case we are > holding the sole reference to it). > > If we can do that, RCU'd dcache lookups won't run into inode reuse at all. > Sure, right now d_delete() comes last *and* we want the target inode to stay > around past the call of ->unlink(). But if you look at do_unlinkat() you'll > see an interesting-looking wart with ihold/iput around vfs_unlink(). Not > sure if you remember the story on that one; basically, it's "we'd rather > have possible IO on inode freeing to happen outside of the lock on parent". > > nfsd and mqueue do the same thing; ksmbd does not. OTOH, ksmbd appears to > force the "make victim go unhashed, sole reference or not". ecryptfs > definitely does that forcing (deliberately so). > > That area could use some rethinking, and if we can deal with the inode reuse > delay while we are at it... > > Overwriting rename() is also interesting in that respect, of course. > > I can go and try to RTFS my way through xfs iget-related code, but I'd > obviously prefer to do so with at least some overview of that thing > from the folks familiar with it. Seeing that it's a lockless search > structure, missing something subtle there is all too easy, especially > with the lookup-by-fhandle stuff in the mix... >