Date: Wed, 8 Dec 2021 11:08:06 -0500
From: Brian Foster
To: Dave Chinner
Cc: linux-xfs@vger.kernel.org, Dave Chinner, Gonzalo Siero Humet
Subject: Re: [RFD] XFS inode reclaim (inactivation) under fs freeze
References: <20211206215143.GI449541@dread.disaster.area>
In-Reply-To: <20211206215143.GI449541@dread.disaster.area>
List-ID: linux-xfs@vger.kernel.org

On Tue, Dec 07, 2021 at 08:51:43AM +1100, Dave Chinner wrote:
> On Mon, Dec 06, 2021 at 12:29:28PM -0500, Brian Foster wrote:
> > Hi,
> >
> > We have reports on distro (pre-deferred inactivation) kernels that
> > inode reclaim (i.e. via drop_caches) can deadlock on the s_umount
> > lock when invoked on a frozen XFS fs.
> > This occurs because drop_caches acquires the lock and then blocks in
> > xfs_inactive() on transaction alloc for an inode that requires an
> > eofb trim. Unfreeze blocks on the same lock and thus the fs is
> > deadlocked (in a frozen state). As far as I'm aware, this has been
> > broken for some time and probably just not observed because reclaim
> > under freeze is a rare and unusual situation.
> >
> > With deferred inactivation, the deadlock problem actually goes away
> > because ->destroy_inode() will never block when the filesystem is
> > frozen. There is new/odd behavior, however, in that lookups of a
> > pending inactive inode spin loop waiting for the pending inactive
> > state to clear. That won't happen until the fs is unfrozen.
>
> That's existing behaviour for any inode that is stuck waiting for
> inactivation, regardless of how it is stuck. We've always had
> situations where lookups would spin waiting on inodes when there are
> frozen filesystems preventing XFS_IRECLAIMABLE inodes from making
> progress.
>
> IOWs, this is not new behaviour - accessing files stuck in reclaim
> during freeze has done this for a couple of decades now...
>

Sorry, I know you pointed this out already. To try and be more clear, I
think the question is whether we've increased the scope of this scenario
such that users are more likely to hit it and subsequently complain.
IOW, we historically blocked the inactivating task until unfreeze,
whereas now the task is free to continue, potentially evicting/queuing
as many more inodes as it needs to. The historical behavior is obviously
broken, so this is a step in the right direction, but the scope of the
spin looping has certainly changed enough to warrant more thought beyond
"lookup could always block."
Of course, this is all balanced against the fact that reclaim under
freeze has historically been rare in the first place, so maybe we don't
care now that the deadlock is gone, and we can just leave things as is.

> > Also, the deferred inactivation queues are not consistently flushed
> > on freeze. I've observed that xfs_freeze invokes an
> > xfs_inodegc_flush() indirectly via xfs_fs_statfs(), but fsfreeze
> > does not.
>
> Userspace does not need to flush the inactivation queues on freeze -
> the XFS kernel side freeze code has a full queue flush in it. It's
> a bit subtle, but it is there.
>
> > Therefore, I suspect it may be possible to land in this state from
> > the onset of a freeze based on prior reclaim behavior. (I.e., we may
> > want to make this flush explicit on freeze, depending on the
> > eventual solution.)
>
> xfs_inodegc_stop() does a full queue flush and it's called during
> freeze from xfs_fs_sync_fs() after the page faults have been frozen
> and waited on. I.e., it does:
>
> 	xfs_inodegc_queue_all();
> 	for_each_online_cpu(cpu) {
> 		gc = per_cpu_ptr(mp->m_inodegc, cpu);
> 		cancel_work_sync(&gc->work);
> 	}
>
> xfs_inodegc_queue_all() queues work for all the pending non-empty
> per-cpu queues, then we run cancel_work_sync() on them.
> cancel_work_sync() runs __flush_work() internally, and so it is
> almost the same as running { flush_work(); cancel_work(); } except
> that it serialises against other attempts to queue/cancel as it marks
> the work as having a cancel in progress.
>

I don't think that is how this works. On 5.16.0-rc2:

    <...>-997    [000] .....   204.883864: xfs_destroy_inode: dev 253:3 ino 0x83 iflags 0x240
    <...>-997    [000] .....   204.883903: xfs_inode_set_need_inactive: dev 253:3 ino 0x83 iflags 0x240
 fsfreeze-998    [000] .....   204.916239: xfs_inodegc_stop: dev 253:3 m_features 0x1bf6a9 opstate (clean|blockgc) s_flags 0x70810000 caller xfs_fs_sync_fs+0x1b0/0x2d0 [xfs]