Date: Wed, 21 Aug 2019 08:13:37 +1000
From: Dave Chinner
To: Brian Foster
Cc: kaixuxia, linux-xfs@vger.kernel.org, "Darrick J. Wong", newtongao@tencent.com, jasperwang@tencent.com
Subject: Re: [PATCH V2] xfs: Fix agi&agf ABBA deadlock when performing rename with RENAME_WHITEOUT flag
Message-ID: <20190820221337.GH1119@dread.disaster.area>
In-Reply-To: <20190820122300.GB14307@bfoster>

On Tue, Aug 20, 2019 at 08:23:00AM -0400, Brian Foster wrote:
> On Tue, Aug 20, 2019 at 09:23:04PM +1000, Dave Chinner wrote:
> > On Tue, Aug 20, 2019 at 06:51:01AM -0400, Brian Foster wrote:
> > > On Tue, Aug 20, 2019 at 04:53:22PM +0800, kaixuxia wrote:
> > > FWIW if we do take that approach, then IMO it's worth reconsidering the
> > > 1-2 liner I originally proposed to fix the locking. It's slightly hacky,
> > > but really all three options are hacky in slightly different ways. The
> > > flipside is it's trivial to implement, review and backport and now would
> > > be removed shortly thereafter when we replace the on-disk whiteout with
> > > the in-core fake whiteout thing. Just my .02 though..
> > >
> > We've got to keep the existing whiteout method around for,
> > essentially, forever, because we have to support kernels that don't
> > do in-memory translations of DT_WHT to a magic chardev inode and
> > vice versa (i.e. via mknod). IOWs, we'll need a feature bit to
> > indicate that we actually have DT_WHT based whiteouts on disk.
> >
>
> I'm not quite following (probably just because I'm not terribly familiar
> with the use case). If current kernels know how to fake up whiteout
> inodes in memory based on a dentry, why do we need to continue to create
> new on-disk whiteout inodes just because a filesystem might already have
> such inodes on disk?

We don't, unless there's a chance the filesystem will be booted
again on an older kernel. So it's a one-way conversion: once we
start using DT_WHT based whiteouts, there is no going back.

> Wouldn't the old format whiteouts just continue to
> work as expected without any extra handling?

Yes, but that's not the problem. It's forwards compatibility that
matters here. i.e. upgrade a kernel, something doesn't work, roll
back to older kernel. Everything should work if the feature set on
the filesystem is unchanged, even though the filesystem was used
with a newer kernel.

If the newer kernel has done something on disk that the older kernel
does not understand (e.g. using DT_WHT based whiteouts), then the
newer kernel must mark the filesystem with a feature bit to prevent
the older kernel from doing the wrong thing with the format it
doesn't expect to see. Mis-parsing a whiteout and not applying it
correctly is a stale data exposure security bug, so we have to be
careful here.

Existing kernels don't know how to take a DT_WHT whiteout and fake
up a magical chardev inode. Hence DT_WHT whiteouts will not work
correctly on current kernels, even though we have DT_WHT in the dir
ftype feature (and hence available on both v4 and v5 filesystems).

i.e. the issue here is the VFS has defined the on-disk whiteout
format, and we want to change the on-disk storage from chardev
inodes to DT_WHT. Hence we need a feature bit because we are
changing the method of storing whiteouts on disk, even though we
-technically- aren't changing the _XFS_ on-disk format. Think of it
as though DT_WHT support doesn't exist in XFS, and now we are going
to add it...

> I can see needing a feature bit to restrict a filesystem from being used
> on an unsupported, older kernel, but is there a reason we wouldn't just
> enable that by default anyways?

Rule #1: Never enable new on-disk feature bits by default :)

Yes, in future we can decide to automatically enable the feature bit
in the kernel code, but that's not something we can do until kernel
support for the DT_WHT based whiteouts is already widespread as it's
not a backwards compatible change.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
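
To make the feature bit argument above concrete, here is a minimal
userspace sketch of how an incompat bit stops an older kernel from
mounting a filesystem that contains DT_WHT based whiteouts. Every name
and bit value in it (demo_sb, DEMO_FEAT_INCOMPAT_*, demo_can_mount,
demo_write_dt_wht_whiteout) is hypothetical and made up for
illustration. It is not the XFS superblock layout, and no such feature
bit is claimed to exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; not real XFS bits. */
#define DEMO_FEAT_INCOMPAT_FTYPE   (1u << 0)
#define DEMO_FEAT_INCOMPAT_DT_WHT  (1u << 1)

/* Hypothetical stand-in for the on-disk superblock feature field. */
struct demo_sb {
	uint32_t features_incompat;
};

/*
 * A kernel may mount the filesystem only if it understands every
 * incompat bit recorded on disk. kernel_known is the set of bits
 * this (hypothetical) kernel was built to handle.
 */
static bool demo_can_mount(const struct demo_sb *sb, uint32_t kernel_known)
{
	assert(sb != NULL);
	return (sb->features_incompat & ~kernel_known) == 0;
}

/*
 * Writing the first DT_WHT based whiteout marks the superblock.
 * The bit is never cleared in this sketch: a one-way conversion.
 */
static void demo_write_dt_wht_whiteout(struct demo_sb *sb)
{
	assert(sb != NULL);
	sb->features_incompat |= DEMO_FEAT_INCOMPAT_DT_WHT;
}
```

In this sketch a kernel that only knows the ftype bit can mount the
filesystem until the first DT_WHT whiteout is written, and is refused
afterwards, which is the rollback case described above. A kernel that
knows both bits can mount it either way.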