From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 10A01155723; Wed, 15 May 2024 15:54:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.182 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715788495; cv=none; b=rtomS8b802dISzQ0uZ/jKw7OlPcWAuZctn6XKPEw7pJxyU5Se9DQIX3qexUmSI8kQCrOdtkZ2uc8Q17PF0dA0+zZLTIJkolfsorJfpGmfTi4aGvVcp5h93PF3VuvfD2s9ibsLOb+C/015NEcUVJSR0IcT+zBYcnlm5wn4Jnh9J0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715788495; c=relaxed/simple; bh=Xa/BCmyXxGdiw+oVTXTupLT9W/P4Q5qnlY9QFYUPpGk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=CYIqBsof3uLGRBvpy2zN8gwHa6gfXsoBwvD9vK1vB2VFUAObJyLZDc/dM/QD8f6i5/pU+W4CUPSfbZxpkMfkMQHg0ueny3tOyF8qqd9e7/SPirHxuOo33g40ESDyNknJALx7DO2Ik0bbONT75bssiflKidYZpebFvkmB2Ku52CM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=TXeNKx2k; arc=none smtp.client-ip=209.85.214.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="TXeNKx2k" Received: by mail-pl1-f182.google.com with SMTP id d9443c01a7336-1ec92e355bfso66502235ad.3; Wed, 15 May 2024 08:54:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1715788493; x=1716393293; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=78ysp/C2pKGy5ZaQSsineoB51te8qrbtvMSWBwG/lG0=; b=TXeNKx2kzRTBrLfdIm0BYUigFl7v61vl7870OJSO6hcR3kRL28URlYg+OwKNPvFtbh RiQY2BPgZeQDzsXXhTp8r2Cl73UXmZHAO6pi6vNTMmsp8DruxdSgApukn2jDuRZ+l4ei fvpbe2LRY5op2Dlzhu3bR2FHXsmBdOpajAoQugxE1TFI83J2sM8PonQU2kf+zBDl1cqc sdysKtjgwDobBH2gzyvrMutMiRyOvBXCoZNayYRojgrbXYp2402sMtBJZad7oz9zR5f0 fvUmOmo0uUq4EKH5LJihA+PESMMNjmf0a5wTcyCu19O8dgf4YLgPMXcoCgoTw4fT3158 9IZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715788493; x=1716393293; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=78ysp/C2pKGy5ZaQSsineoB51te8qrbtvMSWBwG/lG0=; b=WTTWOJelzCCVi702WxPzg6qyRI+SJLXAeNO5eGS6PFgMXV/QFU86RJiqyiyLmwH+Hg 8K/sbhM/9G8lP14Nsrf+rLz53ToaqEuH4M3vJk7q6yqhrzusZddSbTFOot1HYwweGeAa mVxYaCMc4MviHFuI+5BgsOwTXJx67kIZrxPfo7xaBH+zua4fdDYx7Z4hShJKYK5jeRHG enl8TekA8Zj04ip8xTDU1pZ0KgcBhZCZzHA6fFDUJGCpbqa/PZnBvKmgVhMb3gOvt/Q3 ocP54k9yIXwq3ovvH7dY2hqiit+pmC6I+sulV9cVAWqBPK05J7nGm8s5nxGgzJ0FPpxT S9LQ== X-Forwarded-Encrypted: i=1; AJvYcCWvubfkxtBvriBnHGZ8gGCDenrkmqe6beFxuhvXr2NtyjrtmdFEnjH5+Ms15KRCK255XITZHzWAb53GA5+6G423uYOYgyDOQn+KpEd9flst656B73FZkvqzRfulKhoypeEQKesDi1Mj0xN4pBUUuNvRERd7zGYC2u/l5rZYXQ== X-Gm-Message-State: AOJu0YxPyfQdAhmrRs9Qc+6NGkxGXB+fwQug4nTgWfmzAN/EbFwKWqw1 pYlLAX2AxL6euBQVYWpKDYJ6PgFf8Kayz9saOc/wpP3v9SUz8Rld X-Google-Smtp-Source: AGHT+IFoLEA7P6+bcoFKgTJqjFgfqiSrxYW/MooXBQyr4YG9GdEExJd2BBQcCtFYLKdaJ7tW8gN2ZQ== X-Received: by 2002:a17:902:f54d:b0:1ec:6b87:e125 with SMTP id d9443c01a7336-1ef43f51f9bmr227541435ad.50.1715788492869; Wed, 15 May 2024 08:54:52 -0700 (PDT) Received: from localhost.localdomain ([14.22.11.163]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-1ef0c037520sm120277755ad.194.2024.05.15.08.54.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 15 May 2024 08:54:52 -0700 (PDT) From: alexjlzheng@gmail.com X-Google-Original-From: alexjlzheng@tencent.com To: djwong@kernel.org Cc: alexjlzheng@gmail.com, bfoster@redhat.com, david@fromorbit.com, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, raven@themaw.net, rcu@vger.kernel.org, alexjlzheng@tencent.com Subject: Re: About the conflict between XFS inode recycle and VFS rcu-walk Date: Wed, 15 May 2024 23:54:41 +0800 Message-Id: <20240515155441.2788093-1-alexjlzheng@tencent.com> X-Mailer: git-send-email 2.39.3 In-Reply-To: <20240131193018.GN1371843@frogsfrogsfrogs> References: <20240131193018.GN1371843@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit On Wed, 31 Jan 2024 at 11:30:18 -0800, djwong@kernel.org wrote: > On Wed, Jan 31, 2024 at 02:35:17PM +0800, Jinliang Zheng wrote: > > On Fri, 8 Dec 2023 11:14:32 +1100, david@fromorbit.com wrote: > > > On Tue, Dec 05, 2023 at 07:38:33PM +0800, alexjlzheng@gmail.com wrote: > > > > Hi, all > > > > > > > > I would like to ask if the conflict between xfs inode recycle and vfs rcu-walk > > > > which can lead to null pointer references has been resolved? > > > > > > > > I browsed through emails about the following patches and their discussions: > > > > - https://lore.kernel.org/linux-xfs/20220217172518.3842951-2-bfoster@redhat.com/ > > > > - https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfoster@redhat.com/ > > > > - https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.stgit@mickey.themaw.net/ > > > > > > > > And then came to the conclusion that this problem has not been solved, am I > > > > right? Did I miss some patch that could solve this problem? > > > > > > We fixed the known problems this caused by turning off the VFS > > > functionality that the rcu pathwalks kept tripping over. See commit > > > 7b7820b83f23 ("xfs: don't expose internal symlink metadata buffers to > > > the vfs"). > > > > Sorry for the delay. > > > > The problem I encountered in the production environment was that during the > > rcu walk process the ->get_link() pointer was NULL, which caused a crash. > > > > As far as I know, commit 7b7820b83f23 ("xfs: don't expose internal symlink > > metadata buffers to the vfs") first appeared in: > > - https://lore.kernel.org/linux-fsdevel/YZvvP9RFXi3%2FjX0q@bfoster/ > > > > Does this commit solve the problem of NULL ->get_link()? And how? > > I suggest reading the call stack from wherever the VFS enters the XFS > readlink code. If you have a reliable reproducer, then apply this patch > to your kernel (you haven't mentioned which one it is) and see if the > bad dereference goes away. > > --D Sorry for the delay. I encountered the following calltrace: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [20213.578756] BUG: kernel NULL pointer dereference, address: 0000000000000000 [20213.578785] #PF: supervisor instruction fetch in kernel mode [20213.578799] #PF: error_code(0x0010) - not-present page [20213.578812] PGD 3f01d64067 P4D 3f01d64067 PUD 3f01d65067 PMD 0 [20213.578828] Oops: 0010 [#1] SMP NOPTI [20213.578839] CPU: 92 PID: 766 Comm: /usr/local/serv Kdump: loaded Not tainted 5.4.241-1-tlinux4-0017.3 #1 [20213.578860] Hardware name: New H3C Technologies Co., Ltd. UniServer R4900 G3/RS33M2C9SA, BIOS 2.00.38P02 04/14/2020 [20213.578884] RIP: 0010:0x0 [20213.578894] Code: Bad RIP value. [20213.578903] RSP: 0018:ffffc90021ebfc38 EFLAGS: 00010246 [20213.578916] RAX: ffffffff82081f40 RBX: ffffc90021ebfce0 RCX: 0000000000000000 [20213.578932] RDX: ffffc90021ebfd48 RSI: ffff88bfad8d3890 RDI: 0000000000000000 [20213.578948] RBP: ffffc90021ebfc70 R08: 0000000000000001 R09: ffff889b9eeae380 [20213.578965] R10: 302d343200000067 R11: 0000000000000001 R12: 0000000000000000 [20213.578981] R13: ffff88bfad8d3890 R14: ffff889b9eeae380 R15: ffffc90021ebfd48 [20213.578998] FS: 00007f89c534e740(0000) GS:ffff88c07fd00000(0000) knlGS:0000000000000000 [20213.579016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [20213.579030] CR2: ffffffffffffffd6 CR3: 0000003f01d90001 CR4: 00000000007706e0 [20213.579046] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [20213.579062] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [20213.579079] PKRU: 55555554 [20213.579087] Call Trace: [20213.579099] trailing_symlink+0x1da/0x260 [20213.579112] path_lookupat.isra.53+0x79/0x220 [20213.579125] filename_lookup.part.69+0xa0/0x170 [20213.579138] ? kmem_cache_alloc+0x3f/0x3f0 [20213.579151] ? getname_flags+0x4f/0x1e0 [20213.579161] user_path_at_empty+0x3e/0x50 [20213.579172] vfs_statx+0x76/0xe0 [20213.579182] __do_sys_newstat+0x3d/0x70 [20213.579194] ? fput+0x13/0x20 [20213.579203] ? ksys_ioctl+0xb0/0x300 [20213.579213] ? generic_file_llseek+0x24/0x30 [20213.579225] ? fput+0x13/0x20 [20213.579233] ? ksys_lseek+0x8d/0xb0 [20213.579243] __x64_sys_newstat+0x16/0x20 [20213.579256] do_syscall_64+0x4d/0x140 [20213.579268] entry_SYSCALL_64_after_hwframe+0x5c/0xc1 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< And I analyzed the disassembly of trailing_symlink() and confirmed that a NULL ->get_link() happened here: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0xffffffff812e4850 : nopl 0x0(%rax,%rax,1) [FTRACE NOP] 0xffffffff812e4855 : push %rbp 0xffffffff812e4856 : mov %rsp,%rbp 0xffffffff812e4859 : push %r15 0xffffffff812e485b : push %r14 0xffffffff812e485d : push %r13 0xffffffff812e485f : push %r12 0xffffffff812e4861 : push %rbx 0xffffffff812e4862 : mov %rdi,%rbx # rbx = &nameidate 0xffffffff812e4865 : sub $0x8,%rsp 0xffffffff812e4869 : mov 0x1765845(%rip),%edx # 0xffffffff82a4a0b4 0xffffffff812e486f : mov 0x38(%rdi),%eax 0xffffffff812e4872 : test %edx,%edx 0xffffffff812e4874 : je 0xffffffff812e48ac 0xffffffff812e4876 : mov %gs:0x1ad00,%rdx 0xffffffff812e487f : mov 0xc8(%rdi),%rcx # rcx = nameidata->link_inode 0xffffffff812e4886 : mov 0xc18(%rdx),%rdx 0xffffffff812e488d : mov 0x4(%rcx),%ecx # ecx = link_inode->uid 0xffffffff812e4890 : cmp %ecx,0x1c(%rdx) 0xffffffff812e4893 : je 0xffffffff812e48ac 0xffffffff812e4895 : mov 0x30(%rdi),%rsi 0xffffffff812e4899 : movzwl (%rsi),%edx 0xffffffff812e489c : and $0x202,%dx 0xffffffff812e48a1 : cmp $0x202,%dx 0xffffffff812e48a6 : je 0xffffffff812e495f 0xffffffff812e48ac : or $0x10,%eax 0xffffffff812e48af : mov %eax,0x38(%rbx) # nd->flags |= LOOKUP_PARENT 0xffffffff812e48b2 : mov 0x50(%rbx),%rax # rax = nd->stack 0xffffffff812e48b6 : movq $0x0,0x20(%rax) # stack[0].name = NULL 0xffffffff812e48be : mov 0x48(%rbx),%eax # nd->depth 0xffffffff812e48c1 : mov 0x50(%rbx),%rdx # nd->stack 0xffffffff812e48c5 : mov 0xc8(%rbx),%r13 # nd->link_inode 0xffffffff812e48cc : lea (%rax,%rax,2),%rax # rax = depth * 3 0xffffffff812e48d0 : shl $0x4,%rax # rax = rax << 4, sizeof(saved):0x30 0xffffffff812e48d4 : lea -0x30(%rdx,%rax,1),%r15 # r15 = last 0xffffffff812e48d9 : mov 0x8(%r15),%r14 # r14 = last->link.dentry 0xffffffff812e48dd : testb $0x40,0x38(%rbx) 0xffffffff812e48e1 : je 0xffffffff812e4950 0xffffffff812e48e3 : mov %r13,%rsi 0xffffffff812e48e6 : mov %r15,%rdi 0xffffffff812e48e9 : callq 0xffffffff812f8a00 0xffffffff812e48ee : test %al,%al 0xffffffff812e48f0 : jne 0xffffffff812e4a56 0xffffffff812e48f6 : mov 0x38(%rbx),%edx 0xffffffff812e48f9 : mov %r13,%rsi 0xffffffff812e48fc : mov %r14,%rdi 0xffffffff812e48ff : shr $0x6,%edx 0xffffffff812e4902 : and $0x1,%edx 0xffffffff812e4905 : callq 0xffffffff81424310 0xffffffff812e490a : movslq %eax,%r12 0xffffffff812e490d : test %eax,%eax 0xffffffff812e490f : jne 0xffffffff812e4939 0xffffffff812e4911 : movl $0x4,0x44(%rbx) 0xffffffff812e4918 : mov 0x248(%r13),%r12 0xffffffff812e491f : test %r12,%r12 0xffffffff812e4922 : je 0xffffffff812e49e5 0xffffffff812e4928 : movzbl (%r12),%eax 0xffffffff812e492d : cmp $0x2f,%al 0xffffffff812e492f : je 0xffffffff812e49b7 0xffffffff812e4935 : test %al,%al 0xffffffff812e4937 : je 0xffffffff812e49ae 0xffffffff812e4939 : test %r12,%r12 0xffffffff812e493c : je 0xffffffff812e49ae 0xffffffff812e493e : add $0x8,%rsp 0xffffffff812e4942 : mov %r12,%rax 0xffffffff812e4945 : pop %rbx 0xffffffff812e4946 : pop %r12 0xffffffff812e4948 : pop %r13 0xffffffff812e494a : pop %r14 0xffffffff812e494c : pop %r15 0xffffffff812e494e : pop %rbp 0xffffffff812e494f : retq 0xffffffff812e4950 : mov %r15,%rdi 0xffffffff812e4953 : callq 0xffffffff812f8ae0 0xffffffff812e4958 : callq 0xffffffff81a26410 <_cond_resched> 0xffffffff812e495d : jmp 0xffffffff812e48f6 0xffffffff812e495f : mov 0x4(%rsi),%edx 0xffffffff812e4962 : cmp $0xffffffff,%edx 0xffffffff812e4965 : je 0xffffffff812e496f 0xffffffff812e4967 : cmp %edx,%ecx 0xffffffff812e4969 : je 0xffffffff812e48ac 0xffffffff812e496f : mov $0xfffffffffffffff6,%r12 0xffffffff812e4976 : test $0x40,%al 0xffffffff812e4978 : jne 0xffffffff812e493e 0xffffffff812e497a : mov %gs:0x1ad00,%rax 0xffffffff812e4983 : mov 0xce0(%rax),%rax 0xffffffff812e498a : test %rax,%rax 0xffffffff812e498d : je 0xffffffff812e4999 0xffffffff812e498f : mov (%rax),%eax 0xffffffff812e4991 : test %eax,%eax 0xffffffff812e4993 : je 0xffffffff812e4a6f 0xffffffff812e4999 : mov $0xffffffff82319b4f,%rdi 0xffffffff812e49a0 : mov $0xfffffffffffffff3,%r12 0xffffffff812e49a7 : callq 0xffffffff81161310 0xffffffff812e49ac : jmp 0xffffffff812e493e 0xffffffff812e49ae : mov $0xffffffff8230164d,%r12 0xffffffff812e49b5 : jmp 0xffffffff812e493e 0xffffffff812e49b7 : cmpq $0x0,0x20(%rbx) 0xffffffff812e49bc : je 0xffffffff812e4a8a 0xffffffff812e49c2 : mov %rbx,%rdi 0xffffffff812e49c5 : callq 0xffffffff812e2da0 0xffffffff812e49ca : test %eax,%eax 0xffffffff812e49cc : jne 0xffffffff812e4a97 0xffffffff812e49d2 : add $0x1,%r12 0xffffffff812e49d6 : movzbl (%r12),%eax 0xffffffff812e49db : cmp $0x2f,%al 0xffffffff812e49dd : jne 0xffffffff812e4935 0xffffffff812e49e3 : jmp 0xffffffff812e49d2 0xffffffff812e49e5 : mov 0x20(%r13),%rax # inode->i_op 0xffffffff812e49e9 : add $0x10,%r15 0xffffffff812e49ed : mov %r13,%rsi 0xffffffff812e49f0 : mov %r15,%rdx 0xffffffff812e49f3 : mov 0x8(%rax),%rcx # inode_operations->get_link 0xffffffff812e49f7 : testb $0x40,0x38(%rbx) 0xffffffff812e49fb : jne 0xffffffff812e4a1f 0xffffffff812e49fd : mov %r14,%rdi # nd->flags & LOOKUP_RCU == 0 0xffffffff812e4a00 : callq 0xffffffff81e00f70 <__x86_indirect_thunk_rcx> # jmpq *%rcx 0xffffffff812e4a05 : mov %rax,%r12 0xffffffff812e4a08 : test %r12,%r12 0xffffffff812e4a0b : je 0xffffffff812e49ae 0xffffffff812e4a0d : cmp $0xfffffffffffff000,%r12 0xffffffff812e4a14 : jbe 0xffffffff812e4928 0xffffffff812e4a1a : jmpq 0xffffffff812e493e 0xffffffff812e4a1f : xor %edi,%edi # nd->flags & LOOKUP_RCU != 0 0xffffffff812e4a21 : mov %rcx,-0x30(%rbp) 0xffffffff812e4a25 : callq 0xffffffff81e00f70 <__x86_indirect_thunk_rcx> # jmpq *%rcx 0xffffffff812e4a2a : mov %rax,%r12 0xffffffff812e4a2d : cmp $0xfffffffffffffff6,%rax 0xffffffff812e4a31 : jne 0xffffffff812e4a08 0xffffffff812e4a33 : mov %rbx,%rdi 0xffffffff812e4a36 : callq 0xffffffff812e3840 0xffffffff812e4a3b : test %eax,%eax 0xffffffff812e4a3d : jne 0xffffffff812e4a97 0xffffffff812e4a3f : mov %r15,%rdx 0xffffffff812e4a42 : mov %r13,%rsi 0xffffffff812e4a45 : mov %r14,%rdi 0xffffffff812e4a48 : mov -0x30(%rbp),%rcx 0xffffffff812e4a4c : callq 0xffffffff81e00f70 <__x86_indirect_thunk_rcx> 0xffffffff812e4a51 : mov %rax,%r12 0xffffffff812e4a54 : jmp 0xffffffff812e4a08 0xffffffff812e4a56 : mov %rbx,%rdi 0xffffffff812e4a59 : callq 0xffffffff812e3840 0xffffffff812e4a5e : test %eax,%eax 0xffffffff812e4a60 : jne 0xffffffff812e4a97 0xffffffff812e4a62 : mov %r15,%rdi 0xffffffff812e4a65 : callq 0xffffffff812f8ae0 0xffffffff812e4a6a : jmpq 0xffffffff812e48f6 0xffffffff812e4a6f : mov 0x50(%rbx),%rax 0xffffffff812e4a73 : mov 0xb8(%rbx),%rdi 0xffffffff812e4a7a : xor %edx,%edx 0xffffffff812e4a7c : mov 0x8(%rax),%rsi 0xffffffff812e4a80 : callq 0xffffffff811673f0 <__audit_inode> 0xffffffff812e4a85 : jmpq 0xffffffff812e4999 0xffffffff812e4a8a : mov %rbx,%rdi 0xffffffff812e4a8d : callq 0xffffffff812e4790 0xffffffff812e4a92 : jmpq 0xffffffff812e49c2 0xffffffff812e4a97 : mov $0xfffffffffffffff6,%r12 0xffffffff812e4a9e : jmpq 0xffffffff812e493e <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< According to my understanding, the problem solved by commit 7b7820b83f23 ("xfs: don't expose internal symlink metadata buffers to the vfs") is a data NULL pointer dereference, but the problem here is an instruction NULL pointer dereference. Further, I analyzed the possible triggering process as follows: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rcu_walk do_unlinkat ~~> prune_dcache_sb create rcu_read_lock read_seqcount_retry (the last check) iput_final evict destroy_inode xfs_fs_destroy_inode xfs_inode_set_reclaim_tag xfs_ialloc spin_lock(ip->i_flags_lock) xfs_dialloc set(ip, XFS_IRECLAIMABLE) xfs_iget wakeup(xfs_reclaim_worker) rcu_read_lock spin_unlock(ip->i_flags_lock) xfs_iget_cache_hit spin_lock(ip->i_flags_lock) if (XFS_IRECLAIMABLE && !XFS_IRECLAIM) set(ip, XFS_IRECLAIM) spin_unlock(ip->i_flags_lock) rcu_read_unlock < ------------ > // miss synchronize_rcu() xfs_reinit_inode ->get_link = NULL get_link() // NULL rcu_read_unlock <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Therefore, I think that after commit 7b7820b83f23 ("xfs: don't expose internal symlink metadata buffers to the vfs"), we should start processing this NULL ->get_link pointer dereference. Or, am I thinking wrong somewhere? Thanks, Jinliang Zheng > > > > > > > Apart from that issue, I'm not aware of any other issues that the > > > XFS inode recycling directly exposes. > > > > > > > According to my understanding, the essence of this problem is that XFS reuses > > > > the inode evicted by VFS, but VFS rcu-walk assumes that this will not happen. > > > > > > It assumes that the inode will not change identity during the RCU > > > grace period after the inode has been evicted from cache. We can > > > safely reinstantiate an evicted inode without waiting for an RCU > > > grace period as long as it is the same inode with the same content > > > and same state. > > > > > > Problems *may* arise when we unlink the inode, then evict it, then a > > > new file is created and the old slab cache memory address is used > > > for the new inode. I describe the issue here: > > > > > > https://lore.kernel.org/linux-xfs/20220118232547.GD59729@dread.disaster.area/ > > > > And judging from the relevant emails, the main reason why ->get_link() is set > > to NULL should be the lack of synchronize_rcu() before xfs_reinit_inode() when > > the inode is chosen to be reused. > > > > However, perhaps due to performance reasons, this solution has not been merged > > for a long time. How is it now? > > > > Maybe I am missing something in the threads of mail? > > > > Thank you very much. :) > > Jinliang Zheng > > > > > > > > That said, we have exactly zero evidence that this is actually a > > > problem in production systems. We did get systems tripping over the > > > symlink issue, but there's no evidence that the > > > unlink->close->open(O_CREAT) issues are manifesting in the wild and > > > hence there hasn't been any particular urgency to address it. > > > > > > > Are there any recommended workarounds until an elegant and efficient solution > > > > can be proposed? After all, causing a crash is extremely unacceptable in a > > > > production environment. > > > > > > What crashes are you seeing in your production environment? > > > > > > -Dave. > > > -- > > > Dave Chinner > > > david@fromorbit.com > >