* [PATCH] ceph: make sure all the files successfully put before unmounting
@ 2022-12-01  6:58 xiubli
  2022-12-01 13:04 ` Ilya Dryomov
  2022-12-01 19:18 ` Eric Biggers
  0 siblings, 2 replies; 11+ messages in thread
From: xiubli @ 2022-12-01  6:58 UTC (permalink / raw)
  To: idryomov, ceph-devel; +Cc: jlayton, khiremat, linux-fscrypt, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

When close a file it will be deferred to call the fput(), which
will hold the inode's i_count. And when unmounting the mountpoint
the evict_inodes() may skip evicting some inodes.

If encrypt is enabled the kernel generate a warning when removing
the encrypt keys when the skipped inodes still hold the keyring:

WARNING: CPU: 4 PID: 168846 at fs/crypto/keyring.c:242 fscrypt_destroy_keyring+0x7e/0xd0
CPU: 4 PID: 168846 Comm: umount Tainted: G S 6.1.0-rc5-ceph-g72ead199864c #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
RIP: 0010:fscrypt_destroy_keyring+0x7e/0xd0
RSP: 0018:ffffc9000b277e28 EFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff88810d52ac00 RCX: ffff88810b56aa00
RDX: 0000000080000000 RSI: ffffffff822f3a09 RDI: ffff888108f59000
RBP: ffff8881d394fb88 R08: 0000000000000028 R09: 0000000000000000
R10: 0000000000000001 R11: 11ff4fe6834fcd91 R12: ffff8881d394fc40
R13: ffff888108f59000 R14: ffff8881d394f800 R15: 0000000000000000
FS: 00007fd83f6f1080(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f918d417000 CR3: 000000017f89a005 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
generic_shutdown_super+0x47/0x120
kill_anon_super+0x14/0x30
ceph_kill_sb+0x36/0x90 [ceph]
deactivate_locked_super+0x29/0x60
cleanup_mnt+0xb8/0x140
task_work_run+0x67/0xb0
exit_to_user_mode_prepare+0x23d/0x240
syscall_exit_to_user_mode+0x25/0x60
do_syscall_64+0x40/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fd83dc39e9b

URL: https://tracker.ceph.com/issues/58126
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/super.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 3db6f95768a3..1f46db92e81f 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -9,6 +9,7 @@
 #include <linux/in6.h>
 #include <linux/module.h>
 #include <linux/mount.h>
+#include <linux/file.h>
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
 #include <linux/sched.h>
@@ -1477,6 +1478,14 @@ static void ceph_kill_sb(struct super_block *s)
 	ceph_mdsc_pre_umount(fsc->mdsc);
 	flush_fs_workqueues(fsc);

+	/*
+	 * If the encrypt is enabled we need to make sure the delayed
+	 * fput to finish, which will make sure all the inodes will
+	 * be evicted before removing the encrypt keys.
+	 */
+	if (s->s_master_keys)
+		flush_delayed_fput();
+
 	kill_anon_super(s);

 	fsc->client->extra_mon_dispatch = NULL;
--
2.31.1

^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] ceph: make sure all the files successfully put before unmounting 2022-12-01 6:58 [PATCH] ceph: make sure all the files successfully put before unmounting xiubli @ 2022-12-01 13:04 ` Ilya Dryomov 2022-12-01 13:52 ` Xiubo Li 2022-12-01 19:18 ` Eric Biggers 1 sibling, 1 reply; 11+ messages in thread From: Ilya Dryomov @ 2022-12-01 13:04 UTC (permalink / raw) To: xiubli; +Cc: ceph-devel, jlayton, khiremat, linux-fscrypt On Thu, Dec 1, 2022 at 7:58 AM <xiubli@redhat.com> wrote: > > From: Xiubo Li <xiubli@redhat.com> > > When close a file it will be deferred to call the fput(), which > will hold the inode's i_count. And when unmounting the mountpoint > the evict_inodes() may skip evicting some inodes. > > If encrypt is enabled the kernel generate a warning when removing > the encrypt keys when the skipped inodes still hold the keyring: > > WARNING: CPU: 4 PID: 168846 at fs/crypto/keyring.c:242 fscrypt_destroy_keyring+0x7e/0xd0 > CPU: 4 PID: 168846 Comm: umount Tainted: G S 6.1.0-rc5-ceph-g72ead199864c #1 > Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015 > RIP: 0010:fscrypt_destroy_keyring+0x7e/0xd0 > RSP: 0018:ffffc9000b277e28 EFLAGS: 00010202 > RAX: 0000000000000002 RBX: ffff88810d52ac00 RCX: ffff88810b56aa00 > RDX: 0000000080000000 RSI: ffffffff822f3a09 RDI: ffff888108f59000 > RBP: ffff8881d394fb88 R08: 0000000000000028 R09: 0000000000000000 > R10: 0000000000000001 R11: 11ff4fe6834fcd91 R12: ffff8881d394fc40 > R13: ffff888108f59000 R14: ffff8881d394f800 R15: 0000000000000000 > FS: 00007fd83f6f1080(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f918d417000 CR3: 000000017f89a005 CR4: 00000000003706e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > <TASK> > generic_shutdown_super+0x47/0x120 > kill_anon_super+0x14/0x30 > ceph_kill_sb+0x36/0x90 [ceph] > 
deactivate_locked_super+0x29/0x60 > cleanup_mnt+0xb8/0x140 > task_work_run+0x67/0xb0 > exit_to_user_mode_prepare+0x23d/0x240 > syscall_exit_to_user_mode+0x25/0x60 > do_syscall_64+0x40/0x80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7fd83dc39e9b > > URL: https://tracker.ceph.com/issues/58126 > Signed-off-by: Xiubo Li <xiubli@redhat.com> > --- > fs/ceph/super.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/fs/ceph/super.c b/fs/ceph/super.c > index 3db6f95768a3..1f46db92e81f 100644 > --- a/fs/ceph/super.c > +++ b/fs/ceph/super.c > @@ -9,6 +9,7 @@ > #include <linux/in6.h> > #include <linux/module.h> > #include <linux/mount.h> > +#include <linux/file.h> > #include <linux/fs_context.h> > #include <linux/fs_parser.h> > #include <linux/sched.h> > @@ -1477,6 +1478,14 @@ static void ceph_kill_sb(struct super_block *s) > ceph_mdsc_pre_umount(fsc->mdsc); > flush_fs_workqueues(fsc); > > + /* > + * If the encrypt is enabled we need to make sure the delayed > + * fput to finish, which will make sure all the inodes will > + * be evicted before removing the encrypt keys. > + */ > + if (s->s_master_keys) > + flush_delayed_fput(); Hi Xiubo, In the tracker ticket comments, you are wondering whether this is a generic fscrypt bug but then proceed with working around it in CephFS: > By reading the code it should be a bug in fs/crypto/ code. When > closing the file it will be delayed in kernel space by adding it into > the delayed_fput_list delay queue. > And if that queue is delayed for some reasons and when unmounting the > mountpoint it will skip evicting the corresponding inode in > evict_inodes(). So the fscrypt_put_encryption_info(), which will > decrease the mk->mk_active_refs reference count, will be missed. And > at last in generic_shutdown_super() will hit this warning. > Still reading the code to see whether could I fix this in ceph layer. If the root cause lies in fs/crypto, I'd rather see it fixed there than papered over in fs/ceph. 
Thanks, Ilya > + > kill_anon_super(s); > > fsc->client->extra_mon_dispatch = NULL; > -- > 2.31.1 > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ceph: make sure all the files successfully put before unmounting 2022-12-01 13:04 ` Ilya Dryomov @ 2022-12-01 13:52 ` Xiubo Li 0 siblings, 0 replies; 11+ messages in thread From: Xiubo Li @ 2022-12-01 13:52 UTC (permalink / raw) To: Ilya Dryomov, ebiggers; +Cc: ceph-devel, jlayton, khiremat, linux-fscrypt On 01/12/2022 21:04, Ilya Dryomov wrote: > On Thu, Dec 1, 2022 at 7:58 AM <xiubli@redhat.com> wrote: >> From: Xiubo Li <xiubli@redhat.com> >> >> When close a file it will be deferred to call the fput(), which >> will hold the inode's i_count. And when unmounting the mountpoint >> the evict_inodes() may skip evicting some inodes. >> >> If encrypt is enabled the kernel generate a warning when removing >> the encrypt keys when the skipped inodes still hold the keyring: >> >> WARNING: CPU: 4 PID: 168846 at fs/crypto/keyring.c:242 fscrypt_destroy_keyring+0x7e/0xd0 >> CPU: 4 PID: 168846 Comm: umount Tainted: G S 6.1.0-rc5-ceph-g72ead199864c #1 >> Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015 >> RIP: 0010:fscrypt_destroy_keyring+0x7e/0xd0 >> RSP: 0018:ffffc9000b277e28 EFLAGS: 00010202 >> RAX: 0000000000000002 RBX: ffff88810d52ac00 RCX: ffff88810b56aa00 >> RDX: 0000000080000000 RSI: ffffffff822f3a09 RDI: ffff888108f59000 >> RBP: ffff8881d394fb88 R08: 0000000000000028 R09: 0000000000000000 >> R10: 0000000000000001 R11: 11ff4fe6834fcd91 R12: ffff8881d394fc40 >> R13: ffff888108f59000 R14: ffff8881d394f800 R15: 0000000000000000 >> FS: 00007fd83f6f1080(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 00007f918d417000 CR3: 000000017f89a005 CR4: 00000000003706e0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Call Trace: >> <TASK> >> generic_shutdown_super+0x47/0x120 >> kill_anon_super+0x14/0x30 >> ceph_kill_sb+0x36/0x90 [ceph] >> deactivate_locked_super+0x29/0x60 >> 
cleanup_mnt+0xb8/0x140 >> task_work_run+0x67/0xb0 >> exit_to_user_mode_prepare+0x23d/0x240 >> syscall_exit_to_user_mode+0x25/0x60 >> do_syscall_64+0x40/0x80 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> RIP: 0033:0x7fd83dc39e9b >> >> URL: https://tracker.ceph.com/issues/58126 >> Signed-off-by: Xiubo Li <xiubli@redhat.com> >> --- >> fs/ceph/super.c | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/fs/ceph/super.c b/fs/ceph/super.c >> index 3db6f95768a3..1f46db92e81f 100644 >> --- a/fs/ceph/super.c >> +++ b/fs/ceph/super.c >> @@ -9,6 +9,7 @@ >> #include <linux/in6.h> >> #include <linux/module.h> >> #include <linux/mount.h> >> +#include <linux/file.h> >> #include <linux/fs_context.h> >> #include <linux/fs_parser.h> >> #include <linux/sched.h> >> @@ -1477,6 +1478,14 @@ static void ceph_kill_sb(struct super_block *s) >> ceph_mdsc_pre_umount(fsc->mdsc); >> flush_fs_workqueues(fsc); >> >> + /* >> + * If the encrypt is enabled we need to make sure the delayed >> + * fput to finish, which will make sure all the inodes will >> + * be evicted before removing the encrypt keys. >> + */ >> + if (s->s_master_keys) >> + flush_delayed_fput(); > Hi Xiubo, > > In the tracker ticket comments, you are wondering whether this > is a generic fscrypt bug but then proceed with working around it > in CephFS: > >> By reading the code it should be a bug in fs/crypto/ code. When >> closing the file it will be delayed in kernel space by adding it into >> the delayed_fput_list delay queue. >> And if that queue is delayed for some reasons and when unmounting the >> mountpoint it will skip evicting the corresponding inode in >> evict_inodes(). So the fscrypt_put_encryption_info(), which will >> decrease the mk->mk_active_refs reference count, will be missed. And >> at last in generic_shutdown_super() will hit this warning. >> Still reading the code to see whether could I fix this in ceph layer. 
> If the root cause lies in fs/crypto, I'd rather see it fixed there
> than papered over in fs/ceph.

Hi Ilya,

I was thinking we could maybe move this code into
generic_shutdown_super(), just before the evict_inodes() call. But I am
not sure whether the other filesystems have the same issue.

Eric,

What do you think? Would that make sense?

Thanks!

- Xiubo

> Thanks,
>
>                 Ilya
>
>> +
>>          kill_anon_super(s);
>>
>>          fsc->client->extra_mon_dispatch = NULL;
>> --
>> 2.31.1
>>

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ceph: make sure all the files successfully put before unmounting
  2022-12-01  6:58 [PATCH] ceph: make sure all the files successfully put before unmounting xiubli
  2022-12-01 13:04 ` Ilya Dryomov
@ 2022-12-01 19:18 ` Eric Biggers
  2022-12-01 21:10   ` Eric Biggers
  2022-12-02  1:51   ` Xiubo Li
  1 sibling, 2 replies; 11+ messages in thread
From: Eric Biggers @ 2022-12-01 19:18 UTC (permalink / raw)
  To: xiubli; +Cc: idryomov, ceph-devel, jlayton, khiremat, linux-fscrypt

On Thu, Dec 01, 2022 at 02:58:00PM +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
>
> When close a file it will be deferred to call the fput(), which
> will hold the inode's i_count. And when unmounting the mountpoint
> the evict_inodes() may skip evicting some inodes.
>
> If encrypt is enabled the kernel generate a warning when removing
> the encrypt keys when the skipped inodes still hold the keyring:

This does not make sense. Unmounting is only possible once all the files on the
filesystem have been closed.

- Eric

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ceph: make sure all the files successfully put before unmounting
  2022-12-01 19:18 ` Eric Biggers
@ 2022-12-01 21:10   ` Eric Biggers
  2022-12-02  1:49     ` Xiubo Li
  2022-12-02  1:51   ` Xiubo Li
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Biggers @ 2022-12-01 21:10 UTC (permalink / raw)
  To: xiubli; +Cc: idryomov, ceph-devel, jlayton, khiremat, linux-fscrypt

On Thu, Dec 01, 2022 at 11:18:33AM -0800, Eric Biggers wrote:
> On Thu, Dec 01, 2022 at 02:58:00PM +0800, xiubli@redhat.com wrote:
> > From: Xiubo Li <xiubli@redhat.com>
> >
> > When close a file it will be deferred to call the fput(), which
> > will hold the inode's i_count. And when unmounting the mountpoint
> > the evict_inodes() may skip evicting some inodes.
> >
> > If encrypt is enabled the kernel generate a warning when removing
> > the encrypt keys when the skipped inodes still hold the keyring:
>
> This does not make sense. Unmounting is only possible once all the files on the
> filesystem have been closed.
>

Specifically, __fput() puts the reference to the dentry (and thus the inode)
*before* it puts the reference to the mount. And an unmount cannot be done
while the mount still has references. So there should not be any issue here.

- Eric

^ permalink raw reply [flat|nested] 11+ messages in thread
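The ordering argument in the message above can be illustrated with a small userspace sketch. It is a toy model, not kernel code: the toy_* types and helpers are hypothetical stand-ins, and the only thing taken from the thread is the order of the two puts (inode reference first, mount reference last) and the rule that an unmount is refused while the mount is still referenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; these are NOT the real kernel structs. */
struct toy_inode { int i_count; };
struct toy_mount { int mnt_count; };
struct toy_file  { struct toy_inode *inode; struct toy_mount *mnt; };

/* Opening a file pins both the inode (via its dentry) and the mount. */
static struct toy_file toy_open(struct toy_inode *inode, struct toy_mount *mnt)
{
	struct toy_file f = { inode, mnt };

	inode->i_count++;
	mnt->mnt_count++;
	return f;
}

/* Final put, in the order described above: inode first, mount last. */
static void toy_fput(struct toy_file *f)
{
	f->inode->i_count--;	/* models dropping the dentry/inode reference */
	f->mnt->mnt_count--;	/* models dropping the mount reference */
	f->inode = NULL;
	f->mnt = NULL;
}

/* In this model an unmount is refused while anything still pins the mount. */
static bool toy_can_unmount(const struct toy_mount *mnt)
{
	return mnt->mnt_count == 0;
}
```

In this model, a file whose final put is still pending keeps mnt_count above zero, so toy_can_unmount() stays false; by the time it returns true, the file's inode reference is already gone. That is the reason a pending fput cannot be what leaves i_count nonzero at unmount time.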
* Re: [PATCH] ceph: make sure all the files successfully put before unmounting 2022-12-01 21:10 ` Eric Biggers @ 2022-12-02 1:49 ` Xiubo Li 2022-12-02 4:19 ` Eric Biggers 0 siblings, 1 reply; 11+ messages in thread From: Xiubo Li @ 2022-12-02 1:49 UTC (permalink / raw) To: Eric Biggers; +Cc: idryomov, ceph-devel, jlayton, khiremat, linux-fscrypt On 02/12/2022 05:10, Eric Biggers wrote: > On Thu, Dec 01, 2022 at 11:18:33AM -0800, Eric Biggers wrote: >> On Thu, Dec 01, 2022 at 02:58:00PM +0800, xiubli@redhat.com wrote: >>> From: Xiubo Li <xiubli@redhat.com> >>> >>> When close a file it will be deferred to call the fput(), which >>> will hold the inode's i_count. And when unmounting the mountpoint >>> the evict_inodes() may skip evicting some inodes. >>> >>> If encrypt is enabled the kernel generate a warning when removing >>> the encrypt keys when the skipped inodes still hold the keyring: >> This does not make sense. Unmounting is only possible once all the files on the >> filesystem have been closed. >> > Specifically, __fput() puts the reference to the dentry (and thus the inode) > *before* it puts the reference to the mount. And an unmount cannot be done > while the mount still has references. So there should not be any issue here. 
Eric,

When unmounting I can see the following log, which comes from a debug
message I added to evict_inodes():

diff --git a/fs/inode.c b/fs/inode.c
index b608528efd3a..f6e69b778d9c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -716,8 +716,11 @@ void evict_inodes(struct super_block *sb)
 again:
 	spin_lock(&sb->s_inode_list_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
-		if (atomic_read(&inode->i_count))
+		if (atomic_read(&inode->i_count)) {
+			printk("evict_inodes inode %p, i_count = %d, was skipped!\n",
+			       inode, atomic_read(&inode->i_count));
 			continue;
+		}

 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {

The log:

<4>[ 95.977395] evict_inodes inode 00000000f90aab7b, i_count = 1, was skipped!

Is there any reason that could cause this? Because the inode is not
evicted in time, the warning is printed when the master keys are
removed.

Thanks

- Xiubo

> - Eric
>

^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] ceph: make sure all the files successfully put before unmounting 2022-12-02 1:49 ` Xiubo Li @ 2022-12-02 4:19 ` Eric Biggers 2022-12-02 7:04 ` Xiubo Li 0 siblings, 1 reply; 11+ messages in thread From: Eric Biggers @ 2022-12-02 4:19 UTC (permalink / raw) To: Xiubo Li; +Cc: idryomov, ceph-devel, jlayton, khiremat, linux-fscrypt On Fri, Dec 02, 2022 at 09:49:49AM +0800, Xiubo Li wrote: > > On 02/12/2022 05:10, Eric Biggers wrote: > > On Thu, Dec 01, 2022 at 11:18:33AM -0800, Eric Biggers wrote: > > > On Thu, Dec 01, 2022 at 02:58:00PM +0800, xiubli@redhat.com wrote: > > > > From: Xiubo Li <xiubli@redhat.com> > > > > > > > > When close a file it will be deferred to call the fput(), which > > > > will hold the inode's i_count. And when unmounting the mountpoint > > > > the evict_inodes() may skip evicting some inodes. > > > > > > > > If encrypt is enabled the kernel generate a warning when removing > > > > the encrypt keys when the skipped inodes still hold the keyring: > > > This does not make sense. Unmounting is only possible once all the files on the > > > filesystem have been closed. > > > > > Specifically, __fput() puts the reference to the dentry (and thus the inode) > > *before* it puts the reference to the mount. And an unmount cannot be done > > while the mount still has references. So there should not be any issue here. 
> > Eric, > > When I unmounting I can see the following logs, which I added a debug log in > the evcit_inodes(): > > diff --git a/fs/inode.c b/fs/inode.c > index b608528efd3a..f6e69b778d9c 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -716,8 +716,11 @@ void evict_inodes(struct super_block *sb) > again: > spin_lock(&sb->s_inode_list_lock); > list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) { > - if (atomic_read(&inode->i_count)) > + if (atomic_read(&inode->i_count)) { > + printk("evict_inodes inode %p, i_count = %d, was > skipped!\n", > + inode, atomic_read(&inode->i_count)); > continue; > + } > > spin_lock(&inode->i_lock); > if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) { > > The logs: > > <4>[ 95.977395] evict_inodes inode 00000000f90aab7b, i_count = 1, was > skipped! > > Any reason could cause this ? Since the inode couldn't be evicted in time > and then when removing the master keys it will print this warning. > It is expected for evict_inodes() to see some inodes with nonzero refcount, but they should only be filesystem internal inodes. For example, with ext4 this happens with the journal inode. However, filesystem internal inodes cannot be encrypted, so they are irrelevant here. I'd guess that CephFS has a bug where it is leaking a reference to a user inode somewhere. (Based on the code, it might also be possible for evict_inodes() to also see nonzero refcount inodes due to fsnotify. However, fsnotify_sb_delete() runs before fscrypt_destroy_keyring(), so likewise it seems irrelevant here.) - Eric ^ permalink raw reply [flat|nested] 11+ messages in thread
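The distinction drawn in the message above (a skipped internal inode is harmless, a skipped encrypted user inode is not) can be sketched as a toy userspace model. Everything here is an assumption made for illustration: the toy_* names are hypothetical, and the key accounting is simplified to one counter of live encrypted inodes rather than the real mk_active_refs bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_inode {
	int  i_count;	/* outstanding references */
	bool encrypted;	/* pins the master key while the inode is alive */
	bool evicted;
};

struct toy_sb {
	struct toy_inode *inodes;
	size_t nr_inodes;
	int key_inode_refs;	/* simplified: number of live encrypted inodes */
};

/* Evict every unreferenced inode and skip the rest; returns the skip count. */
static size_t toy_evict_inodes(struct toy_sb *sb)
{
	size_t skipped = 0;

	for (size_t i = 0; i < sb->nr_inodes; i++) {
		struct toy_inode *inode = &sb->inodes[i];

		if (inode->evicted)
			continue;
		if (inode->i_count > 0) {
			skipped++;	/* still referenced, so left alone */
			continue;
		}
		inode->evicted = true;
		if (inode->encrypted)
			sb->key_inode_refs--;	/* eviction releases the key */
	}
	return skipped;
}

/* Models the keyring teardown check: true means the warning would fire. */
static bool toy_destroy_keyring_warns(const struct toy_sb *sb)
{
	return sb->key_inode_refs != 0;
}
```

With one referenced unencrypted inode (standing in for something like a journal inode) and one unreferenced encrypted inode, the model skips one inode and does not warn. With a single encrypted inode whose reference was leaked, it skips that inode and the teardown check fires, which matches the diagnosis that a leaked reference to a user inode is the likely cause.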
* Re: [PATCH] ceph: make sure all the files successfully put before unmounting 2022-12-02 4:19 ` Eric Biggers @ 2022-12-02 7:04 ` Xiubo Li 2022-12-29 23:53 ` Eric Biggers 0 siblings, 1 reply; 11+ messages in thread From: Xiubo Li @ 2022-12-02 7:04 UTC (permalink / raw) To: Eric Biggers; +Cc: idryomov, ceph-devel, jlayton, khiremat, linux-fscrypt On 02/12/2022 12:19, Eric Biggers wrote: > On Fri, Dec 02, 2022 at 09:49:49AM +0800, Xiubo Li wrote: >> On 02/12/2022 05:10, Eric Biggers wrote: >>> On Thu, Dec 01, 2022 at 11:18:33AM -0800, Eric Biggers wrote: >>>> On Thu, Dec 01, 2022 at 02:58:00PM +0800, xiubli@redhat.com wrote: >>>>> From: Xiubo Li <xiubli@redhat.com> >>>>> >>>>> When close a file it will be deferred to call the fput(), which >>>>> will hold the inode's i_count. And when unmounting the mountpoint >>>>> the evict_inodes() may skip evicting some inodes. >>>>> >>>>> If encrypt is enabled the kernel generate a warning when removing >>>>> the encrypt keys when the skipped inodes still hold the keyring: >>>> This does not make sense. Unmounting is only possible once all the files on the >>>> filesystem have been closed. >>>> >>> Specifically, __fput() puts the reference to the dentry (and thus the inode) >>> *before* it puts the reference to the mount. And an unmount cannot be done >>> while the mount still has references. So there should not be any issue here. 
>> Eric,
>>
>> When I unmounting I can see the following logs, which I added a debug log in
>> the evict_inodes():
>>
>> diff --git a/fs/inode.c b/fs/inode.c
>> index b608528efd3a..f6e69b778d9c 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -716,8 +716,11 @@ void evict_inodes(struct super_block *sb)
>>   again:
>>          spin_lock(&sb->s_inode_list_lock);
>>          list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
>> -               if (atomic_read(&inode->i_count))
>> +               if (atomic_read(&inode->i_count)) {
>> +                       printk("evict_inodes inode %p, i_count = %d, was
>> skipped!\n",
>> +                              inode, atomic_read(&inode->i_count));
>>                          continue;
>> +               }
>>
>>                  spin_lock(&inode->i_lock);
>>                  if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
>>
>> The logs:
>>
>> <4>[ 95.977395] evict_inodes inode 00000000f90aab7b, i_count = 1, was
>> skipped!
>>
>> Any reason could cause this ? Since the inode couldn't be evicted in time
>> and then when removing the master keys it will print this warning.
>>
> It is expected for evict_inodes() to see some inodes with nonzero refcount, but
> they should only be filesystem internal inodes. For example, with ext4 this
> happens with the journal inode.
>
> However, filesystem internal inodes cannot be encrypted, so they are irrelevant
> here.
>
> I'd guess that CephFS has a bug where it is leaking a reference to a user inode
> somewhere.

I also added some debug logs to track all the inodes in ceph, and all
the requests have finished.

I will debug it further to see whether a reference is being leaked here.

Thanks Eric.

- Xiubo

> (Based on the code, it might also be possible for evict_inodes() to also see
> nonzero refcount inodes due to fsnotify. However, fsnotify_sb_delete() runs
> before fscrypt_destroy_keyring(), so likewise it seems irrelevant here.)
>
> - Eric
>

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ceph: make sure all the files successfully put before unmounting 2022-12-02 7:04 ` Xiubo Li @ 2022-12-29 23:53 ` Eric Biggers 2022-12-30 5:44 ` Xiubo Li 0 siblings, 1 reply; 11+ messages in thread From: Eric Biggers @ 2022-12-29 23:53 UTC (permalink / raw) To: Xiubo Li; +Cc: idryomov, ceph-devel, jlayton, khiremat, linux-fscrypt Hi Xiubo, On Fri, Dec 02, 2022 at 03:04:58PM +0800, Xiubo Li wrote: > > On 02/12/2022 12:19, Eric Biggers wrote: > > On Fri, Dec 02, 2022 at 09:49:49AM +0800, Xiubo Li wrote: > > > On 02/12/2022 05:10, Eric Biggers wrote: > > > > On Thu, Dec 01, 2022 at 11:18:33AM -0800, Eric Biggers wrote: > > > > > On Thu, Dec 01, 2022 at 02:58:00PM +0800, xiubli@redhat.com wrote: > > > > > > From: Xiubo Li <xiubli@redhat.com> > > > > > > > > > > > > When close a file it will be deferred to call the fput(), which > > > > > > will hold the inode's i_count. And when unmounting the mountpoint > > > > > > the evict_inodes() may skip evicting some inodes. > > > > > > > > > > > > If encrypt is enabled the kernel generate a warning when removing > > > > > > the encrypt keys when the skipped inodes still hold the keyring: > > > > > This does not make sense. Unmounting is only possible once all the files on the > > > > > filesystem have been closed. > > > > > > > > > Specifically, __fput() puts the reference to the dentry (and thus the inode) > > > > *before* it puts the reference to the mount. And an unmount cannot be done > > > > while the mount still has references. So there should not be any issue here. 
> > > Eric, > > > > > > When I unmounting I can see the following logs, which I added a debug log in > > > the evcit_inodes(): > > > > > > diff --git a/fs/inode.c b/fs/inode.c > > > index b608528efd3a..f6e69b778d9c 100644 > > > --- a/fs/inode.c > > > +++ b/fs/inode.c > > > @@ -716,8 +716,11 @@ void evict_inodes(struct super_block *sb) > > > again: > > > spin_lock(&sb->s_inode_list_lock); > > > list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) { > > > - if (atomic_read(&inode->i_count)) > > > + if (atomic_read(&inode->i_count)) { > > > + printk("evict_inodes inode %p, i_count = %d, was > > > skipped!\n", > > > + inode, atomic_read(&inode->i_count)); > > > continue; > > > + } > > > > > > spin_lock(&inode->i_lock); > > > if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) { > > > > > > The logs: > > > > > > <4>[ 95.977395] evict_inodes inode 00000000f90aab7b, i_count = 1, was > > > skipped! > > > > > > Any reason could cause this ? Since the inode couldn't be evicted in time > > > and then when removing the master keys it will print this warning. > > > > > It is expected for evict_inodes() to see some inodes with nonzero refcount, but > > they should only be filesystem internal inodes. For example, with ext4 this > > happens with the journal inode. > > > > However, filesystem internal inodes cannot be encrypted, so they are irrelevant > > here. > > > > I'd guess that CephFS has a bug where it is leaking a reference to a user inode > > somewhere. > > I also added some debug logs to tracker all the inodes in ceph, and all the > requests has been finished. > > I will debug it more to see whether it's leaking a reference here. > > Thanks Eric. > Any progress on tracking this down? - Eric ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ceph: make sure all the files successfully put before unmounting
  2022-12-29 23:53 ` Eric Biggers
@ 2022-12-30  5:44   ` Xiubo Li
  0 siblings, 0 replies; 11+ messages in thread
From: Xiubo Li @ 2022-12-30  5:44 UTC (permalink / raw)
  To: Eric Biggers; +Cc: idryomov, ceph-devel, jlayton, khiremat, linux-fscrypt

Hi Eric,

Happy New Year!

Yeah, it's a ceph-side bug, and I have sent a patch to fix it [1].

While unmounting, just before the sessions are closed, the CephFS
server can still send cap messages to kceph, and kceph then holds
references to the corresponding inodes, so the unmount skips them.

IMO it still makes sense to improve the VFS code, because I hit a crash
after this happened, although only once.

[1] https://patchwork.kernel.org/project/ceph-devel/patch/20221221093031.132792-1-xiubli@redhat.com/

Thanks

- Xiubo

On 30/12/2022 07:53, Eric Biggers wrote:
> Hi Xiubo,
>
> On Fri, Dec 02, 2022 at 03:04:58PM +0800, Xiubo Li wrote:
>> On 02/12/2022 12:19, Eric Biggers wrote:
>>> On Fri, Dec 02, 2022 at 09:49:49AM +0800, Xiubo Li wrote:
>>>> On 02/12/2022 05:10, Eric Biggers wrote:
>>>>> On Thu, Dec 01, 2022 at 11:18:33AM -0800, Eric Biggers wrote:
>>>>>> On Thu, Dec 01, 2022 at 02:58:00PM +0800, xiubli@redhat.com wrote:
>>>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>>>
>>>>>>> When close a file it will be deferred to call the fput(), which
>>>>>>> will hold the inode's i_count. And when unmounting the mountpoint
>>>>>>> the evict_inodes() may skip evicting some inodes.
>>>>>>>
>>>>>>> If encrypt is enabled the kernel generate a warning when removing
>>>>>>> the encrypt keys when the skipped inodes still hold the keyring:
>>>>>> This does not make sense. Unmounting is only possible once all the files on the
>>>>>> filesystem have been closed.
>>>>>>
>>>>> Specifically, __fput() puts the reference to the dentry (and thus the inode)
>>>>> *before* it puts the reference to the mount. And an unmount cannot be done
>>>>> while the mount still has references.
So there should not be any issue here. >>>> Eric, >>>> >>>> When I unmounting I can see the following logs, which I added a debug log in >>>> the evcit_inodes(): >>>> >>>> diff --git a/fs/inode.c b/fs/inode.c >>>> index b608528efd3a..f6e69b778d9c 100644 >>>> --- a/fs/inode.c >>>> +++ b/fs/inode.c >>>> @@ -716,8 +716,11 @@ void evict_inodes(struct super_block *sb) >>>> again: >>>> spin_lock(&sb->s_inode_list_lock); >>>> list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) { >>>> - if (atomic_read(&inode->i_count)) >>>> + if (atomic_read(&inode->i_count)) { >>>> + printk("evict_inodes inode %p, i_count = %d, was >>>> skipped!\n", >>>> + inode, atomic_read(&inode->i_count)); >>>> continue; >>>> + } >>>> >>>> spin_lock(&inode->i_lock); >>>> if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) { >>>> >>>> The logs: >>>> >>>> <4>[ 95.977395] evict_inodes inode 00000000f90aab7b, i_count = 1, was >>>> skipped! >>>> >>>> Any reason could cause this ? Since the inode couldn't be evicted in time >>>> and then when removing the master keys it will print this warning. >>>> >>> It is expected for evict_inodes() to see some inodes with nonzero refcount, but >>> they should only be filesystem internal inodes. For example, with ext4 this >>> happens with the journal inode. >>> >>> However, filesystem internal inodes cannot be encrypted, so they are irrelevant >>> here. >>> >>> I'd guess that CephFS has a bug where it is leaking a reference to a user inode >>> somewhere. >> I also added some debug logs to tracker all the inodes in ceph, and all the >> requests has been finished. >> >> I will debug it more to see whether it's leaking a reference here. >> >> Thanks Eric. >> > Any progress on tracking this down? > > - Eric > ^ permalink raw reply [flat|nested] 11+ messages in thread
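The race described at the top of this message (a server-initiated cap message arriving during unmount and pinning an inode) can be sketched with a toy userspace model. This is only one hypothetical way to close such a window, shown for illustration: the toy_* names and the unmounting flag are assumptions, and the sketch does not claim to reflect what the linked patch [1] actually does.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_inode  { int i_count; };
struct toy_client { bool unmounting; };

/* Hypothetical: mark the client as going away before inodes are evicted. */
static void toy_pre_umount(struct toy_client *cl)
{
	cl->unmounting = true;
}

/*
 * Hypothetical handler for a server-initiated cap message.
 * Returns true if the message was accepted and the inode pinned.
 */
static bool toy_handle_cap_message(struct toy_client *cl,
				   struct toy_inode *inode)
{
	if (cl->unmounting)
		return false;	/* dropped: no new inode reference is taken */

	inode->i_count++;	/* pinned while the message is being handled */
	return true;
}

/* Releases the reference taken by an accepted cap message. */
static void toy_finish_cap_message(struct toy_inode *inode)
{
	inode->i_count--;
}
```

Without the unmounting check, a message accepted late in the unmount would leave i_count at 1 when the inodes are walked, and that inode would be skipped. With the check, a late message is dropped and i_count stays at 0.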
* Re: [PATCH] ceph: make sure all the files successfully put before unmounting
  2022-12-01 19:18 ` Eric Biggers
  2022-12-01 21:10   ` Eric Biggers
@ 2022-12-02  1:51   ` Xiubo Li
  1 sibling, 0 replies; 11+ messages in thread
From: Xiubo Li @ 2022-12-02  1:51 UTC (permalink / raw)
  To: Eric Biggers; +Cc: idryomov, ceph-devel, jlayton, khiremat, linux-fscrypt

On 02/12/2022 03:18, Eric Biggers wrote:
> On Thu, Dec 01, 2022 at 02:58:00PM +0800, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> When close a file it will be deferred to call the fput(), which
>> will hold the inode's i_count. And when unmounting the mountpoint
>> the evict_inodes() may skip evicting some inodes.
>>
>> If encrypt is enabled the kernel generate a warning when removing
>> the encrypt keys when the skipped inodes still hold the keyring:
> This does not make sense. Unmounting is only possible once all the files on the
> filesystem have been closed.

Yeah, but I didn't see anywhere that checks this. Maybe I missed
something important.

- Xiubo

> - Eric
>

^ permalink raw reply [flat|nested] 11+ messages in thread