* Re: rcu_sync_dtor() warning question [not found] <20240726-ansonsten-piste-c4f04d4909fc@brauner> @ 2024-07-26 18:02 ` Paul E. McKenney 2024-07-26 18:39 ` Mateusz Guzik 2024-07-29 8:26 ` Christian Brauner 0 siblings, 2 replies; 9+ messages in thread From: Paul E. McKenney @ 2024-07-26 18:02 UTC (permalink / raw) To: Christian Brauner; +Cc: Mateusz Guzik, Oleg Nesterov, rcu On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote: > Hey, > > I could use some help with understanding a bug related to rcu that was > reported today. It first seems to have shown up on the 25th of July: > > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7 > > We seem to be hitting the WARN_ON_ONCE() in: > > void rcu_sync_dtor(struct rcu_sync *rsp) > { > int gp_state; > > WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED); > > from destroy_super_work() which gets called when a superblock is really freed. > > If the superblock has been visible in userspace we do it via call_rcu(): > > static void destroy_super_work(struct work_struct *work) > { > struct super_block *s = container_of(work, struct super_block, > destroy_work); > fsnotify_sb_free(s); > security_sb_free(s); > put_user_ns(s->s_user_ns); > kfree(s->s_subtype); > for (int i = 0; i < SB_FREEZE_LEVELS; i++) > percpu_free_rwsem(&s->s_writers.rw_sem[i]); > kfree(s); > } > > static void destroy_super_rcu(struct rcu_head *head) > { > struct super_block *s = container_of(head, struct super_block, rcu); > INIT_WORK(&s->destroy_work, destroy_super_work); > schedule_work(&s->destroy_work); > } > > And I'm really confused because I don't understand the details for sync > rcu enough to come up with a clear problem statement even. Could someone > please explain what the WARN_ON_ONCE() is about? If I am not too confused (and Oleg will correct me if I am), this is checking a use-after-free error. A given rcu_sync structure normally transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with possible side-trips from the GP_EXIT state through GP_REPLAY and back to GP_EXIT in special cases such as during early boot. One way that this can happen is calling percpu_free_rwsem() without having previously done the percpu_up_write() needed to match an earlier percpu_down_write(). I don't see any other way that this can happen right off-hand, but someone might have added something, or I might be suffering a failure of imagination. > (I had been wondering if this is related to commit 7180f8d91fcb ("vfs: > add rcu-based find_inode variants for iget ops") and the fix in commit > 5bc9ad78c2f8 ("vfs: handle __wait_on_freeing_inode() and evict() race") > but I don't see how.) I don't see how either, unless it introduced some timing change that made the scenario above more probable. > Thanks for taking a look! Hope this helps! If not, you know where to find me. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rcu_sync_dtor() warning question 2024-07-26 18:02 ` rcu_sync_dtor() warning question Paul E. McKenney @ 2024-07-26 18:39 ` Mateusz Guzik 2024-07-26 18:43 ` Mateusz Guzik 2024-07-29 8:26 ` Christian Brauner 1 sibling, 1 reply; 9+ messages in thread From: Mateusz Guzik @ 2024-07-26 18:39 UTC (permalink / raw) To: paulmck; +Cc: Christian Brauner, Oleg Nesterov, rcu On Fri, Jul 26, 2024 at 8:02 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote: > > Hey, > > > > I could use some help with understanding a bug related to rcu that was > > reported today. It first seems to have shown up on the 25th of July: > > > > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7 > > > > We seem to be hitting the WARN_ON_ONCE() in: > > > > void rcu_sync_dtor(struct rcu_sync *rsp) > > { > > int gp_state; > > > > WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED); > > > > from destroy_super_work() which gets called when a superblock is really freed. > > > > If the superblock has been visible in userspace we do it via call_rcu(): > > > > static void destroy_super_work(struct work_struct *work) > > { > > struct super_block *s = container_of(work, struct super_block, > > destroy_work); > > fsnotify_sb_free(s); > > security_sb_free(s); > > put_user_ns(s->s_user_ns); > > kfree(s->s_subtype); > > for (int i = 0; i < SB_FREEZE_LEVELS; i++) > > percpu_free_rwsem(&s->s_writers.rw_sem[i]); > > kfree(s); > > } > > > > static void destroy_super_rcu(struct rcu_head *head) > > { > > struct super_block *s = container_of(head, struct super_block, rcu); > > INIT_WORK(&s->destroy_work, destroy_super_work); > > schedule_work(&s->destroy_work); > > } > > > > And I'm really confused because I don't understand the details for sync > > rcu enough to come up with a clear problem statement even. Could someone > > please explain what the WARN_ON_ONCE() is about? > > If I am not too confused (and Oleg will correct me if I am), this is > checking a use-after-free error. A given rcu_sync structure normally > transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with > possible side-trips from the GP_EXIT state through GP_REPLAY and back > to GP_EXIT in special cases such as during early boot. > use-after-free? In that case I have a candidate for a culprit. Code prior to any of my changes was doing the following in iget_locked(): spin_lock(&inode_hash_lock); inode = find_inode_fast(sb, head, ino); spin_unlock(&inode_hash_lock); if (inode) { if (IS_ERR(inode)) return NULL; wait_on_inode(inode); if (unlikely(inode_unhashed(inode))) { iput(inode); goto again; } return inode; } My patch removed the spinlock acquire and made it significantly more likely for the code to end up doing the wait_on_inode + inode_unhashed combo when racing against inode teardown. Now that you bring up use-after-free I'm not particularly confident the stock code is correct. For example evict_inodes() -> evict() can mess with the inode and result in the iput() call in iget_locked(), which then will invoke evict() again. And I'm not particularly confident the routine + everything it calls is idempotent. That's from a quick poke around, maybe I missed something. syzkaller claims to have a reproducer. Trivial usage in my debug vm does not result in anything, so I may need to grab their entire setup to reproduce. I'm going to look into it. -- Mateusz Guzik <mjguzik gmail.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rcu_sync_dtor() warning question 2024-07-26 18:39 ` Mateusz Guzik @ 2024-07-26 18:43 ` Mateusz Guzik 2024-07-26 20:21 ` Paul E. McKenney ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Mateusz Guzik @ 2024-07-26 18:43 UTC (permalink / raw) To: paulmck; +Cc: Christian Brauner, Oleg Nesterov, rcu On Fri, Jul 26, 2024 at 8:39 PM Mateusz Guzik <mjguzik@gmail.com> wrote: > > On Fri, Jul 26, 2024 at 8:02 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote: > > > Hey, > > > > > > I could use some help with understanding a bug related to rcu that was > > > reported today. It first seems to have shown up on the 25th of July: > > > > > > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7 > > > > > > We seem to be hitting the WARN_ON_ONCE() in: > > > > > > void rcu_sync_dtor(struct rcu_sync *rsp) > > > { > > > int gp_state; > > > > > > WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED); > > > > > > from destroy_super_work() which gets called when a superblock is really freed. > > > > > > If the superblock has been visible in userspace we do it via call_rcu(): > > > > > > static void destroy_super_work(struct work_struct *work) > > > { > > > struct super_block *s = container_of(work, struct super_block, > > > destroy_work); > > > fsnotify_sb_free(s); > > > security_sb_free(s); > > > put_user_ns(s->s_user_ns); > > > kfree(s->s_subtype); > > > for (int i = 0; i < SB_FREEZE_LEVELS; i++) > > > percpu_free_rwsem(&s->s_writers.rw_sem[i]); > > > kfree(s); > > > } > > > > > > static void destroy_super_rcu(struct rcu_head *head) > > > { > > > struct super_block *s = container_of(head, struct super_block, rcu); > > > INIT_WORK(&s->destroy_work, destroy_super_work); > > > schedule_work(&s->destroy_work); > > > } > > > > > > And I'm really confused because I don't understand the details for sync > > > rcu enough to come up with a clear problem statement even. Could someone > > > please explain what the WARN_ON_ONCE() is about? > > > > If I am not too confused (and Oleg will correct me if I am), this is > > checking a use-after-free error. A given rcu_sync structure normally > > transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with > > possible side-trips from the GP_EXIT state through GP_REPLAY and back > > to GP_EXIT in special cases such as during early boot. > > > > use-after-free? In that case I have a candidate for a culprit. > > Code prior to any of my changes was doing the following in iget_locked(): > spin_lock(&inode_hash_lock); > inode = find_inode_fast(sb, head, ino); > spin_unlock(&inode_hash_lock); > if (inode) { > if (IS_ERR(inode)) > return NULL; > wait_on_inode(inode); > if (unlikely(inode_unhashed(inode))) { > iput(inode); > goto again; > } > return inode; > } > > My patch removed the spinlock acquire and made it significantly more > likely for the code to end up doing the wait_on_inode + inode_unhashed > combo when racing against inode teardown. > > Now that you bring up use-after-free I'm not particularly confident > the stock code is correct. > > For example evict_inodes() -> evict() can mess with the inode and > result in the iput() call in iget_locked(), which then will invoke > evict() again. And I'm not particularly confident the routine + > everything it calls is idempotent. > > That's from a quick poke around, maybe I missed something. > > syzkaller claims to have a reproducer. Trivial usage in my debug vm > does not result in anything, so I may need to grab their entire setup > to reproduce. > > I'm going to look into it. Welp. syzbot did the bisect, it's not any of the above, instead: commit b62e71be2110d8b52bf5faf3c3ed7ca1a0c113a5 Author: Chao Yu <chao@kernel.org> Date: Sun Apr 23 15:49:15 2023 +0000 f2fs: support errors=remount-ro|continue|panic mountoption https://lore.kernel.org/linux-fsdevel/0000000000004ff2dc061e281637@google.com/T/#m90c03813e12e5cdff1eeada8f9ab581d5f039c76 That said, the stuff I mentioned still looks highly suspicious so I have to something to investigate regardless. I think this thread can be abandoned. :) -- Mateusz Guzik <mjguzik gmail.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rcu_sync_dtor() warning question 2024-07-26 18:43 ` Mateusz Guzik @ 2024-07-26 20:21 ` Paul E. McKenney 2024-07-28 13:41 ` Oleg Nesterov 2024-07-29 8:28 ` Christian Brauner 2 siblings, 0 replies; 9+ messages in thread From: Paul E. McKenney @ 2024-07-26 20:21 UTC (permalink / raw) To: Mateusz Guzik; +Cc: Christian Brauner, Oleg Nesterov, rcu On Fri, Jul 26, 2024 at 08:43:01PM +0200, Mateusz Guzik wrote: > On Fri, Jul 26, 2024 at 8:39 PM Mateusz Guzik <mjguzik@gmail.com> wrote: > > > > On Fri, Jul 26, 2024 at 8:02 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote: > > > > Hey, > > > > > > > > I could use some help with understanding a bug related to rcu that was > > > > reported today. It first seems to have shown up on the 25th of July: > > > > > > > > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7 > > > > > > > > We seem to be hitting the WARN_ON_ONCE() in: > > > > > > > > void rcu_sync_dtor(struct rcu_sync *rsp) > > > > { > > > > int gp_state; > > > > > > > > WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED); > > > > > > > > from destroy_super_work() which gets called when a superblock is really freed. > > > > > > > > If the superblock has been visible in userspace we do it via call_rcu(): > > > > > > > > static void destroy_super_work(struct work_struct *work) > > > > { > > > > struct super_block *s = container_of(work, struct super_block, > > > > destroy_work); > > > > fsnotify_sb_free(s); > > > > security_sb_free(s); > > > > put_user_ns(s->s_user_ns); > > > > kfree(s->s_subtype); > > > > for (int i = 0; i < SB_FREEZE_LEVELS; i++) > > > > percpu_free_rwsem(&s->s_writers.rw_sem[i]); > > > > kfree(s); > > > > } > > > > > > > > static void destroy_super_rcu(struct rcu_head *head) > > > > { > > > > struct super_block *s = container_of(head, struct super_block, rcu); > > > > INIT_WORK(&s->destroy_work, destroy_super_work); > > > > schedule_work(&s->destroy_work); > > > > } > > > > > > > > And I'm really confused because I don't understand the details for sync > > > > rcu enough to come up with a clear problem statement even. Could someone > > > > please explain what the WARN_ON_ONCE() is about? > > > > > > If I am not too confused (and Oleg will correct me if I am), this is > > > checking a use-after-free error. A given rcu_sync structure normally > > > transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with > > > possible side-trips from the GP_EXIT state through GP_REPLAY and back > > > to GP_EXIT in special cases such as during early boot. > > > > > > > use-after-free? In that case I have a candidate for a culprit. > > > > Code prior to any of my changes was doing the following in iget_locked(): > > spin_lock(&inode_hash_lock); > > inode = find_inode_fast(sb, head, ino); > > spin_unlock(&inode_hash_lock); > > if (inode) { > > if (IS_ERR(inode)) > > return NULL; > > wait_on_inode(inode); > > if (unlikely(inode_unhashed(inode))) { > > iput(inode); > > goto again; > > } > > return inode; > > } > > > > My patch removed the spinlock acquire and made it significantly more > > likely for the code to end up doing the wait_on_inode + inode_unhashed > > combo when racing against inode teardown. > > > > Now that you bring up use-after-free I'm not particularly confident > > the stock code is correct. > > > > For example evict_inodes() -> evict() can mess with the inode and > > result in the iput() call in iget_locked(), which then will invoke > > evict() again. And I'm not particularly confident the routine + > > everything it calls is idempotent. > > > > That's from a quick poke around, maybe I missed something. > > > > syzkaller claims to have a reproducer. Trivial usage in my debug vm > > does not result in anything, so I may need to grab their entire setup > > to reproduce. > > > > I'm going to look into it. > > Welp. > > syzbot did the bisect, it's not any of the above, instead: > > commit b62e71be2110d8b52bf5faf3c3ed7ca1a0c113a5 > Author: Chao Yu <chao@kernel.org> > Date: Sun Apr 23 15:49:15 2023 +0000 > > f2fs: support errors=remount-ro|continue|panic mountoption > > https://lore.kernel.org/linux-fsdevel/0000000000004ff2dc061e281637@google.com/T/#m90c03813e12e5cdff1eeada8f9ab581d5f039c76 > > That said, the stuff I mentioned still looks highly suspicious so I > have to something to investigate regardless. > > I think this thread can be abandoned. :) Thank you for digging into this one! Thanx, Paul ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rcu_sync_dtor() warning question 2024-07-26 18:43 ` Mateusz Guzik 2024-07-26 20:21 ` Paul E. McKenney @ 2024-07-28 13:41 ` Oleg Nesterov 2024-07-29 8:25 ` Christian Brauner 2024-07-29 8:28 ` Christian Brauner 2 siblings, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2024-07-28 13:41 UTC (permalink / raw) To: Mateusz Guzik, Hillf Danton; +Cc: paulmck, Christian Brauner, rcu Sorry for late reply. You do not need my help, I know nothing about fs ;) but just in case... On 07/26, Mateusz Guzik wrote: > > Welp. > > syzbot did the bisect, it's not any of the above, instead: > > commit b62e71be2110d8b52bf5faf3c3ed7ca1a0c113a5 > Author: Chao Yu <chao@kernel.org> > Date: Sun Apr 23 15:49:15 2023 +0000 > > f2fs: support errors=remount-ro|continue|panic mountoption > > https://lore.kernel.org/linux-fsdevel/0000000000004ff2dc061e281637@google.com/T/#m90c03813e12e5cdff1eeada8f9ab581d5f039c76 > > That said, the stuff I mentioned still looks highly suspicious so I > have to something to investigate regardless. Did you see the patch from Hillf ? https://lore.kernel.org/all/20240727011616.2144-1-hdanton@sina.com/ it seems to fix the problem... Of course I don't understand this patch, but afaics SB_RDONLY can confuse thaw_super_locked(). If sb_rdonly() is true, thaw_super_locked() assumes that freeze_super() didn't call sb_wait_write() -> percpu_down_write(). So in this case thaw_super_locked() just clears sb->s_writers.frozen and goes to the "out_deactivate" label bypassing sb_freeze_unlock(). Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rcu_sync_dtor() warning question 2024-07-28 13:41 ` Oleg Nesterov @ 2024-07-29 8:25 ` Christian Brauner 2024-07-29 9:02 ` Oleg Nesterov 0 siblings, 1 reply; 9+ messages in thread From: Christian Brauner @ 2024-07-29 8:25 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Mateusz Guzik, Hillf Danton, paulmck, rcu On Sun, Jul 28, 2024 at 03:41:01PM GMT, Oleg Nesterov wrote: > Sorry for late reply. > > You do not need my help, I know nothing about fs ;) but just in case... > > On 07/26, Mateusz Guzik wrote: > > > > Welp. > > > > syzbot did the bisect, it's not any of the above, instead: > > > > commit b62e71be2110d8b52bf5faf3c3ed7ca1a0c113a5 > > Author: Chao Yu <chao@kernel.org> > > Date: Sun Apr 23 15:49:15 2023 +0000 > > > > f2fs: support errors=remount-ro|continue|panic mountoption > > > > https://lore.kernel.org/linux-fsdevel/0000000000004ff2dc061e281637@google.com/T/#m90c03813e12e5cdff1eeada8f9ab581d5f039c76 > > > > That said, the stuff I mentioned still looks highly suspicious so I > > have to something to investigate regardless. > > Did you see the patch from Hillf ? > https://lore.kernel.org/all/20240727011616.2144-1-hdanton@sina.com/ > it seems to fix the problem... I only skimmed the bisect while I was entertaining a toddler during a trainride. :) > > Of course I don't understand this patch, but afaics SB_RDONLY can confuse > thaw_super_locked(). If sb_rdonly() is true, thaw_super_locked() assumes > that freeze_super() didn't call sb_wait_write() -> percpu_down_write(). > > So in this case thaw_super_locked() just clears sb->s_writers.frozen and > goes to the "out_deactivate" label bypassing sb_freeze_unlock(). So the thing is that remounting a superblock read-only directly without going through the VFS first is pretty risky and is likely to cause trouble. The order of operations to transition rw->ro: (1) Check that the filsystem is unfrozen. If not, fail with EBUSY. (2) Mark all mounts of that filesystem as read-only. Fail with EBUSY if there are any active writers on any of the mounts. (3) Call into ->reconfigure() method of the filesystem so it can do it's internal thing to mount read-only. All of that requires sb->s_umount semaphore to be held. But f2fs seems to transition rw->ro without holding sb->s_umount in various callsites of: f2fs_stop_checkpoint() -> f2fs_handle_critical_error() So that would mean freeze_super() would've indeed acquired percpu_down_write(SB_FREEZE_WRITE+SB_FREEZE_PAGEFAULT+SB_FREEZE_FS) but thaw_super() would skip calling sb_freeze_unlock(SB_FREEZE_FS+SB_FREEZE_PAGEFAULT+SB_FREEZE_WRITE) and destroying the superblock during umount with an imbalance that gets noticed in rcu_sync_dtor(). So afaict this would mean that we never called rcu_sync_exit() and thus never set GP_EXIT and notice that imbalance during rcu_sync_dtor(). ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rcu_sync_dtor() warning question 2024-07-29 8:25 ` Christian Brauner @ 2024-07-29 9:02 ` Oleg Nesterov 0 siblings, 0 replies; 9+ messages in thread From: Oleg Nesterov @ 2024-07-29 9:02 UTC (permalink / raw) To: Christian Brauner; +Cc: Mateusz Guzik, Hillf Danton, paulmck, rcu On 07/29, Christian Brauner wrote: > > On Sun, Jul 28, 2024 at 03:41:01PM GMT, Oleg Nesterov wrote: > > > > So in this case thaw_super_locked() just clears sb->s_writers.frozen and > > goes to the "out_deactivate" label bypassing sb_freeze_unlock(). ... > So that would mean freeze_super() would've indeed acquired > percpu_down_write(SB_FREEZE_WRITE+SB_FREEZE_PAGEFAULT+SB_FREEZE_FS) but > thaw_super() would skip calling > sb_freeze_unlock(SB_FREEZE_FS+SB_FREEZE_PAGEFAULT+SB_FREEZE_WRITE) and > destroying the superblock during umount with an imbalance that gets > noticed in rcu_sync_dtor(). > > So afaict this would mean that we never called rcu_sync_exit() and thus > never set GP_EXIT and notice that imbalance during rcu_sync_dtor(). Yes, yes, this is what I meant. Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rcu_sync_dtor() warning question 2024-07-26 18:43 ` Mateusz Guzik 2024-07-26 20:21 ` Paul E. McKenney 2024-07-28 13:41 ` Oleg Nesterov @ 2024-07-29 8:28 ` Christian Brauner 2 siblings, 0 replies; 9+ messages in thread From: Christian Brauner @ 2024-07-29 8:28 UTC (permalink / raw) To: Mateusz Guzik; +Cc: paulmck, Oleg Nesterov, rcu On Fri, Jul 26, 2024 at 08:43:01PM GMT, Mateusz Guzik wrote: > On Fri, Jul 26, 2024 at 8:39 PM Mateusz Guzik <mjguzik@gmail.com> wrote: > > > > On Fri, Jul 26, 2024 at 8:02 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote: > > > > Hey, > > > > > > > > I could use some help with understanding a bug related to rcu that was > > > > reported today. It first seems to have shown up on the 25th of July: > > > > > > > > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7 > > > > > > > > We seem to be hitting the WARN_ON_ONCE() in: > > > > > > > > void rcu_sync_dtor(struct rcu_sync *rsp) > > > > { > > > > int gp_state; > > > > > > > > WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED); > > > > > > > > from destroy_super_work() which gets called when a superblock is really freed. > > > > > > > > If the superblock has been visible in userspace we do it via call_rcu(): > > > > > > > > static void destroy_super_work(struct work_struct *work) > > > > { > > > > struct super_block *s = container_of(work, struct super_block, > > > > destroy_work); > > > > fsnotify_sb_free(s); > > > > security_sb_free(s); > > > > put_user_ns(s->s_user_ns); > > > > kfree(s->s_subtype); > > > > for (int i = 0; i < SB_FREEZE_LEVELS; i++) > > > > percpu_free_rwsem(&s->s_writers.rw_sem[i]); > > > > kfree(s); > > > > } > > > > > > > > static void destroy_super_rcu(struct rcu_head *head) > > > > { > > > > struct super_block *s = container_of(head, struct super_block, rcu); > > > > INIT_WORK(&s->destroy_work, destroy_super_work); > > > > schedule_work(&s->destroy_work); > > > > } > > > > > > > > And I'm really confused because I don't understand the details for sync > > > > rcu enough to come up with a clear problem statement even. Could someone > > > > please explain what the WARN_ON_ONCE() is about? > > > > > > If I am not too confused (and Oleg will correct me if I am), this is > > > checking a use-after-free error. A given rcu_sync structure normally > > > transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with > > > possible side-trips from the GP_EXIT state through GP_REPLAY and back > > > to GP_EXIT in special cases such as during early boot. > > > > > > > use-after-free? In that case I have a candidate for a culprit. > > > > Code prior to any of my changes was doing the following in iget_locked(): > > spin_lock(&inode_hash_lock); > > inode = find_inode_fast(sb, head, ino); > > spin_unlock(&inode_hash_lock); > > if (inode) { > > if (IS_ERR(inode)) > > return NULL; > > wait_on_inode(inode); > > if (unlikely(inode_unhashed(inode))) { > > iput(inode); > > goto again; > > } > > return inode; > > } > > > > My patch removed the spinlock acquire and made it significantly more > > likely for the code to end up doing the wait_on_inode + inode_unhashed > > combo when racing against inode teardown. > > > > Now that you bring up use-after-free I'm not particularly confident > > the stock code is correct. > > > > For example evict_inodes() -> evict() can mess with the inode and > > result in the iput() call in iget_locked(), which then will invoke > > evict() again. And I'm not particularly confident the routine + > > everything it calls is idempotent. > > > > That's from a quick poke around, maybe I missed something. > > > > syzkaller claims to have a reproducer. Trivial usage in my debug vm > > does not result in anything, so I may need to grab their entire setup > > to reproduce. > > > > I'm going to look into it. > > Welp. > > syzbot did the bisect, it's not any of the above, instead: > > commit b62e71be2110d8b52bf5faf3c3ed7ca1a0c113a5 > Author: Chao Yu <chao@kernel.org> > Date: Sun Apr 23 15:49:15 2023 +0000 > > f2fs: support errors=remount-ro|continue|panic mountoption > > https://lore.kernel.org/linux-fsdevel/0000000000004ff2dc061e281637@google.com/T/#m90c03813e12e5cdff1eeada8f9ab581d5f039c76 > > That said, the stuff I mentioned still looks highly suspicious so I > have to something to investigate regardless. Thanks for looking into this! ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: rcu_sync_dtor() warning question 2024-07-26 18:02 ` rcu_sync_dtor() warning question Paul E. McKenney 2024-07-26 18:39 ` Mateusz Guzik @ 2024-07-29 8:26 ` Christian Brauner 1 sibling, 0 replies; 9+ messages in thread From: Christian Brauner @ 2024-07-29 8:26 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Mateusz Guzik, Oleg Nesterov, rcu On Fri, Jul 26, 2024 at 11:02:35AM GMT, Paul E. McKenney wrote: > On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote: > > Hey, > > > > I could use some help with understanding a bug related to rcu that was > > reported today. It first seems to have shown up on the 25th of July: > > > > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7 > > > > We seem to be hitting the WARN_ON_ONCE() in: > > > > void rcu_sync_dtor(struct rcu_sync *rsp) > > { > > int gp_state; > > > > WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED); > > > > from destroy_super_work() which gets called when a superblock is really freed. > > > > If the superblock has been visible in userspace we do it via call_rcu(): > > > > static void destroy_super_work(struct work_struct *work) > > { > > struct super_block *s = container_of(work, struct super_block, > > destroy_work); > > fsnotify_sb_free(s); > > security_sb_free(s); > > put_user_ns(s->s_user_ns); > > kfree(s->s_subtype); > > for (int i = 0; i < SB_FREEZE_LEVELS; i++) > > percpu_free_rwsem(&s->s_writers.rw_sem[i]); > > kfree(s); > > } > > > > static void destroy_super_rcu(struct rcu_head *head) > > { > > struct super_block *s = container_of(head, struct super_block, rcu); > > INIT_WORK(&s->destroy_work, destroy_super_work); > > schedule_work(&s->destroy_work); > > } > > > > And I'm really confused because I don't understand the details for sync > > rcu enough to come up with a clear problem statement even. Could someone > > please explain what the WARN_ON_ONCE() is about? > > If I am not too confused (and Oleg will correct me if I am), this is > checking a use-after-free error. A given rcu_sync structure normally > transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with > possible side-trips from the GP_EXIT state through GP_REPLAY and back > to GP_EXIT in special cases such as during early boot. > > One way that this can happen is calling percpu_free_rwsem() without > having previously done the percpu_up_write() needed to match an earlier > percpu_down_write(). I don't see any other way that this can happen > right off-hand, but someone might have added something, or I might be > suffering a failure of imagination. And you were spot on of course! > > > (I had been wondering if this is related to commit 7180f8d91fcb ("vfs: > > add rcu-based find_inode variants for iget ops") and the fix in commit > > 5bc9ad78c2f8 ("vfs: handle __wait_on_freeing_inode() and evict() race") > > but I don't see how.) > > I don't see how either, unless it introduced some timing change > that made the scenario above more probable. > > > Thanks for taking a look! > > Hope this helps! If not, you know where to find me. ;-) Thank you for taking the time, Paul. That was indeed helpful! ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-07-29 9:03 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20240726-ansonsten-piste-c4f04d4909fc@brauner>
2024-07-26 18:02 ` rcu_sync_dtor() warning question Paul E. McKenney
2024-07-26 18:39 ` Mateusz Guzik
2024-07-26 18:43 ` Mateusz Guzik
2024-07-26 20:21 ` Paul E. McKenney
2024-07-28 13:41 ` Oleg Nesterov
2024-07-29 8:25 ` Christian Brauner
2024-07-29 9:02 ` Oleg Nesterov
2024-07-29 8:28 ` Christian Brauner
2024-07-29 8:26 ` Christian Brauner
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.