Re: rcu_sync_dtor() warning question

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: rcu_sync_dtor() warning question
       [not found] <20240726-ansonsten-piste-c4f04d4909fc@brauner>
@ 2024-07-26 18:02 ` Paul E. McKenney
  2024-07-26 18:39   ` Mateusz Guzik
  2024-07-29  8:26   ` Christian Brauner
  0 siblings, 2 replies; 9+ messages in thread
From: Paul E. McKenney @ 2024-07-26 18:02 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Mateusz Guzik, Oleg Nesterov, rcu

On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote:
> Hey,
> 
> I could use some help with understanding a bug related to rcu that was
> reported today. It first seems to have shown up on the 25th of July:
> 
> https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7
> 
> We seem to be hitting the WARN_ON_ONCE() in:
> 
> void rcu_sync_dtor(struct rcu_sync *rsp)
> {
>         int gp_state;
> 
>         WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
> 
> from destroy_super_work() which gets called when a superblock is really freed.
> 
> If the superblock has been visible in userspace we do it via call_rcu():
> 
> static void destroy_super_work(struct work_struct *work)
> {
>         struct super_block *s = container_of(work, struct super_block,
>                                                         destroy_work);
>         fsnotify_sb_free(s);
>         security_sb_free(s);
>         put_user_ns(s->s_user_ns);
>         kfree(s->s_subtype);
>         for (int i = 0; i < SB_FREEZE_LEVELS; i++)
>                 percpu_free_rwsem(&s->s_writers.rw_sem[i]);
>         kfree(s);
> }
> 
> static void destroy_super_rcu(struct rcu_head *head)
> {
>         struct super_block *s = container_of(head, struct super_block, rcu);
>         INIT_WORK(&s->destroy_work, destroy_super_work);
>         schedule_work(&s->destroy_work);
> }
> 
> And I'm really confused because I don't understand the details for sync
> rcu enough to come up with a clear problem statement even. Could someone
> please explain what the WARN_ON_ONCE() is about?

If I am not too confused (and Oleg will correct me if I am), this is
checking a use-after-free error.  A given rcu_sync structure normally
transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with
possible side-trips from the GP_EXIT state through GP_REPLAY and back
to GP_EXIT in special cases such as during early boot.

One way that this can happen is calling percpu_free_rwsem() without
having previously done the percpu_up_write() needed to match an earlier
percpu_down_write().  I don't see any other way that this can happen
right off-hand, but someone might have added something, or I might be
suffering a failure of imagination.

> (I had been wondering if this is related to commit 7180f8d91fcb ("vfs:
> add rcu-based find_inode variants for iget ops") and the fix in commit
> 5bc9ad78c2f8 ("vfs: handle __wait_on_freeing_inode() and evict() race")
> but I don't see how.)

I don't see how either, unless it introduced some timing change
that made the scenario above more probable.

> Thanks for taking a look!

Hope this helps!  If not, you know where to find me.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: rcu_sync_dtor() warning question
  2024-07-26 18:02 ` rcu_sync_dtor() warning question Paul E. McKenney
@ 2024-07-26 18:39   ` Mateusz Guzik
  2024-07-26 18:43     ` Mateusz Guzik
  2024-07-29  8:26   ` Christian Brauner
  1 sibling, 1 reply; 9+ messages in thread
From: Mateusz Guzik @ 2024-07-26 18:39 UTC (permalink / raw)
  To: paulmck; +Cc: Christian Brauner, Oleg Nesterov, rcu

On Fri, Jul 26, 2024 at 8:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote:
> > Hey,
> >
> > I could use some help with understanding a bug related to rcu that was
> > reported today. It first seems to have shown up on the 25th of July:
> >
> > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7
> >
> > We seem to be hitting the WARN_ON_ONCE() in:
> >
> > void rcu_sync_dtor(struct rcu_sync *rsp)
> > {
> >         int gp_state;
> >
> >         WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
> >
> > from destroy_super_work() which gets called when a superblock is really freed.
> >
> > If the superblock has been visible in userspace we do it via call_rcu():
> >
> > static void destroy_super_work(struct work_struct *work)
> > {
> >         struct super_block *s = container_of(work, struct super_block,
> >                                                         destroy_work);
> >         fsnotify_sb_free(s);
> >         security_sb_free(s);
> >         put_user_ns(s->s_user_ns);
> >         kfree(s->s_subtype);
> >         for (int i = 0; i < SB_FREEZE_LEVELS; i++)
> >                 percpu_free_rwsem(&s->s_writers.rw_sem[i]);
> >         kfree(s);
> > }
> >
> > static void destroy_super_rcu(struct rcu_head *head)
> > {
> >         struct super_block *s = container_of(head, struct super_block, rcu);
> >         INIT_WORK(&s->destroy_work, destroy_super_work);
> >         schedule_work(&s->destroy_work);
> > }
> >
> > And I'm really confused because I don't understand the details for sync
> > rcu enough to come up with a clear problem statement even. Could someone
> > please explain what the WARN_ON_ONCE() is about?
>
> If I am not too confused (and Oleg will correct me if I am), this is
> checking a use-after-free error.  A given rcu_sync structure normally
> transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with
> possible side-trips from the GP_EXIT state through GP_REPLAY and back
> to GP_EXIT in special cases such as during early boot.
>

use-after-free? In that case I have a candidate for a culprit.

Code prior to any of my changes was doing the following in iget_locked():
        spin_lock(&inode_hash_lock);
        inode = find_inode_fast(sb, head, ino);
        spin_unlock(&inode_hash_lock);
        if (inode) {
                if (IS_ERR(inode))
                        return NULL;
                wait_on_inode(inode);
                if (unlikely(inode_unhashed(inode))) {
                        iput(inode);
                        goto again;
                }
                return inode;
        }

My patch removed the spinlock acquire and made it significantly more
likely for the code to end up doing the wait_on_inode + inode_unhashed
combo when racing against inode teardown.

Now that you bring up use-after-free I'm not particularly confident
the stock code is correct.

For example evict_inodes() -> evict() can mess with the inode and
result in the iput() call in iget_locked(), which then will invoke
evict() again. And I'm not particularly confident the routine +
everything it calls is idempotent.

That's from a quick poke around, maybe I missed something.

syzkaller claims to have a reproducer. Trivial usage in my debug vm
does not result in anything, so I may need to grab their entire setup
to reproduce.

I'm going to look into it.
-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: rcu_sync_dtor() warning question
  2024-07-26 18:39   ` Mateusz Guzik
@ 2024-07-26 18:43     ` Mateusz Guzik
  2024-07-26 20:21       ` Paul E. McKenney
                         ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Mateusz Guzik @ 2024-07-26 18:43 UTC (permalink / raw)
  To: paulmck; +Cc: Christian Brauner, Oleg Nesterov, rcu

On Fri, Jul 26, 2024 at 8:39 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Fri, Jul 26, 2024 at 8:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote:
> > > Hey,
> > >
> > > I could use some help with understanding a bug related to rcu that was
> > > reported today. It first seems to have shown up on the 25th of July:
> > >
> > > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7
> > >
> > > We seem to be hitting the WARN_ON_ONCE() in:
> > >
> > > void rcu_sync_dtor(struct rcu_sync *rsp)
> > > {
> > >         int gp_state;
> > >
> > >         WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
> > >
> > > from destroy_super_work() which gets called when a superblock is really freed.
> > >
> > > If the superblock has been visible in userspace we do it via call_rcu():
> > >
> > > static void destroy_super_work(struct work_struct *work)
> > > {
> > >         struct super_block *s = container_of(work, struct super_block,
> > >                                                         destroy_work);
> > >         fsnotify_sb_free(s);
> > >         security_sb_free(s);
> > >         put_user_ns(s->s_user_ns);
> > >         kfree(s->s_subtype);
> > >         for (int i = 0; i < SB_FREEZE_LEVELS; i++)
> > >                 percpu_free_rwsem(&s->s_writers.rw_sem[i]);
> > >         kfree(s);
> > > }
> > >
> > > static void destroy_super_rcu(struct rcu_head *head)
> > > {
> > >         struct super_block *s = container_of(head, struct super_block, rcu);
> > >         INIT_WORK(&s->destroy_work, destroy_super_work);
> > >         schedule_work(&s->destroy_work);
> > > }
> > >
> > > And I'm really confused because I don't understand the details for sync
> > > rcu enough to come up with a clear problem statement even. Could someone
> > > please explain what the WARN_ON_ONCE() is about?
> >
> > If I am not too confused (and Oleg will correct me if I am), this is
> > checking a use-after-free error.  A given rcu_sync structure normally
> > transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with
> > possible side-trips from the GP_EXIT state through GP_REPLAY and back
> > to GP_EXIT in special cases such as during early boot.
> >
>
> use-after-free? In that case I have a candidate for a culprit.
>
> Code prior to any of my changes was doing the following in iget_locked():
>         spin_lock(&inode_hash_lock);
>         inode = find_inode_fast(sb, head, ino);
>         spin_unlock(&inode_hash_lock);
>         if (inode) {
>                 if (IS_ERR(inode))
>                         return NULL;
>                 wait_on_inode(inode);
>                 if (unlikely(inode_unhashed(inode))) {
>                         iput(inode);
>                         goto again;
>                 }
>                 return inode;
>         }
>
> My patch removed the spinlock acquire and made it significantly more
> likely for the code to end up doing the wait_on_inode + inode_unhashed
> combo when racing against inode teardown.
>
> Now that you bring up use-after-free I'm not particularly confident
> the stock code is correct.
>
> For example evict_inodes() -> evict() can mess with the inode and
> result in the iput() call in iget_locked(), which then will invoke
> evict() again. And I'm not particularly confident the routine +
> everything it calls is idempotent.
>
> That's from a quick poke around, maybe I missed something.
>
> syzkaller claims to have a reproducer. Trivial usage in my debug vm
> does not result in anything, so I may need to grab their entire setup
> to reproduce.
>
> I'm going to look into it.

Welp.

syzbot did the bisect, it's not any of the above, instead:

commit b62e71be2110d8b52bf5faf3c3ed7ca1a0c113a5
Author: Chao Yu <chao@kernel.org>
Date:   Sun Apr 23 15:49:15 2023 +0000

    f2fs: support errors=remount-ro|continue|panic mountoption

https://lore.kernel.org/linux-fsdevel/0000000000004ff2dc061e281637@google.com/T/#m90c03813e12e5cdff1eeada8f9ab581d5f039c76

That said, the stuff I mentioned still looks highly suspicious so I
have to something to investigate regardless.

I think this thread can be abandoned. :)
-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: rcu_sync_dtor() warning question
  2024-07-26 18:43     ` Mateusz Guzik
@ 2024-07-26 20:21       ` Paul E. McKenney
  2024-07-28 13:41       ` Oleg Nesterov
  2024-07-29  8:28       ` Christian Brauner
  2 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2024-07-26 20:21 UTC (permalink / raw)
  To: Mateusz Guzik; +Cc: Christian Brauner, Oleg Nesterov, rcu

On Fri, Jul 26, 2024 at 08:43:01PM +0200, Mateusz Guzik wrote:
> On Fri, Jul 26, 2024 at 8:39 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > On Fri, Jul 26, 2024 at 8:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote:
> > > > Hey,
> > > >
> > > > I could use some help with understanding a bug related to rcu that was
> > > > reported today. It first seems to have shown up on the 25th of July:
> > > >
> > > > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7
> > > >
> > > > We seem to be hitting the WARN_ON_ONCE() in:
> > > >
> > > > void rcu_sync_dtor(struct rcu_sync *rsp)
> > > > {
> > > >         int gp_state;
> > > >
> > > >         WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
> > > >
> > > > from destroy_super_work() which gets called when a superblock is really freed.
> > > >
> > > > If the superblock has been visible in userspace we do it via call_rcu():
> > > >
> > > > static void destroy_super_work(struct work_struct *work)
> > > > {
> > > >         struct super_block *s = container_of(work, struct super_block,
> > > >                                                         destroy_work);
> > > >         fsnotify_sb_free(s);
> > > >         security_sb_free(s);
> > > >         put_user_ns(s->s_user_ns);
> > > >         kfree(s->s_subtype);
> > > >         for (int i = 0; i < SB_FREEZE_LEVELS; i++)
> > > >                 percpu_free_rwsem(&s->s_writers.rw_sem[i]);
> > > >         kfree(s);
> > > > }
> > > >
> > > > static void destroy_super_rcu(struct rcu_head *head)
> > > > {
> > > >         struct super_block *s = container_of(head, struct super_block, rcu);
> > > >         INIT_WORK(&s->destroy_work, destroy_super_work);
> > > >         schedule_work(&s->destroy_work);
> > > > }
> > > >
> > > > And I'm really confused because I don't understand the details for sync
> > > > rcu enough to come up with a clear problem statement even. Could someone
> > > > please explain what the WARN_ON_ONCE() is about?
> > >
> > > If I am not too confused (and Oleg will correct me if I am), this is
> > > checking a use-after-free error.  A given rcu_sync structure normally
> > > transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with
> > > possible side-trips from the GP_EXIT state through GP_REPLAY and back
> > > to GP_EXIT in special cases such as during early boot.
> > >
> >
> > use-after-free? In that case I have a candidate for a culprit.
> >
> > Code prior to any of my changes was doing the following in iget_locked():
> >         spin_lock(&inode_hash_lock);
> >         inode = find_inode_fast(sb, head, ino);
> >         spin_unlock(&inode_hash_lock);
> >         if (inode) {
> >                 if (IS_ERR(inode))
> >                         return NULL;
> >                 wait_on_inode(inode);
> >                 if (unlikely(inode_unhashed(inode))) {
> >                         iput(inode);
> >                         goto again;
> >                 }
> >                 return inode;
> >         }
> >
> > My patch removed the spinlock acquire and made it significantly more
> > likely for the code to end up doing the wait_on_inode + inode_unhashed
> > combo when racing against inode teardown.
> >
> > Now that you bring up use-after-free I'm not particularly confident
> > the stock code is correct.
> >
> > For example evict_inodes() -> evict() can mess with the inode and
> > result in the iput() call in iget_locked(), which then will invoke
> > evict() again. And I'm not particularly confident the routine +
> > everything it calls is idempotent.
> >
> > That's from a quick poke around, maybe I missed something.
> >
> > syzkaller claims to have a reproducer. Trivial usage in my debug vm
> > does not result in anything, so I may need to grab their entire setup
> > to reproduce.
> >
> > I'm going to look into it.
> 
> Welp.
> 
> syzbot did the bisect, it's not any of the above, instead:
> 
> commit b62e71be2110d8b52bf5faf3c3ed7ca1a0c113a5
> Author: Chao Yu <chao@kernel.org>
> Date:   Sun Apr 23 15:49:15 2023 +0000
> 
>     f2fs: support errors=remount-ro|continue|panic mountoption
> 
> https://lore.kernel.org/linux-fsdevel/0000000000004ff2dc061e281637@google.com/T/#m90c03813e12e5cdff1eeada8f9ab581d5f039c76
> 
> That said, the stuff I mentioned still looks highly suspicious so I
> have to something to investigate regardless.
> 
> I think this thread can be abandoned. :)

Thank you for digging into this one!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: rcu_sync_dtor() warning question
  2024-07-26 18:43     ` Mateusz Guzik
  2024-07-26 20:21       ` Paul E. McKenney
@ 2024-07-28 13:41       ` Oleg Nesterov
  2024-07-29  8:25         ` Christian Brauner
  2024-07-29  8:28       ` Christian Brauner
  2 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2024-07-28 13:41 UTC (permalink / raw)
  To: Mateusz Guzik, Hillf Danton; +Cc: paulmck, Christian Brauner, rcu

Sorry for late reply.

You do not need my help, I know nothing about fs ;)  but just in case...

On 07/26, Mateusz Guzik wrote:
>
> Welp.
>
> syzbot did the bisect, it's not any of the above, instead:
>
> commit b62e71be2110d8b52bf5faf3c3ed7ca1a0c113a5
> Author: Chao Yu <chao@kernel.org>
> Date:   Sun Apr 23 15:49:15 2023 +0000
>
>     f2fs: support errors=remount-ro|continue|panic mountoption
>
> https://lore.kernel.org/linux-fsdevel/0000000000004ff2dc061e281637@google.com/T/#m90c03813e12e5cdff1eeada8f9ab581d5f039c76
>
> That said, the stuff I mentioned still looks highly suspicious so I
> have to something to investigate regardless.

Did you see the patch from Hillf ?
https://lore.kernel.org/all/20240727011616.2144-1-hdanton@sina.com/
it seems to fix the problem...

Of course I don't understand this patch, but afaics SB_RDONLY can confuse
thaw_super_locked(). If sb_rdonly() is true, thaw_super_locked() assumes
that freeze_super() didn't call sb_wait_write() -> percpu_down_write().

So in this case thaw_super_locked() just clears sb->s_writers.frozen and
goes to the "out_deactivate" label bypassing sb_freeze_unlock().

Oleg.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: rcu_sync_dtor() warning question
  2024-07-28 13:41       ` Oleg Nesterov
@ 2024-07-29  8:25         ` Christian Brauner
  2024-07-29  9:02           ` Oleg Nesterov
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Brauner @ 2024-07-29  8:25 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Mateusz Guzik, Hillf Danton, paulmck, rcu

On Sun, Jul 28, 2024 at 03:41:01PM GMT, Oleg Nesterov wrote:
> Sorry for late reply.
> 
> You do not need my help, I know nothing about fs ;)  but just in case...
> 
> On 07/26, Mateusz Guzik wrote:
> >
> > Welp.
> >
> > syzbot did the bisect, it's not any of the above, instead:
> >
> > commit b62e71be2110d8b52bf5faf3c3ed7ca1a0c113a5
> > Author: Chao Yu <chao@kernel.org>
> > Date:   Sun Apr 23 15:49:15 2023 +0000
> >
> >     f2fs: support errors=remount-ro|continue|panic mountoption
> >
> > https://lore.kernel.org/linux-fsdevel/0000000000004ff2dc061e281637@google.com/T/#m90c03813e12e5cdff1eeada8f9ab581d5f039c76
> >
> > That said, the stuff I mentioned still looks highly suspicious so I
> > have to something to investigate regardless.
> 
> Did you see the patch from Hillf ?
> https://lore.kernel.org/all/20240727011616.2144-1-hdanton@sina.com/
> it seems to fix the problem...

I only skimmed the bisect while I was entertaining a toddler during a
trainride. :)

> 
> Of course I don't understand this patch, but afaics SB_RDONLY can confuse
> thaw_super_locked(). If sb_rdonly() is true, thaw_super_locked() assumes
> that freeze_super() didn't call sb_wait_write() -> percpu_down_write().
> 
> So in this case thaw_super_locked() just clears sb->s_writers.frozen and
> goes to the "out_deactivate" label bypassing sb_freeze_unlock().

So the thing is that remounting a superblock read-only directly without
going through the VFS first is pretty risky and is likely to cause
trouble. The order of operations to transition rw->ro:

(1) Check that the filsystem is unfrozen. If not, fail with EBUSY.
(2) Mark all mounts of that filesystem as read-only. Fail with EBUSY if
    there are any active writers on any of the mounts.
(3) Call into ->reconfigure() method of the filesystem so it can do it's
    internal thing to mount read-only.

All of that requires sb->s_umount semaphore to be held. But f2fs
seems to transition rw->ro without holding sb->s_umount in various
callsites of:

f2fs_stop_checkpoint()
-> f2fs_handle_critical_error()

So that would mean freeze_super() would've indeed acquired
percpu_down_write(SB_FREEZE_WRITE+SB_FREEZE_PAGEFAULT+SB_FREEZE_FS) but
thaw_super() would skip calling
sb_freeze_unlock(SB_FREEZE_FS+SB_FREEZE_PAGEFAULT+SB_FREEZE_WRITE) and
destroying the superblock during umount with an imbalance that gets
noticed in rcu_sync_dtor().

So afaict this would mean that we never called rcu_sync_exit() and thus
never set GP_EXIT and notice that imbalance during rcu_sync_dtor().

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: rcu_sync_dtor() warning question
  2024-07-26 18:02 ` rcu_sync_dtor() warning question Paul E. McKenney
  2024-07-26 18:39   ` Mateusz Guzik
@ 2024-07-29  8:26   ` Christian Brauner
  1 sibling, 0 replies; 9+ messages in thread
From: Christian Brauner @ 2024-07-29  8:26 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Mateusz Guzik, Oleg Nesterov, rcu

On Fri, Jul 26, 2024 at 11:02:35AM GMT, Paul E. McKenney wrote:
> On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote:
> > Hey,
> > 
> > I could use some help with understanding a bug related to rcu that was
> > reported today. It first seems to have shown up on the 25th of July:
> > 
> > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7
> > 
> > We seem to be hitting the WARN_ON_ONCE() in:
> > 
> > void rcu_sync_dtor(struct rcu_sync *rsp)
> > {
> >         int gp_state;
> > 
> >         WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
> > 
> > from destroy_super_work() which gets called when a superblock is really freed.
> > 
> > If the superblock has been visible in userspace we do it via call_rcu():
> > 
> > static void destroy_super_work(struct work_struct *work)
> > {
> >         struct super_block *s = container_of(work, struct super_block,
> >                                                         destroy_work);
> >         fsnotify_sb_free(s);
> >         security_sb_free(s);
> >         put_user_ns(s->s_user_ns);
> >         kfree(s->s_subtype);
> >         for (int i = 0; i < SB_FREEZE_LEVELS; i++)
> >                 percpu_free_rwsem(&s->s_writers.rw_sem[i]);
> >         kfree(s);
> > }
> > 
> > static void destroy_super_rcu(struct rcu_head *head)
> > {
> >         struct super_block *s = container_of(head, struct super_block, rcu);
> >         INIT_WORK(&s->destroy_work, destroy_super_work);
> >         schedule_work(&s->destroy_work);
> > }
> > 
> > And I'm really confused because I don't understand the details for sync
> > rcu enough to come up with a clear problem statement even. Could someone
> > please explain what the WARN_ON_ONCE() is about?
> 
> If I am not too confused (and Oleg will correct me if I am), this is
> checking a use-after-free error.  A given rcu_sync structure normally
> transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with
> possible side-trips from the GP_EXIT state through GP_REPLAY and back
> to GP_EXIT in special cases such as during early boot.
> 
> One way that this can happen is calling percpu_free_rwsem() without
> having previously done the percpu_up_write() needed to match an earlier
> percpu_down_write().  I don't see any other way that this can happen
> right off-hand, but someone might have added something, or I might be
> suffering a failure of imagination.

And you were spot on of course!

> 
> > (I had been wondering if this is related to commit 7180f8d91fcb ("vfs:
> > add rcu-based find_inode variants for iget ops") and the fix in commit
> > 5bc9ad78c2f8 ("vfs: handle __wait_on_freeing_inode() and evict() race")
> > but I don't see how.)
> 
> I don't see how either, unless it introduced some timing change
> that made the scenario above more probable.
> 
> > Thanks for taking a look!
> 
> Hope this helps!  If not, you know where to find me.  ;-)

Thank you for taking the time, Paul. That was indeed helpful!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: rcu_sync_dtor() warning question
  2024-07-26 18:43     ` Mateusz Guzik
  2024-07-26 20:21       ` Paul E. McKenney
  2024-07-28 13:41       ` Oleg Nesterov
@ 2024-07-29  8:28       ` Christian Brauner
  2 siblings, 0 replies; 9+ messages in thread
From: Christian Brauner @ 2024-07-29  8:28 UTC (permalink / raw)
  To: Mateusz Guzik; +Cc: paulmck, Oleg Nesterov, rcu

On Fri, Jul 26, 2024 at 08:43:01PM GMT, Mateusz Guzik wrote:
> On Fri, Jul 26, 2024 at 8:39 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > On Fri, Jul 26, 2024 at 8:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote:
> > > > Hey,
> > > >
> > > > I could use some help with understanding a bug related to rcu that was
> > > > reported today. It first seems to have shown up on the 25th of July:
> > > >
> > > > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7
> > > >
> > > > We seem to be hitting the WARN_ON_ONCE() in:
> > > >
> > > > void rcu_sync_dtor(struct rcu_sync *rsp)
> > > > {
> > > >         int gp_state;
> > > >
> > > >         WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
> > > >
> > > > from destroy_super_work() which gets called when a superblock is really freed.
> > > >
> > > > If the superblock has been visible in userspace we do it via call_rcu():
> > > >
> > > > static void destroy_super_work(struct work_struct *work)
> > > > {
> > > >         struct super_block *s = container_of(work, struct super_block,
> > > >                                                         destroy_work);
> > > >         fsnotify_sb_free(s);
> > > >         security_sb_free(s);
> > > >         put_user_ns(s->s_user_ns);
> > > >         kfree(s->s_subtype);
> > > >         for (int i = 0; i < SB_FREEZE_LEVELS; i++)
> > > >                 percpu_free_rwsem(&s->s_writers.rw_sem[i]);
> > > >         kfree(s);
> > > > }
> > > >
> > > > static void destroy_super_rcu(struct rcu_head *head)
> > > > {
> > > >         struct super_block *s = container_of(head, struct super_block, rcu);
> > > >         INIT_WORK(&s->destroy_work, destroy_super_work);
> > > >         schedule_work(&s->destroy_work);
> > > > }
> > > >
> > > > And I'm really confused because I don't understand the details for sync
> > > > rcu enough to come up with a clear problem statement even. Could someone
> > > > please explain what the WARN_ON_ONCE() is about?
> > >
> > > If I am not too confused (and Oleg will correct me if I am), this is
> > > checking a use-after-free error.  A given rcu_sync structure normally
> > > transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with
> > > possible side-trips from the GP_EXIT state through GP_REPLAY and back
> > > to GP_EXIT in special cases such as during early boot.
> > >
> >
> > use-after-free? In that case I have a candidate for a culprit.
> >
> > Code prior to any of my changes was doing the following in iget_locked():
> >         spin_lock(&inode_hash_lock);
> >         inode = find_inode_fast(sb, head, ino);
> >         spin_unlock(&inode_hash_lock);
> >         if (inode) {
> >                 if (IS_ERR(inode))
> >                         return NULL;
> >                 wait_on_inode(inode);
> >                 if (unlikely(inode_unhashed(inode))) {
> >                         iput(inode);
> >                         goto again;
> >                 }
> >                 return inode;
> >         }
> >
> > My patch removed the spinlock acquire and made it significantly more
> > likely for the code to end up doing the wait_on_inode + inode_unhashed
> > combo when racing against inode teardown.
> >
> > Now that you bring up use-after-free I'm not particularly confident
> > the stock code is correct.
> >
> > For example evict_inodes() -> evict() can mess with the inode and
> > result in the iput() call in iget_locked(), which then will invoke
> > evict() again. And I'm not particularly confident the routine +
> > everything it calls is idempotent.
> >
> > That's from a quick poke around, maybe I missed something.
> >
> > syzkaller claims to have a reproducer. Trivial usage in my debug vm
> > does not result in anything, so I may need to grab their entire setup
> > to reproduce.
> >
> > I'm going to look into it.
> 
> Welp.
> 
> syzbot did the bisect, it's not any of the above, instead:
> 
> commit b62e71be2110d8b52bf5faf3c3ed7ca1a0c113a5
> Author: Chao Yu <chao@kernel.org>
> Date:   Sun Apr 23 15:49:15 2023 +0000
> 
>     f2fs: support errors=remount-ro|continue|panic mountoption
> 
> https://lore.kernel.org/linux-fsdevel/0000000000004ff2dc061e281637@google.com/T/#m90c03813e12e5cdff1eeada8f9ab581d5f039c76
> 
> That said, the stuff I mentioned still looks highly suspicious so I
> have to something to investigate regardless.

Thanks for looking into this! 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: rcu_sync_dtor() warning question
  2024-07-29  8:25         ` Christian Brauner
@ 2024-07-29  9:02           ` Oleg Nesterov
  0 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2024-07-29  9:02 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Mateusz Guzik, Hillf Danton, paulmck, rcu

On 07/29, Christian Brauner wrote:
>
> On Sun, Jul 28, 2024 at 03:41:01PM GMT, Oleg Nesterov wrote:
> >
> > So in this case thaw_super_locked() just clears sb->s_writers.frozen and
> > goes to the "out_deactivate" label bypassing sb_freeze_unlock().

...

> So that would mean freeze_super() would've indeed acquired
> percpu_down_write(SB_FREEZE_WRITE+SB_FREEZE_PAGEFAULT+SB_FREEZE_FS) but
> thaw_super() would skip calling
> sb_freeze_unlock(SB_FREEZE_FS+SB_FREEZE_PAGEFAULT+SB_FREEZE_WRITE) and
> destroying the superblock during umount with an imbalance that gets
> noticed in rcu_sync_dtor().
>
> So afaict this would mean that we never called rcu_sync_exit() and thus
> never set GP_EXIT and notice that imbalance during rcu_sync_dtor().

Yes, yes, this is what I meant.

Oleg.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-07-29  9:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20240726-ansonsten-piste-c4f04d4909fc@brauner>
2024-07-26 18:02 ` rcu_sync_dtor() warning question Paul E. McKenney
2024-07-26 18:39   ` Mateusz Guzik
2024-07-26 18:43     ` Mateusz Guzik
2024-07-26 20:21       ` Paul E. McKenney
2024-07-28 13:41       ` Oleg Nesterov
2024-07-29  8:25         ` Christian Brauner
2024-07-29  9:02           ` Oleg Nesterov
2024-07-29  8:28       ` Christian Brauner
2024-07-29  8:26   ` Christian Brauner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.