From: Long Li <leo.lilong@huaweicloud.com>
To: Dave Chinner <david@fromorbit.com>, Long Li <leo.lilong@huawei.com>
Cc: djwong@kernel.org, linux-xfs@vger.kernel.org,
yi.zhang@huawei.com, houtao1@huawei.com, yangerkun@huawei.com
Subject: Re: [PATCH v2] xfs: fix a UAF when inode item push
Date: Wed, 26 Jul 2023 15:32:19 +0800 [thread overview]
Message-ID: <20230726073219.GA1050117@ceph-admin> (raw)
In-Reply-To: <ZMCgmBDM6vjVuyLV@dread.disaster.area>
On Wed, Jul 26, 2023 at 02:27:04PM +1000, Dave Chinner wrote:
> On Sat, Jul 22, 2023 at 10:57:21AM +0800, Long Li wrote:
> > KASAN reported a UAF bug while fault injection test:
> >
> > ==================================================================
> > BUG: KASAN: use-after-free in xfs_inode_item_push+0x2db/0x2f0
> > Read of size 8 at addr ffff888022f74788 by task xfsaild/sda/479
> >
> > CPU: 0 PID: 479 Comm: xfsaild/sda Not tainted 6.2.0-rc7-00003-ga8a43e2eb5f6 #89
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x51/0x6a
> > print_report+0x171/0x4a6
> > kasan_report+0xb7/0x130
> > xfs_inode_item_push+0x2db/0x2f0
> > xfsaild+0x729/0x1f70
> > kthread+0x290/0x340
> > ret_from_fork+0x1f/0x30
> > </TASK>
> >
> > Allocated by task 494:
> > kasan_save_stack+0x22/0x40
> > kasan_set_track+0x25/0x30
> > __kasan_slab_alloc+0x58/0x70
> > kmem_cache_alloc+0x197/0x5d0
> > xfs_inode_item_init+0x62/0x170
> > xfs_trans_ijoin+0x15e/0x240
> > xfs_init_new_inode+0x573/0x1820
> > xfs_create+0x6a1/0x1020
> > xfs_generic_create+0x544/0x5d0
> > vfs_mkdir+0x5d0/0x980
> > do_mkdirat+0x14e/0x220
> > __x64_sys_mkdir+0x6a/0x80
> > do_syscall_64+0x39/0x80
> > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >
> > Freed by task 14:
> > kasan_save_stack+0x22/0x40
> > kasan_set_track+0x25/0x30
> > kasan_save_free_info+0x2e/0x40
> > __kasan_slab_free+0x114/0x1b0
> > kmem_cache_free+0xee/0x4e0
> > xfs_inode_free_callback+0x187/0x2a0
> > rcu_do_batch+0x317/0xce0
> > rcu_core+0x686/0xa90
> > __do_softirq+0x1b6/0x626
> >
> > The buggy address belongs to the object at ffff888022f74758
> > which belongs to the cache xfs_ili of size 200
> > The buggy address is located 48 bytes inside of
> > 200-byte region [ffff888022f74758, ffff888022f74820)
> >
> > The buggy address belongs to the physical page:
> > page:ffffea00008bdd00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22f74
> > head:ffffea00008bdd00 order:1 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0
> > flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
> > raw: 001fffff80010200 ffff888010ed4040 ffffea00008b2510 ffffea00008bde10
> > raw: 0000000000000000 00000000001a001a 00000001ffffffff 0000000000000000
> > page dumped because: kasan: bad access detected
> >
> > Memory state around the buggy address:
> > ffff888022f74680: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
> > ffff888022f74700: fc fc fc fc fc fc fc fc fc fc fc fa fb fb fb fb
> > >ffff888022f74780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ^
> > ffff888022f74800: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
> > ffff888022f74880: fc fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > ==================================================================
> >
> > When push inode item in xfsaild, it will race with reclaim inodes task.
> > Consider the following call graph, both tasks deal with the same inode.
> > During flushing the cluster, it will enter xfs_iflush_abort() in shutdown
> > conditions, inode's XFS_IFLUSHING flag will be cleared and lip->li_buf set
> > to null. Concurrently, inode will be reclaimed in shutdown conditions,
> > there is no need to wait xfs buf lock because of lip->li_buf is null at
> > this time, inode will be freed via rcu callback if xfsaild task schedule
> > out during flushing the cluster. so, it is unsafe to reference lip after
> > flushing the cluster in xfs_inode_item_push().
> >
> > <log item is in AIL>
> > <filesystem shutdown>
> > spin_lock(&ailp->ail_lock)
> > xfs_inode_item_push(lip)
> > xfs_buf_trylock(bp)
> > spin_unlock(&lip->li_ailp->ail_lock)
> > xfs_iflush_cluster(bp)
> > if (xfs_is_shutdown())
> > xfs_iflush_abort(ip)
> > xfs_trans_ail_delete(ip)
> > spin_lock(&ailp->ail_lock)
> > spin_unlock(&ailp->ail_lock)
> > xfs_iflush_abort_clean(ip)
> > error = -EIO
> > <log item removed from AIL>
> > <log item li_buf set to null>
> > if (error)
> > xfs_force_shutdown()
> > xlog_shutdown_wait(mp->m_log)
> > might_sleep()
> > xfs_reclaim_inode(ip)
> > if (shutdown)
> > xfs_iflush_shutdown_abort(ip)
> > if (!bp)
> > xfs_iflush_abort(ip)
> > return
> > __xfs_inode_free(ip)
> > call_rcu(ip, xfs_inode_free_callback)
> > ......
> > <rcu grace period expires>
> > <rcu free callbacks run somewhere>
> > xfs_inode_free_callback(ip)
> > kmem_cache_free(ip->i_itemp)
> > ......
> > <starts running again>
> > xfs_buf_ioend_fail(bp);
> > xfs_buf_ioend(bp)
> > xfs_buf_relse(bp);
> > return error
> > spin_lock(&lip->li_ailp->ail_lock)
> > <UAF on log item>
>
> Yup. It's not safe to reference the inode log item here...
>
> > Fix the race condition by add XFS_ILOCK_SHARED lock for inode in
> > xfs_inode_item_push(). The XFS_ILOCK_EXCL lock is held when the inode is
> > reclaimed, so this prevents the uaf from occurring.
>
> Having reclaim come in and free the inode after we've already
> aborted and removed from the buffer isn't a problem. The inode
> flushing code is designed to handle that safely.
>
> The problem is that xfs_inode_item_push() tries to use the inode
> item after the failure has occurred and we've already aborted the
> inode item and finished it. i.e. the problem is this line:
>
> spin_lock(&lip->li_ailp->ail_lock);
>
> because it is using the log item that has been freed to get the
> ailp. We can safely store the alip at the start of the function
> whilst we still hold the ail_lock.
>
Hi Dave,
That's how I solved it in v1[1], but I found that it doesn't completely
solve the problem, because it's still possible to reference the freed
lip in xfsaild_push(). Unless we don't refer to lip in tracepoint after
xfsaild_push_item().
xfsaild_push()
xfsaild_push_item()
lip->li_ops->iop_push()
xfs_inode_item_push(lip)
xfs_iflush_cluster(bp)
......
xfs_reclaim_inode(ip)
......
__xfs_inode_free(ip)
call_rcu(ip, xfs_inode_free_callback)
......
<rcu grace period expires>
<rcu free callbacks run somewhere>
xfs_inode_free_callback(ip)
kmem_cache_free(ip->i_itemp)
......
trace_xfs_ail_xxx(lip) //uaf
[1] https://patchwork.kernel.org/project/xfs/patch/20230211022941.GA1515023@ceph-admin/
Thanks
Long Li
> i.e.
>
> struct xfs_inode_log_item *iip = INODE_ITEM(lip);
> + struct xfs_ail *ailp = lip->li_ailp;
> struct xfs_inode *ip = iip->ili_inode;
> struct xfs_buf *bp = lip->li_buf;
> uint rval = XFS_ITEM_SUCCESS;
> ....
>
> - spin_unlock(&lip->li_ailp->ail_lock);
> + spin_unlock(&ailp->ail_lock);
>
> .....
> } else {
> /*
> * Release the buffer if we were unable to flush anything. On
> - * any other error, the buffer has already been released.
> + * any other error, the buffer has already been released and it
> + * now unsafe to reference the inode item as a flush abort may
> + * have removed it from the AIL and reclaim freed the inode.
> */
> if (error == -EAGAIN)
> xfs_buf_relse(bp);
> rval = XFS_ITEM_LOCKED;
> }
>
> - spin_lock(&lip->li_ailp->ail_lock);
> + /* unsafe to reference anything log item related from here on. */
> + spin_lock(&ailp->ail_lock);
> return rval;
>
> Cheers,
>
> Dave.
>
> --
> Dave Chinner
> david@fromorbit.com
next prev parent reply other threads:[~2023-07-26 7:37 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-22 2:57 [PATCH v2] xfs: fix a UAF when inode item push Long Li
2023-07-26 4:27 ` Dave Chinner
2023-07-26 7:32 ` Long Li [this message]
2023-07-27 1:25 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230726073219.GA1050117@ceph-admin \
--to=leo.lilong@huaweicloud.com \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=houtao1@huawei.com \
--cc=leo.lilong@huawei.com \
--cc=linux-xfs@vger.kernel.org \
--cc=yangerkun@huawei.com \
--cc=yi.zhang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox