* [PATCH 0/2] btrfs: remove the inode_need_compress() call in
@ 2020-08-04 7:15 Qu Wenruo
2020-08-04 7:15 ` [PATCH 1/2] btrfs: sysfs: fix NULL pointer dereference at btrfs_sysfs_del_qgroups() Qu Wenruo
2020-08-04 7:15 ` [PATCH 2/2] btrfs: inode: don't re-evaluate inode_need_compress() in compress_file_extent() Qu Wenruo
0 siblings, 2 replies; 3+ messages in thread
From: Qu Wenruo @ 2020-08-04 7:15 UTC (permalink / raw)
To: linux-btrfs
This is an attempt to remove the inode_need_compress() call in
compress_file_range().
compress_file_range() can race with an inode ioctl or with bad
compression ratio detection, causing a NULL pointer dereference on
@pages, so it's natural to try to remove that inode_need_compress()
call and eliminate the race completely.
However, that's not so easy; we have the following problems:
- We still need to check @pages anyway
That @pages check covers kcalloc() failure, so all we really gain is
removing one indent level from the if (inode_need_compress()) block.
Everything else stays the same (in fact it gets worse, see the
problems below).
- Behavior change
Before this change, every async_chunk checks the INODE_NO_COMPRESS
flag itself.
If we hit a bad compression ratio, all later async_chunks fall back
to uncompressed writes.
But if we remove that inode_need_compress() check, we still try to
compress them, potentially wasting CPU time.
- Still a race between compression disable and NULL pointer dereference
There is a hidden race, mostly exposed by the btrfs/071 test case: we
have "compress_type = fs_info->compress_type", so we can still hit the
case where compress_type is NONE (caused by remount -o nocompress), and
then btrfs_compress_pages() returns -E2BIG without modifying
@nr_pages.
Then later, when we clean up @pages, we try to access
pages[i]->mapping, triggering a NULL pointer dereference.
This is addressed in the first patch, though.
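
To illustrate the last problem, here is a minimal user-space sketch of
the out-parameter contract involved. It is not the btrfs code: all
mock_* names, the reset_on_fallback switch and the page struct are
hypothetical stand-ins. The assumption is that the caller bounds its
cleanup loop by the count the callee leaves behind, so a fallback path
that does not reset that count leaves the caller walking NULL slots.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the btrfs definitions. */
enum mock_compress_type { MOCK_COMPRESS_NONE = 0, MOCK_COMPRESS_ZLIB = 1 };

struct mock_page {
	void *mapping;
};

/*
 * On entry *out_pages is the capacity of pages[]; on return it should be
 * the number of slots that were actually filled.
 */
static int mock_compress_pages(enum mock_compress_type type,
			       struct mock_page **pages, size_t *out_pages,
			       int reset_on_fallback)
{
	size_t i;

	assert(pages && out_pages);

	if (type == MOCK_COMPRESS_NONE) {
		/* The fix sketched here: report that nothing was allocated. */
		if (reset_on_fallback)
			*out_pages = 0;
		return -E2BIG;
	}

	for (i = 0; i < *out_pages; i++) {
		pages[i] = calloc(1, sizeof(*pages[i]));
		if (!pages[i]) {
			*out_pages = i;
			return -ENOMEM;
		}
	}
	return 0;
}

/*
 * Count the NULL slots a cleanup loop bounded by nr_pages would visit.
 * Each one corresponds to a pages[i]->mapping access on a NULL pointer.
 */
static size_t mock_count_null_slots(struct mock_page **pages, size_t nr_pages)
{
	size_t i, bad = 0;

	for (i = 0; i < nr_pages; i++)
		if (!pages[i])
			bad++;
	return bad;
}
```

With reset_on_fallback == 0 the count stays at its input value and every
slot the cleanup loop would visit is NULL; with reset_on_fallback == 1
the loop has nothing to visit.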
Qu Wenruo (2):
btrfs: sysfs: fix NULL pointer dereference at
btrfs_sysfs_del_qgroups()
btrfs: inode: don't re-evaluate inode_need_compress() in
compress_file_extent()
fs/btrfs/compression.c | 10 ++--
fs/btrfs/inode.c | 106 ++++++++++++++++++++---------------------
2 files changed, 59 insertions(+), 57 deletions(-)
--
2.28.0
* [PATCH 1/2] btrfs: sysfs: fix NULL pointer dereference at btrfs_sysfs_del_qgroups()
From: Qu Wenruo @ 2020-08-04 7:15 UTC (permalink / raw)
To: linux-btrfs
[BUG]
With the next-20200731 tag (079ad2fb4bf9eba8a0aaab014b49705cd7f07c66),
unmounting a btrfs filesystem with quota disabled causes the following
NULL pointer dereference:
BTRFS info (device dm-5): has skinny extents
BUG: kernel NULL pointer dereference, address: 0000000000000018
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
CPU: 7 PID: 637 Comm: umount Not tainted 5.8.0-rc7-next-20200731-custom #76
RIP: 0010:kobject_del+0x6/0x20
Call Trace:
btrfs_sysfs_del_qgroups+0xac/0xf0 [btrfs]
btrfs_free_qgroup_config+0x63/0x70 [btrfs]
close_ctree+0x1f5/0x323 [btrfs]
btrfs_put_super+0x15/0x17 [btrfs]
generic_shutdown_super+0x72/0x110
kill_anon_super+0x18/0x30
btrfs_kill_super+0x17/0x30 [btrfs]
deactivate_locked_super+0x3b/0xa0
deactivate_super+0x40/0x50
cleanup_mnt+0x135/0x190
__cleanup_mnt+0x12/0x20
task_work_run+0x64/0xb0
exit_to_user_mode_prepare+0x18a/0x190
syscall_exit_to_user_mode+0x4f/0x270
do_syscall_64+0x45/0x50
entry_SYSCALL_64_after_hwframe+0x44/0xa9
---[ end trace 37b7adca5c1d5c5d ]---
[CAUSE]
Commit 079ad2fb4bf9 ("kobject: Avoid premature parent object freeing in
kobject_cleanup()") changed kobject_del() so that it no longer accepts a
NULL pointer.
Before that commit, kobject_del() and kobject_put() both accepted NULL
pointers and simply ignored them.
But that commit needs to access the parent node, which kills the old
NULL pointer behavior.
Unfortunately, btrfs relies on that hidden feature, so we trigger the
NULL pointer dereference.
[FIX]
Instead of just saving several lines, do a proper
fs_info->qgroups_kobj check before calling kobject_del() and
kobject_put().
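
A rough user-space sketch of the check described above, for
illustration only. The mock_* types and functions are hypothetical and
only mimic the described behavior (a del/put pair that dereferences its
argument unconditionally); they are not the kernel kobject API, and this
is not the actual btrfs_sysfs_del_qgroups() body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a kobject and the owning fs_info. */
struct mock_kobject {
	int registered;
	int refcount;
};

struct mock_fs_info {
	struct mock_kobject *qgroups_kobj;
};

/* Like the changed kobject_del() described above: no NULL tolerance. */
static void mock_kobject_del(struct mock_kobject *kobj)
{
	kobj->registered = 0;
}

static void mock_kobject_put(struct mock_kobject *kobj)
{
	kobj->refcount--;
}

/* Returns 1 if the qgroups kobject was torn down, 0 if there was none. */
static int mock_sysfs_del_qgroups(struct mock_fs_info *fs_info)
{
	assert(fs_info);

	/* Quota disabled: nothing was ever created, so nothing to delete. */
	if (!fs_info->qgroups_kobj)
		return 0;

	mock_kobject_del(fs_info->qgroups_kobj);
	mock_kobject_put(fs_info->qgroups_kobj);
	fs_info->qgroups_kobj = NULL;
	return 1;
}
```

The explicit check keeps the caller correct whether or not the helpers
tolerate NULL, instead of depending on an undocumented property of them.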
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/compression.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 1ab56a734e70..17c27edd804b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -115,10 +115,14 @@ static int compression_compress_pages(int type, struct list_head *ws,
case BTRFS_COMPRESS_NONE:
default:
/*
- * This can't happen, the type is validated several times
- * before we get here. As a sane fallback, return what the
- * callers will understand as 'no compression happened'.
+ * This happens when compression races with remount to no
+ * compress, while caller doesn't call inode_need_compress()
+ * to check if we really need to compress.
+ *
+ * Not a big deal, just need to inform caller that we
+ * haven't allocated any pages yet.
*/
+ *out_pages = 0;
return -E2BIG;
}
}
--
2.28.0
* [PATCH 2/2] btrfs: inode: don't re-evaluate inode_need_compress() in compress_file_extent()
From: Qu Wenruo @ 2020-08-04 7:15 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
The extra inode_need_compress() call has already caused problems when
releasing pages.
We had a hotfix for that problem; now it's time to fix it at the root.
This patch removes the extra inode_need_compress() call to address the
problem.
This leads to the following behavior change:
- Worse bad compression ratio detection
Now, if one async_chunk hits a bad compression ratio, other
async_chunks will still try to compress.
Only newer delalloc ranges will honor the new INODE_NO_COMPRESS flag.
Although one could argue that, if only part of the file content
compresses badly, we should still try the other ranges.
Besides the behavior change, the code cleanup isn't that elegant either:
kcalloc() can still fail for @pages, so from the cont: label onward we
still need to check @pages manually. The cleanup therefore doesn't bring
much benefit, just one indent level removed and some comments
reformatted.
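
A small user-space sketch of why the @pages check has to stay, assuming
a caller that falls through to a shared cleanup path after an allocation
failure. The mock_* names and the fail_alloc switch are hypothetical;
calloc() stands in for kcalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct page. */
struct mock_page {
	int unused;
};

/*
 * Allocate the page pointer array. fail_alloc simulates a failed
 * kcalloc(). On failure, reset *nr_pages to 0 and return NULL so the
 * caller can continue down the uncompressed path.
 */
static struct mock_page **mock_alloc_page_array(size_t *nr_pages,
						int fail_alloc)
{
	struct mock_page **pages;

	assert(nr_pages);

	pages = fail_alloc ? NULL : calloc(*nr_pages, sizeof(*pages));
	if (!pages)
		*nr_pages = 0;
	return pages;
}

/*
 * Shared cleanup path. The NULL check is still required because the
 * array may never have been allocated. Returns the number of entries
 * that were released.
 */
static size_t mock_free_page_array(struct mock_page **pages, size_t nr_pages)
{
	size_t i, freed = 0;

	if (!pages)
		return 0;

	for (i = 0; i < nr_pages; i++) {
		if (pages[i]) {
			free(pages[i]);
			freed++;
		}
	}
	free(pages);
	return freed;
}
```

Dropping the outer condition removes one reason for pages to be NULL at
cleanup time, but the allocation failure path remains, so the guard in
the cleanup helper cannot go away.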
Suggested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 106 +++++++++++++++++++++++------------------------
1 file changed, 52 insertions(+), 54 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 96064eb41d55..37d9cff0b1b8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -549,67 +549,65 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
ret = 0;
/*
- * we do compression for mount -o compress and when the
- * inode has not been flagged as nocompress. This flag can
- * change at any time if we discover bad compression ratios.
+ * We're in compress_file_range() because run_delalloc_range() has
+ * already evaluated inode_need_compress().
+ * So don't re-check it again to avoid race between ioctl.
+ * This behavior would make bad compression ratio detection less
+ * effective, as we will only skip compression until next
+ * run_delalloc_range().
*/
- if (inode_need_compress(BTRFS_I(inode), start, end)) {
- WARN_ON(pages);
- pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
- if (!pages) {
- /* just bail out to the uncompressed code */
- nr_pages = 0;
- goto cont;
- }
+ WARN_ON(pages);
+ pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
+ if (!pages) {
+ /* just bail out to the uncompressed code */
+ nr_pages = 0;
+ goto cont;
+ }
- if (BTRFS_I(inode)->defrag_compress)
- compress_type = BTRFS_I(inode)->defrag_compress;
- else if (BTRFS_I(inode)->prop_compress)
- compress_type = BTRFS_I(inode)->prop_compress;
+ if (BTRFS_I(inode)->defrag_compress)
+ compress_type = BTRFS_I(inode)->defrag_compress;
+ else if (BTRFS_I(inode)->prop_compress)
+ compress_type = BTRFS_I(inode)->prop_compress;
- /*
- * we need to call clear_page_dirty_for_io on each
- * page in the range. Otherwise applications with the file
- * mmap'd can wander in and change the page contents while
- * we are compressing them.
- *
- * If the compression fails for any reason, we set the pages
- * dirty again later on.
- *
- * Note that the remaining part is redirtied, the start pointer
- * has moved, the end is the original one.
- */
- if (!redirty) {
- extent_range_clear_dirty_for_io(inode, start, end);
- redirty = 1;
- }
+ /*
+ * We need to call clear_page_dirty_for_io on each page in the range.
+ * Otherwise applications with the file mmap'd can wander in and
+ * change the page contents while we are compressing them.
+ *
+ * If the compression fails for any reason, we set the pages dirty
+ * again later on.
+ *
+ * Note that the remaining part is redirtied, the start pointer has
+ * moved, the end is the original one.
+ */
+ if (!redirty) {
+ extent_range_clear_dirty_for_io(inode, start, end);
+ redirty = 1;
+ }
- /* Compression level is applied here and only here */
- ret = btrfs_compress_pages(
- compress_type | (fs_info->compress_level << 4),
- inode->i_mapping, start,
- pages,
- &nr_pages,
- &total_in,
- &total_compressed);
+ /* Compression level is applied here and only here */
+ ret = btrfs_compress_pages(
+ compress_type | (fs_info->compress_level << 4),
+ inode->i_mapping, start, pages, &nr_pages,
+ &total_in, &total_compressed);
- if (!ret) {
- unsigned long offset = offset_in_page(total_compressed);
- struct page *page = pages[nr_pages - 1];
- char *kaddr;
+ if (!ret) {
+ unsigned long offset = offset_in_page(total_compressed);
+ struct page *page = pages[nr_pages - 1];
+ char *kaddr;
- /* zero the tail end of the last page, we might be
- * sending it down to disk
- */
- if (offset) {
- kaddr = kmap_atomic(page);
- memset(kaddr + offset, 0,
- PAGE_SIZE - offset);
- kunmap_atomic(kaddr);
- }
- will_compress = 1;
+ /* zero the tail end of the last page, we might be
+ * sending it down to disk
+ */
+ if (offset) {
+ kaddr = kmap_atomic(page);
+ memset(kaddr + offset, 0,
+ PAGE_SIZE - offset);
+ kunmap_atomic(kaddr);
}
+ will_compress = 1;
}
+
cont:
if (start == 0) {
/* lets try to make an inline extent */
@@ -656,7 +654,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
/*
* Ensure we only free the compressed pages if we have
* them allocated, as we can still reach here with
- * inode_need_compress() == false.
+ * previous kcalloc() failure.
*/
if (pages) {
for (i = 0; i < nr_pages; i++) {
--
2.28.0