public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] btrfs: remove the inode_need_compress() call in
@ 2020-08-04  7:15 Qu Wenruo
  2020-08-04  7:15 ` [PATCH 1/2] btrfs: sysfs: fix NULL pointer dereference at btrfs_sysfs_del_qgroups() Qu Wenruo
  2020-08-04  7:15 ` [PATCH 2/2] btrfs: inode: don't re-evaluate inode_need_compress() in compress_file_extent() Qu Wenruo
  0 siblings, 2 replies; 3+ messages in thread
From: Qu Wenruo @ 2020-08-04  7:15 UTC (permalink / raw)
  To: linux-btrfs

This is an attempt to remove the inode_need_compress() call in
compress_file_extent().

As that compress_file_extent() can race with inode ioctl or bad
compression ratio, to cause NULL pointer dereferecen for @pages, it's
nature to try to remove that inode_need_compress() to remove the race
completely.

However that's not that easy, we have the following problems:

- We still need to check @pages anyway
  That @pages check is for kcalloc() failure, so what we really get is
  just removing one indent from the if (inode_need_compress()).
  Everything else is still the same (in fact, even worse, see below
  problems)

- Behavior change
  Before that change, every async_chunk does their check on
  INODE_NO_COMPRESS flags.
  If we hit any bad compression ratio, all incoming async_chunk will
  fall back to plain text write.

  But if we remove that inode_need_compress() check, then we still try
  to compress, and lead to potentially wasted CPU times.

- Still race between compression disable and NULL pointer dereferecen
  There is a hidden race, mostly exposed by btrfs/071 test case, that we
  have "compress_type = fs_info->compress_type", so we can still hit case
  where that compress_type is NONE (caused by remount -o nocompress), and
  then btrfs_compress_pages() will return -E2BIG, without modifying
  @nr_pages

  Then later when we cleanup @pages, we try to access pages[i]->mapping,
  triggering NULL pointer dereference.

  This will be address in the first patch though.

Qu Wenruo (2):
  btrfs: sysfs: fix NULL pointer dereference at
    btrfs_sysfs_del_qgroups()
  btrfs: inode: don't re-evaluate inode_need_compress() in
    compress_file_extent()

 fs/btrfs/compression.c |  10 ++--
 fs/btrfs/inode.c       | 106 ++++++++++++++++++++---------------------
 2 files changed, 59 insertions(+), 57 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/2] btrfs: sysfs: fix NULL pointer dereference at btrfs_sysfs_del_qgroups()
  2020-08-04  7:15 [PATCH 0/2] btrfs: remove the inode_need_compress() call in Qu Wenruo
@ 2020-08-04  7:15 ` Qu Wenruo
  2020-08-04  7:15 ` [PATCH 2/2] btrfs: inode: don't re-evaluate inode_need_compress() in compress_file_extent() Qu Wenruo
  1 sibling, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2020-08-04  7:15 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
With next-20200731 tag (079ad2fb4bf9eba8a0aaab014b49705cd7f07c66),
unmounting a btrfs with quota disabled will cause the following NULL
pointer dereference:

  BTRFS info (device dm-5): has skinny extents
  BUG: kernel NULL pointer dereference, address: 0000000000000018
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  CPU: 7 PID: 637 Comm: umount Not tainted 5.8.0-rc7-next-20200731-custom #76
  RIP: 0010:kobject_del+0x6/0x20
  Call Trace:
   btrfs_sysfs_del_qgroups+0xac/0xf0 [btrfs]
   btrfs_free_qgroup_config+0x63/0x70 [btrfs]
   close_ctree+0x1f5/0x323 [btrfs]
   btrfs_put_super+0x15/0x17 [btrfs]
   generic_shutdown_super+0x72/0x110
   kill_anon_super+0x18/0x30
   btrfs_kill_super+0x17/0x30 [btrfs]
   deactivate_locked_super+0x3b/0xa0
   deactivate_super+0x40/0x50
   cleanup_mnt+0x135/0x190
   __cleanup_mnt+0x12/0x20
   task_work_run+0x64/0xb0
   exit_to_user_mode_prepare+0x18a/0x190
   syscall_exit_to_user_mode+0x4f/0x270
   do_syscall_64+0x45/0x50
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ---[ end trace 37b7adca5c1d5c5d ]---

[CAUSE]
Commit 079ad2fb4bf9 ("kobject: Avoid premature parent object freeing in
kobject_cleanup()") changed kobject_del() that it no longer accepts NULL
pointer.

Before that commit, kobject_del() and kobject_put() all accept NULL
pointers and just ignore such NULL pointers.

But that mentioned commit needs to access the parent node, killing the
old NULL pointer behavior.

Unfortunately btrfs is relying on that hidden feature thus we will
trigger such NULL pointer dereference.

[FIX]
Instead of just saving several lines, do proper fs_info->qgroups_kobj
check before calling kobject_del() and kobject_put().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 1ab56a734e70..17c27edd804b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -115,10 +115,14 @@ static int compression_compress_pages(int type, struct list_head *ws,
 	case BTRFS_COMPRESS_NONE:
 	default:
 		/*
-		 * This can't happen, the type is validated several times
-		 * before we get here. As a sane fallback, return what the
-		 * callers will understand as 'no compression happened'.
+		 * This happens when compression races with remount to no
+		 * compress, while caller doesn't call inode_need_compress()
+		 * to check if we really need to compress.
+		 *
+		 * Not a big deal, just need to inform caller that we
+		 * haven't allocated any pages yet.
 		 */
+		*out_pages = 0;
 		return -E2BIG;
 	}
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH 2/2] btrfs: inode: don't re-evaluate inode_need_compress() in compress_file_extent()
  2020-08-04  7:15 [PATCH 0/2] btrfs: remove the inode_need_compress() call in Qu Wenruo
  2020-08-04  7:15 ` [PATCH 1/2] btrfs: sysfs: fix NULL pointer dereference at btrfs_sysfs_del_qgroups() Qu Wenruo
@ 2020-08-04  7:15 ` Qu Wenruo
  1 sibling, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2020-08-04  7:15 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

The extra inode_need_compress() has already caused problems for pages
releasing.
We had hot fix to solve that problem, now it's time to fix it from the
root.

This patch will remove the extra inode_need_compress() to address the
problem.

This would lead to the following behavior change:
- Worse bad compression ratio detection
  Now if we had one async_chunk hitting bad compression ratio, other
  async_chunk will still try to compress.

  Only newer delalloc range will follow the new INODE_NO_COMPRESS flag
  then.
  Although one could argue that, if only part of the file content has
  bad compression, we should still try on other ranges.

Despite the behavior change, the code cleanup isn't that elegant either,
as kcalloc() can still fail for @pages, thus from cont: tag, we still
need to check @pages manually, thus the cleanup doesn't bring much
benefit, just one indent removal and comments reformatting.

Suggested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 106 +++++++++++++++++++++++------------------------
 1 file changed, 52 insertions(+), 54 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 96064eb41d55..37d9cff0b1b8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -549,67 +549,65 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 	ret = 0;
 
 	/*
-	 * we do compression for mount -o compress and when the
-	 * inode has not been flagged as nocompress.  This flag can
-	 * change at any time if we discover bad compression ratios.
+	 * We're in compress_file_range() because run_delalloc_range() has
+	 * already evaluated inode_need_compress().
+	 * So don't re-check it again to avoid race between ioctl.
+	 * This behavior would make bad compression ratio detection less
+	 * effective, as we will only skip compression until next
+	 * run_delalloc_range().
 	 */
-	if (inode_need_compress(BTRFS_I(inode), start, end)) {
-		WARN_ON(pages);
-		pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
-		if (!pages) {
-			/* just bail out to the uncompressed code */
-			nr_pages = 0;
-			goto cont;
-		}
+	WARN_ON(pages);
+	pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
+	if (!pages) {
+		/* just bail out to the uncompressed code */
+		nr_pages = 0;
+		goto cont;
+	}
 
-		if (BTRFS_I(inode)->defrag_compress)
-			compress_type = BTRFS_I(inode)->defrag_compress;
-		else if (BTRFS_I(inode)->prop_compress)
-			compress_type = BTRFS_I(inode)->prop_compress;
+	if (BTRFS_I(inode)->defrag_compress)
+		compress_type = BTRFS_I(inode)->defrag_compress;
+	else if (BTRFS_I(inode)->prop_compress)
+		compress_type = BTRFS_I(inode)->prop_compress;
 
-		/*
-		 * we need to call clear_page_dirty_for_io on each
-		 * page in the range.  Otherwise applications with the file
-		 * mmap'd can wander in and change the page contents while
-		 * we are compressing them.
-		 *
-		 * If the compression fails for any reason, we set the pages
-		 * dirty again later on.
-		 *
-		 * Note that the remaining part is redirtied, the start pointer
-		 * has moved, the end is the original one.
-		 */
-		if (!redirty) {
-			extent_range_clear_dirty_for_io(inode, start, end);
-			redirty = 1;
-		}
+	/*
+	 * We need to call clear_page_dirty_for_io on each page in the range.
+	 * Otherwise applications with the file mmap'd can wander in and
+	 * change the page contents while we are compressing them.
+	 *
+	 * If the compression fails for any reason, we set the pages dirty
+	 * again later on.
+	 *
+	 * Note that the remaining part is redirtied, the start pointer has
+	 * moved, the end is the original one.
+	 */
+	if (!redirty) {
+		extent_range_clear_dirty_for_io(inode, start, end);
+		redirty = 1;
+	}
 
-		/* Compression level is applied here and only here */
-		ret = btrfs_compress_pages(
-			compress_type | (fs_info->compress_level << 4),
-					   inode->i_mapping, start,
-					   pages,
-					   &nr_pages,
-					   &total_in,
-					   &total_compressed);
+	/* Compression level is applied here and only here */
+	ret = btrfs_compress_pages(
+		compress_type | (fs_info->compress_level << 4),
+				   inode->i_mapping, start, pages, &nr_pages,
+				   &total_in, &total_compressed);
 
-		if (!ret) {
-			unsigned long offset = offset_in_page(total_compressed);
-			struct page *page = pages[nr_pages - 1];
-			char *kaddr;
+	if (!ret) {
+		unsigned long offset = offset_in_page(total_compressed);
+		struct page *page = pages[nr_pages - 1];
+		char *kaddr;
 
-			/* zero the tail end of the last page, we might be
-			 * sending it down to disk
-			 */
-			if (offset) {
-				kaddr = kmap_atomic(page);
-				memset(kaddr + offset, 0,
-				       PAGE_SIZE - offset);
-				kunmap_atomic(kaddr);
-			}
-			will_compress = 1;
+		/* zero the tail end of the last page, we might be
+		 * sending it down to disk
+		 */
+		if (offset) {
+			kaddr = kmap_atomic(page);
+			memset(kaddr + offset, 0,
+			       PAGE_SIZE - offset);
+			kunmap_atomic(kaddr);
 		}
+		will_compress = 1;
 	}
+
 cont:
 	if (start == 0) {
 		/* lets try to make an inline extent */
@@ -656,7 +654,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 			/*
 			 * Ensure we only free the compressed pages if we have
 			 * them allocated, as we can still reach here with
-			 * inode_need_compress() == false.
+			 * previous kcalloc() failure.
 			 */
 			if (pages) {
 				for (i = 0; i < nr_pages; i++) {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-08-04  7:16 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-08-04  7:15 [PATCH 0/2] btrfs: remove the inode_need_compress() call in Qu Wenruo
2020-08-04  7:15 ` [PATCH 1/2] btrfs: sysfs: fix NULL pointer dereference at btrfs_sysfs_del_qgroups() Qu Wenruo
2020-08-04  7:15 ` [PATCH 2/2] btrfs: inode: don't re-evaluate inode_need_compress() in compress_file_extent() Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox