All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Daeho Jeong <daehojeong@google.com>,
	Jaegeuk Kim <jaegeuk@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.10 47/63] f2fs: fix race of pending_pages in decompression
Date: Mon,  4 Jan 2021 16:57:40 +0100	[thread overview]
Message-ID: <20210104155711.097185632@linuxfoundation.org> (raw)
In-Reply-To: <20210104155708.800470590@linuxfoundation.org>

From: Daeho Jeong <daehojeong@google.com>

[ Upstream commit 6422a71ef40e4751d59b8c9412e7e2dafe085878 ]

I found out f2fs_free_dic() is invoked in a wrong timing, but
f2fs_verify_bio() still needed the dic info and it triggered the
below kernel panic. It has been caused by the race condition of
pending_pages value between decompression and verity logic, when
the same compression cluster had been split in different bios.
By split bios, f2fs_verify_bio() ended up with decreasing
pending_pages value before it is reset to nr_cpages by
f2fs_decompress_pages() and caused the kernel panic.

[ 4416.564763] Unable to handle kernel NULL pointer dereference
               at virtual address 0000000000000000
...
[ 4416.896016] Workqueue: fsverity_read_queue f2fs_verity_work
[ 4416.908515] pc : fsverity_verify_page+0x20/0x78
[ 4416.913721] lr : f2fs_verify_bio+0x11c/0x29c
[ 4416.913722] sp : ffffffc019533cd0
[ 4416.913723] x29: ffffffc019533cd0 x28: 0000000000000402
[ 4416.913724] x27: 0000000000000001 x26: 0000000000000100
[ 4416.913726] x25: 0000000000000001 x24: 0000000000000004
[ 4416.913727] x23: 0000000000001000 x22: 0000000000000000
[ 4416.913728] x21: 0000000000000000 x20: ffffffff2076f9c0
[ 4416.913729] x19: ffffffff2076f9c0 x18: ffffff8a32380c30
[ 4416.913731] x17: ffffffc01f966d97 x16: 0000000000000298
[ 4416.913732] x15: 0000000000000000 x14: 0000000000000000
[ 4416.913733] x13: f074faec89ffffff x12: 0000000000000000
[ 4416.913734] x11: 0000000000001000 x10: 0000000000001000
[ 4416.929176] x9 : ffffffff20d1f5c7 x8 : 0000000000000000
[ 4416.929178] x7 : 626d7464ff286b6b x6 : ffffffc019533ade
[ 4416.929179] x5 : 000000008049000e x4 : ffffffff2793e9e0
[ 4416.929180] x3 : 000000008049000e x2 : ffffff89ecfa74d0
[ 4416.929181] x1 : 0000000000000c40 x0 : ffffffff2076f9c0
[ 4416.929184] Call trace:
[ 4416.929187]  fsverity_verify_page+0x20/0x78
[ 4416.929189]  f2fs_verify_bio+0x11c/0x29c
[ 4416.929192]  f2fs_verity_work+0x58/0x84
[ 4417.050667]  process_one_work+0x270/0x47c
[ 4417.055354]  worker_thread+0x27c/0x4d8
[ 4417.059784]  kthread+0x13c/0x320
[ 4417.063693]  ret_from_fork+0x10/0x18

Chao pointed this can happen by the below race condition.

Thread A        f2fs_post_read_wq          fsverity_wq
- f2fs_read_multi_pages()
  - f2fs_alloc_dic
   - dic->pending_pages = 2
   - submit_bio()
   - submit_bio()
               - f2fs_post_read_work() handle first bio
                - f2fs_decompress_work()
                 - __read_end_io()
                  - f2fs_decompress_pages()
                   - dic->pending_pages--
                - enqueue f2fs_verity_work()
                                           - f2fs_verity_work() handle first bio
                                            - f2fs_verify_bio()
                                             - dic->pending_pages--
               - f2fs_post_read_work() handle second bio
                - f2fs_decompress_work()
                - enqueue f2fs_verity_work()
                                            - f2fs_verify_pages()
                                            - f2fs_free_dic()

                                          - f2fs_verity_work() handle second bio
                                           - f2fs_verfy_bio()
                                                 - use-after-free on dic

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/f2fs/compress.c |  2 --
 fs/f2fs/data.c     | 58 +++++++++++++++++++++++++++++++++++++---------
 fs/f2fs/f2fs.h     |  1 +
 3 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 14262e0f1cd60..c5fee4d7ea72f 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -798,8 +798,6 @@ void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity)
 	if (cops->destroy_decompress_ctx)
 		cops->destroy_decompress_ctx(dic);
 out_free_dic:
-	if (verity)
-		atomic_set(&dic->pending_pages, dic->nr_cpages);
 	if (!verity)
 		f2fs_decompress_end_io(dic->rpages, dic->cluster_size,
 								ret, false);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index be4da52604edc..b29243ee1c3e5 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -202,7 +202,7 @@ static void f2fs_verify_bio(struct bio *bio)
 		dic = (struct decompress_io_ctx *)page_private(page);
 
 		if (dic) {
-			if (atomic_dec_return(&dic->pending_pages))
+			if (atomic_dec_return(&dic->verity_pages))
 				continue;
 			f2fs_verify_pages(dic->rpages,
 						dic->cluster_size);
@@ -1027,7 +1027,8 @@ static inline bool f2fs_need_verity(const struct inode *inode, pgoff_t idx)
 
 static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
 				      unsigned nr_pages, unsigned op_flag,
-				      pgoff_t first_idx, bool for_write)
+				      pgoff_t first_idx, bool for_write,
+				      bool for_verity)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct bio *bio;
@@ -1049,7 +1050,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
 		post_read_steps |= 1 << STEP_DECRYPT;
 	if (f2fs_compressed_file(inode))
 		post_read_steps |= 1 << STEP_DECOMPRESS_NOWQ;
-	if (f2fs_need_verity(inode, first_idx))
+	if (for_verity && f2fs_need_verity(inode, first_idx))
 		post_read_steps |= 1 << STEP_VERITY;
 
 	if (post_read_steps) {
@@ -1079,7 +1080,7 @@ static int f2fs_submit_page_read(struct inode *inode, struct page *page,
 	struct bio *bio;
 
 	bio = f2fs_grab_read_bio(inode, blkaddr, 1, op_flags,
-					page->index, for_write);
+					page->index, for_write, true);
 	if (IS_ERR(bio))
 		return PTR_ERR(bio);
 
@@ -2133,7 +2134,7 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
 	if (bio == NULL) {
 		bio = f2fs_grab_read_bio(inode, block_nr, nr_pages,
 				is_readahead ? REQ_RAHEAD : 0, page->index,
-				false);
+				false, true);
 		if (IS_ERR(bio)) {
 			ret = PTR_ERR(bio);
 			bio = NULL;
@@ -2180,6 +2181,8 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
 	const unsigned blkbits = inode->i_blkbits;
 	const unsigned blocksize = 1 << blkbits;
 	struct decompress_io_ctx *dic = NULL;
+	struct bio_post_read_ctx *ctx;
+	bool for_verity = false;
 	int i;
 	int ret = 0;
 
@@ -2245,10 +2248,29 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
 		goto out_put_dnode;
 	}
 
+	/*
+	 * It's possible to enable fsverity on the fly when handling a cluster,
+	 * which requires complicated error handling. Instead of adding more
+	 * complexity, let's give a rule where end_io post-processes fsverity
+	 * per cluster. In order to do that, we need to submit bio, if previous
+	 * bio sets a different post-process policy.
+	 */
+	if (fsverity_active(cc->inode)) {
+		atomic_set(&dic->verity_pages, cc->nr_cpages);
+		for_verity = true;
+
+		if (bio) {
+			ctx = bio->bi_private;
+			if (!(ctx->enabled_steps & (1 << STEP_VERITY))) {
+				__submit_bio(sbi, bio, DATA);
+				bio = NULL;
+			}
+		}
+	}
+
 	for (i = 0; i < dic->nr_cpages; i++) {
 		struct page *page = dic->cpages[i];
 		block_t blkaddr;
-		struct bio_post_read_ctx *ctx;
 
 		blkaddr = data_blkaddr(dn.inode, dn.node_page,
 						dn.ofs_in_node + i + 1);
@@ -2264,17 +2286,31 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
 		if (!bio) {
 			bio = f2fs_grab_read_bio(inode, blkaddr, nr_pages,
 					is_readahead ? REQ_RAHEAD : 0,
-					page->index, for_write);
+					page->index, for_write, for_verity);
 			if (IS_ERR(bio)) {
+				unsigned int remained = dic->nr_cpages - i;
+				bool release = false;
+
 				ret = PTR_ERR(bio);
 				dic->failed = true;
-				if (!atomic_sub_return(dic->nr_cpages - i,
-							&dic->pending_pages)) {
+
+				if (for_verity) {
+					if (!atomic_sub_return(remained,
+						&dic->verity_pages))
+						release = true;
+				} else {
+					if (!atomic_sub_return(remained,
+						&dic->pending_pages))
+						release = true;
+				}
+
+				if (release) {
 					f2fs_decompress_end_io(dic->rpages,
-							cc->cluster_size, true,
-							false);
+						cc->cluster_size, true,
+						false);
 					f2fs_free_dic(dic);
 				}
+
 				f2fs_put_dnode(&dn);
 				*bio_ret = NULL;
 				return ret;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e4344d98a780c..06e5a6053f3f9 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1410,6 +1410,7 @@ struct decompress_io_ctx {
 	size_t rlen;			/* valid data length in rbuf */
 	size_t clen;			/* valid data length in cbuf */
 	atomic_t pending_pages;		/* in-flight compressed page count */
+	atomic_t verity_pages;		/* in-flight page count for verity */
 	bool failed;			/* indicate IO error during decompression */
 	void *private;			/* payload buffer for specified decompression algorithm */
 	void *private2;			/* extra payload buffer */
-- 
2.27.0




  parent reply	other threads:[~2021-01-04 16:05 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-04 15:56 [PATCH 5.10 00/63] 5.10.5-rc1 review Greg Kroah-Hartman
2021-01-04 15:56 ` [PATCH 5.10 01/63] net/sched: sch_taprio: reset child qdiscs before freeing them Greg Kroah-Hartman
2021-01-04 22:58   ` Sasha Levin
2021-01-04 23:06     ` Jakub Kicinski
2021-01-04 15:56 ` [PATCH 5.10 02/63] mptcp: fix security context on server socket Greg Kroah-Hartman
2021-01-04 15:56 ` [PATCH 5.10 03/63] ethtool: fix error paths in ethnl_set_channels() Greg Kroah-Hartman
2021-01-04 15:56 ` [PATCH 5.10 04/63] ethtool: fix string set id check Greg Kroah-Hartman
2021-01-04 15:56 ` [PATCH 5.10 05/63] md/raid10: initialize r10_bio->read_slot before use Greg Kroah-Hartman
2021-01-04 15:56 ` [PATCH 5.10 06/63] drm/amd/display: Add get_dig_frontend implementation for DCEx Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 07/63] io_uring: close a small race gap for files cancel Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 08/63] jffs2: Allow setting rp_size to zero during remounting Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 09/63] jffs2: Fix NULL pointer dereference in rp_size fs option parsing Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 10/63] spi: dw-bt1: Fix undefined devm_mux_control_get symbol Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 11/63] opp: fix memory leak in _allocate_opp_table Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 12/63] opp: Call the missing clk_put() on error Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 13/63] scsi: block: Fix a race in the runtime power management code Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 14/63] mm/hugetlb: fix deadlock in hugetlb_cow error path Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 15/63] mm: memmap defer init doesnt work as expected Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 16/63] lib/zlib: fix inflating zlib streams on s390 Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 17/63] io_uring: dont assume mm is constant across submits Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 18/63] io_uring: use bottom half safe lock for fixed file data Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 19/63] io_uring: add a helper for setting a ref node Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 20/63] io_uring: fix io_sqe_files_unregister() hangs Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 21/63] kernel/io_uring: cancel io_uring before task works Greg Kroah-Hartman
2021-01-04 16:06   ` Pavel Begunkov
2021-01-04 17:43     ` Sasha Levin
2021-01-04 15:57 ` [PATCH 5.10 22/63] uapi: move constants from <linux/kernel.h> to <linux/const.h> Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 23/63] tools headers UAPI: Sync linux/const.h with the kernel headers Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 24/63] cgroup: Fix memory leak when parsing multiple source parameters Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 25/63] zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 26/63] scsi: cxgb4i: Fix TLS dependency Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 27/63] Bluetooth: hci_h5: close serdev device and free hu in h5_close Greg Kroah-Hartman
     [not found] ` <20210104155708.800470590-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
2021-01-04 15:57   ` [PATCH 5.10 28/63] fbcon: Disable accelerated scrolling Greg Kroah-Hartman
2021-01-04 15:57     ` Greg Kroah-Hartman
     [not found]     ` <20210104155710.187945647-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
2021-01-07  8:13       ` Geert Uytterhoeven
2021-01-07  8:13         ` Geert Uytterhoeven
2021-01-04 15:57 ` [PATCH 5.10 29/63] reiserfs: add check for an invalid ih_entry_count Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 30/63] misc: vmw_vmci: fix kernel info-leak by initializing dbells in vmci_ctx_get_chkpt_doorbells() Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 31/63] media: gp8psk: initialize stats at power control logic Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 32/63] f2fs: fix shift-out-of-bounds in sanity_check_raw_super() Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 33/63] ALSA: seq: Use bool for snd_seq_queue internal flags Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 34/63] ALSA: rawmidi: Access runtime->avail always in spinlock Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 35/63] bfs: dont use WARNING: string when its just info Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 36/63] ext4: check for invalid block size early when mounting a file system Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 37/63] fcntl: Fix potential deadlock in send_sig{io, urg}() Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 38/63] io_uring: check kthread stopped flag when sq thread is unparked Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 39/63] rtc: sun6i: Fix memleak in sun6i_rtc_clk_init Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 40/63] module: set MODULE_STATE_GOING state when a module fails to load Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 41/63] quota: Dont overflow quota file offsets Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 42/63] rtc: pl031: fix resource leak in pl031_probe Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 43/63] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe() Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 44/63] i3c master: fix missing destroy_workqueue() on error in i3c_master_register Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 45/63] NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 46/63] f2fs: avoid race condition for shrinker count Greg Kroah-Hartman
2021-01-04 15:57 ` Greg Kroah-Hartman [this message]
2021-01-04 15:57 ` [PATCH 5.10 48/63] module: delay kobject uevent until after module init call Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 49/63] powerpc/64: irq replay remove decrementer overflow check Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 50/63] fs/namespace.c: WARN if mnt_count has become negative Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 51/63] watchdog: rti-wdt: fix reference leak in rti_wdt_probe Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 52/63] um: random: Register random as hwrng-core device Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 53/63] um: ubd: Submit all data segments atomically Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 54/63] NFSv4.2: Dont error when exiting early on a READ_PLUS buffer overflow Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 55/63] ceph: fix inode refcount leak when ceph_fill_inode on non-I_NEW inode fails Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 56/63] drm/amd/display: updated wm table for Renoir Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 57/63] tick/sched: Remove bogus boot "safety" check Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 58/63] s390: always clear kernel stack backchain before calling functions Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 59/63] io_uring: remove racy overflow list fast checks Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 60/63] ALSA: pcm: Clear the full allocated memory at hw_params Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 61/63] dm verity: skip verity work if I/O error when system is shutting down Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 62/63] ext4: avoid s_mb_prefetch to be zero in individual scenarios Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.10 63/63] device-dax: Fix range release Greg Kroah-Hartman
2021-01-04 21:09 ` [PATCH 5.10 00/63] 5.10.5-rc1 review Jon Hunter
2021-01-05  6:06 ` Daniel Díaz
2021-01-05 12:55 ` Jeffrin Jose T
2021-01-05 13:05   ` Greg Kroah-Hartman
2021-01-06 18:43     ` Jeffrin Jose T
2021-01-05 16:38 ` Shuah Khan
2021-01-05 18:17 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210104155711.097185632@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=daehojeong@google.com \
    --cc=jaegeuk@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.