public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] dm verity: fix FEC stuck during lower dm suspend
@ 2023-11-22  3:51 Wu Bo
  2023-11-22  3:51 ` [PATCH v2 1/2] dm verity: init fec io before cleaning it Wu Bo
  2023-11-22  3:51 ` [PATCH v2 2/2] dm verity: don't verity if readahead failed Wu Bo
  0 siblings, 2 replies; 6+ messages in thread
From: Wu Bo @ 2023-11-22  3:51 UTC (permalink / raw)
  To: Alasdair Kergon, Mike Snitzer, Mikulas Patocka
  Cc: dm-devel, linux-kernel, Wu Bo, Eric Biggers, Wu Bo

Hi,

We found an issue under Android OTA scenario that many readahead BIOs have
to do FEC and caused system stuck. These 2 patches try to fix this issue.

Thanks

Changes from v1:
1. Modify patch1 as Eric suggested
2. Add "Fixes" and "Cc stable" tags

Wu Bo (2):
  dm verity: init fec io before cleaning it
  dm verity: don't verity if readahead failed

 drivers/md/dm-verity-target.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] dm verity: init fec io before cleaning it
  2023-11-22  3:51 [PATCH v2 0/2] dm verity: fix FEC stuck during lower dm suspend Wu Bo
@ 2023-11-22  3:51 ` Wu Bo
  2023-11-29 17:20   ` Mikulas Patocka
  2023-11-22  3:51 ` [PATCH v2 2/2] dm verity: don't verity if readahead failed Wu Bo
  1 sibling, 1 reply; 6+ messages in thread
From: Wu Bo @ 2023-11-22  3:51 UTC (permalink / raw)
  To: Alasdair Kergon, Mike Snitzer, Mikulas Patocka
  Cc: dm-devel, linux-kernel, Wu Bo, Eric Biggers, Wu Bo, stable

If BIO error, it may goto verity_finish_io() before
verity_fec_init_io(). Therefor, the fec_io->rs is not initialized and
may crash when doing memory freeing in verity_fec_finish_io().

Crash call stack:
 die+0x90/0x2b8
 __do_kernel_fault+0x260/0x298
 do_bad_area+0x2c/0xdc
 do_translation_fault+0x3c/0x54
 do_mem_abort+0x54/0x118
 el1_abort+0x38/0x5c
 el1h_64_sync_handler+0x50/0x90
 el1h_64_sync+0x64/0x6c
 free_rs+0x18/0xac
 fec_rs_free+0x10/0x24
 mempool_free+0x58/0x148
 verity_fec_finish_io+0x4c/0xb0
 verity_end_io+0xb8/0x150

Cc: stable@vger.kernel.org      # v6.0+
Fixes: 5721d4e5a9cd ("dm verity: Add optional "try_verify_in_tasklet" feature")
Signed-off-by: Wu Bo <bo.wu@vivo.com>
---
 drivers/md/dm-verity-target.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index e115fcfe723c..beec14b6b044 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -642,7 +642,6 @@ static void verity_work(struct work_struct *w)
 
 	io->in_tasklet = false;
 
-	verity_fec_init_io(io);
 	verity_finish_io(io, errno_to_blk_status(verity_verify_io(io)));
 }
 
@@ -792,6 +791,8 @@ static int verity_map(struct dm_target *ti, struct bio *bio)
 	bio->bi_private = io;
 	io->iter = bio->bi_iter;
 
+	verity_fec_init_io(io);
+
 	verity_submit_prefetch(v, io);
 
 	submit_bio_noacct(bio);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] dm verity: don't verity if readahead failed
  2023-11-22  3:51 [PATCH v2 0/2] dm verity: fix FEC stuck during lower dm suspend Wu Bo
  2023-11-22  3:51 ` [PATCH v2 1/2] dm verity: init fec io before cleaning it Wu Bo
@ 2023-11-22  3:51 ` Wu Bo
  2023-11-28 12:40   ` Wu Bo
  2023-11-29 17:26   ` Mikulas Patocka
  1 sibling, 2 replies; 6+ messages in thread
From: Wu Bo @ 2023-11-22  3:51 UTC (permalink / raw)
  To: Alasdair Kergon, Mike Snitzer, Mikulas Patocka
  Cc: dm-devel, linux-kernel, Wu Bo, Eric Biggers, Wu Bo, stable

We found an issue under Android OTA scenario that many BIOs have to do
FEC where the data under dm-verity is 100% complete and no corruption.

Android OTA has many dm-block layers, from upper to lower:
dm-verity
dm-snapshot
dm-origin & dm-cow
dm-linear
ufs

Dm tables have to change 2 times during Android OTA merging process.
When doing table change, the dm-snapshot will be suspended for a while.
During this interval, we found there are many readahead IOs are
submitted to dm_verity from filesystem. Then the kverity works are busy
doing FEC process which cost too much time to finish dm-verity IO. And
cause system stuck.

We add some debug log and find that each readahead IO need around 10s to
finish when this situation occurred. Because here has a IO
amplification:

dm-snapshot suspend
erofs_readahead     // 300+ io is submitted
	dm_submit_bio (dm_verity)
		dm_submit_bio (dm_snapshot)
		bio return EIO
		bio got nothing, it's empty
	verity_end_io
	verity_verify_io
	forloop range(0, io->n_blocks)    // each io->nblocks ~= 20
		verity_fec_decode
		fec_decode_rsb
		fec_read_bufs
		forloop range(0, v->fec->rsn) // v->fec->rsn = 253
			new_read
			submit_bio (dm_snapshot)
		end loop
	end loop
dm-snapshot resume

Readahead BIO got nothing during dm-snapshot suspended. So all of them
will do FEC.
Each readahead BIO need to do io->n_blocks ~= 20 times verify.
Each block need to do fec, and every block need to do v->fec->rsn = 253
times read.
So during the suspend interval(~200ms), 300 readahead BIO make
300*20*253 IOs on dm-snapshot.

As readahead IO is not required by user space, and to fix this issue,
I think it would be better to pass it to upper layer to handle it.

Cc: stable@vger.kernel.org
Fixes: a739ff3f543a ("dm verity: add support for forward error correction")
Signed-off-by: Wu Bo <bo.wu@vivo.com>
---
 drivers/md/dm-verity-target.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index beec14b6b044..14e58ae70521 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -667,7 +667,9 @@ static void verity_end_io(struct bio *bio)
 	struct dm_verity_io *io = bio->bi_private;
 
 	if (bio->bi_status &&
-	    (!verity_fec_is_enabled(io->v) || verity_is_system_shutting_down())) {
+	    (!verity_fec_is_enabled(io->v) ||
+	     verity_is_system_shutting_down() ||
+	     (bio->bi_opf & REQ_RAHEAD))) {
 		verity_finish_io(io, bio->bi_status);
 		return;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] dm verity: don't verity if readahead failed
  2023-11-22  3:51 ` [PATCH v2 2/2] dm verity: don't verity if readahead failed Wu Bo
@ 2023-11-28 12:40   ` Wu Bo
  2023-11-29 17:26   ` Mikulas Patocka
  1 sibling, 0 replies; 6+ messages in thread
From: Wu Bo @ 2023-11-28 12:40 UTC (permalink / raw)
  To: Wu Bo, Alasdair Kergon, Mike Snitzer, Mikulas Patocka
  Cc: dm-devel, linux-kernel, Eric Biggers, stable

ping.

On 2023/11/22 11:51, Wu Bo wrote:
> We found an issue under Android OTA scenario that many BIOs have to do
> FEC where the data under dm-verity is 100% complete and no corruption.
>
> Android OTA has many dm-block layers, from upper to lower:
> dm-verity
> dm-snapshot
> dm-origin & dm-cow
> dm-linear
> ufs
>
> Dm tables have to change 2 times during Android OTA merging process.
> When doing table change, the dm-snapshot will be suspended for a while.
> During this interval, we found there are many readahead IOs are
> submitted to dm_verity from filesystem. Then the kverity works are busy
> doing FEC process which cost too much time to finish dm-verity IO. And
> cause system stuck.
>
> We add some debug log and find that each readahead IO need around 10s to
> finish when this situation occurred. Because here has a IO
> amplification:
>
> dm-snapshot suspend
> erofs_readahead     // 300+ io is submitted
> 	dm_submit_bio (dm_verity)
> 		dm_submit_bio (dm_snapshot)
> 		bio return EIO
> 		bio got nothing, it's empty
> 	verity_end_io
> 	verity_verify_io
> 	forloop range(0, io->n_blocks)    // each io->nblocks ~= 20
> 		verity_fec_decode
> 		fec_decode_rsb
> 		fec_read_bufs
> 		forloop range(0, v->fec->rsn) // v->fec->rsn = 253
> 			new_read
> 			submit_bio (dm_snapshot)
> 		end loop
> 	end loop
> dm-snapshot resume
>
> Readahead BIO got nothing during dm-snapshot suspended. So all of them
> will do FEC.
> Each readahead BIO need to do io->n_blocks ~= 20 times verify.
> Each block need to do fec, and every block need to do v->fec->rsn = 253
> times read.
> So during the suspend interval(~200ms), 300 readahead BIO make
> 300*20*253 IOs on dm-snapshot.
>
> As readahead IO is not required by user space, and to fix this issue,
> I think it would be better to pass it to upper layer to handle it.
>
> Cc: stable@vger.kernel.org
> Fixes: a739ff3f543a ("dm verity: add support for forward error correction")
> Signed-off-by: Wu Bo <bo.wu@vivo.com>
> ---
>   drivers/md/dm-verity-target.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
> index beec14b6b044..14e58ae70521 100644
> --- a/drivers/md/dm-verity-target.c
> +++ b/drivers/md/dm-verity-target.c
> @@ -667,7 +667,9 @@ static void verity_end_io(struct bio *bio)
>   	struct dm_verity_io *io = bio->bi_private;
>   
>   	if (bio->bi_status &&
> -	    (!verity_fec_is_enabled(io->v) || verity_is_system_shutting_down())) {
> +	    (!verity_fec_is_enabled(io->v) ||
> +	     verity_is_system_shutting_down() ||
> +	     (bio->bi_opf & REQ_RAHEAD))) {
>   		verity_finish_io(io, bio->bi_status);
>   		return;
>   	}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/2] dm verity: init fec io before cleaning it
  2023-11-22  3:51 ` [PATCH v2 1/2] dm verity: init fec io before cleaning it Wu Bo
@ 2023-11-29 17:20   ` Mikulas Patocka
  0 siblings, 0 replies; 6+ messages in thread
From: Mikulas Patocka @ 2023-11-29 17:20 UTC (permalink / raw)
  To: Wu Bo
  Cc: Alasdair Kergon, Mike Snitzer, dm-devel, linux-kernel, Wu Bo,
	Eric Biggers, stable



On Tue, 21 Nov 2023, Wu Bo wrote:

> If BIO error, it may goto verity_finish_io() before
> verity_fec_init_io(). Therefor, the fec_io->rs is not initialized and
> may crash when doing memory freeing in verity_fec_finish_io().
> 
> Crash call stack:
>  die+0x90/0x2b8
>  __do_kernel_fault+0x260/0x298
>  do_bad_area+0x2c/0xdc
>  do_translation_fault+0x3c/0x54
>  do_mem_abort+0x54/0x118
>  el1_abort+0x38/0x5c
>  el1h_64_sync_handler+0x50/0x90
>  el1h_64_sync+0x64/0x6c
>  free_rs+0x18/0xac
>  fec_rs_free+0x10/0x24
>  mempool_free+0x58/0x148
>  verity_fec_finish_io+0x4c/0xb0
>  verity_end_io+0xb8/0x150
> 
> Cc: stable@vger.kernel.org      # v6.0+
> Fixes: 5721d4e5a9cd ("dm verity: Add optional "try_verify_in_tasklet" feature")
> Signed-off-by: Wu Bo <bo.wu@vivo.com>
> ---
>  drivers/md/dm-verity-target.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
> index e115fcfe723c..beec14b6b044 100644
> --- a/drivers/md/dm-verity-target.c
> +++ b/drivers/md/dm-verity-target.c
> @@ -642,7 +642,6 @@ static void verity_work(struct work_struct *w)
>  
>  	io->in_tasklet = false;
>  
> -	verity_fec_init_io(io);
>  	verity_finish_io(io, errno_to_blk_status(verity_verify_io(io)));
>  }
>  
> @@ -792,6 +791,8 @@ static int verity_map(struct dm_target *ti, struct bio *bio)
>  	bio->bi_private = io;
>  	io->iter = bio->bi_iter;
>  
> +	verity_fec_init_io(io);
> +
>  	verity_submit_prefetch(v, io);
>  
>  	submit_bio_noacct(bio);
> -- 
> 2.25.1

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] dm verity: don't verity if readahead failed
  2023-11-22  3:51 ` [PATCH v2 2/2] dm verity: don't verity if readahead failed Wu Bo
  2023-11-28 12:40   ` Wu Bo
@ 2023-11-29 17:26   ` Mikulas Patocka
  1 sibling, 0 replies; 6+ messages in thread
From: Mikulas Patocka @ 2023-11-29 17:26 UTC (permalink / raw)
  To: Wu Bo
  Cc: Alasdair Kergon, Mike Snitzer, dm-devel, linux-kernel, Wu Bo,
	Eric Biggers, stable



On Tue, 21 Nov 2023, Wu Bo wrote:

> We found an issue under Android OTA scenario that many BIOs have to do
> FEC where the data under dm-verity is 100% complete and no corruption.
> 
> Android OTA has many dm-block layers, from upper to lower:
> dm-verity
> dm-snapshot
> dm-origin & dm-cow
> dm-linear
> ufs
> 
> Dm tables have to change 2 times during Android OTA merging process.
> When doing table change, the dm-snapshot will be suspended for a while.
> During this interval, we found there are many readahead IOs are
> submitted to dm_verity from filesystem. Then the kverity works are busy
> doing FEC process which cost too much time to finish dm-verity IO. And
> cause system stuck.
> 
> We add some debug log and find that each readahead IO need around 10s to
> finish when this situation occurred. Because here has a IO
> amplification:
> 
> dm-snapshot suspend
> erofs_readahead     // 300+ io is submitted
> 	dm_submit_bio (dm_verity)
> 		dm_submit_bio (dm_snapshot)
> 		bio return EIO
> 		bio got nothing, it's empty
> 	verity_end_io
> 	verity_verify_io
> 	forloop range(0, io->n_blocks)    // each io->nblocks ~= 20
> 		verity_fec_decode
> 		fec_decode_rsb
> 		fec_read_bufs
> 		forloop range(0, v->fec->rsn) // v->fec->rsn = 253
> 			new_read
> 			submit_bio (dm_snapshot)
> 		end loop
> 	end loop
> dm-snapshot resume
> 
> Readahead BIO got nothing during dm-snapshot suspended. So all of them
> will do FEC.
> Each readahead BIO need to do io->n_blocks ~= 20 times verify.
> Each block need to do fec, and every block need to do v->fec->rsn = 253
> times read.
> So during the suspend interval(~200ms), 300 readahead BIO make
> 300*20*253 IOs on dm-snapshot.
> 
> As readahead IO is not required by user space, and to fix this issue,
> I think it would be better to pass it to upper layer to handle it.
> 
> Cc: stable@vger.kernel.org
> Fixes: a739ff3f543a ("dm verity: add support for forward error correction")
> Signed-off-by: Wu Bo <bo.wu@vivo.com>
> ---
>  drivers/md/dm-verity-target.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
> index beec14b6b044..14e58ae70521 100644
> --- a/drivers/md/dm-verity-target.c
> +++ b/drivers/md/dm-verity-target.c
> @@ -667,7 +667,9 @@ static void verity_end_io(struct bio *bio)
>  	struct dm_verity_io *io = bio->bi_private;
>  
>  	if (bio->bi_status &&
> -	    (!verity_fec_is_enabled(io->v) || verity_is_system_shutting_down())) {
> +	    (!verity_fec_is_enabled(io->v) ||
> +	     verity_is_system_shutting_down() ||
> +	     (bio->bi_opf & REQ_RAHEAD))) {
>  		verity_finish_io(io, bio->bi_status);
>  		return;
>  	}
> -- 
> 2.25.1
> 

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-11-29 17:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-22  3:51 [PATCH v2 0/2] dm verity: fix FEC stuck during lower dm suspend Wu Bo
2023-11-22  3:51 ` [PATCH v2 1/2] dm verity: init fec io before cleaning it Wu Bo
2023-11-29 17:20   ` Mikulas Patocka
2023-11-22  3:51 ` [PATCH v2 2/2] dm verity: don't verity if readahead failed Wu Bo
2023-11-28 12:40   ` Wu Bo
2023-11-29 17:26   ` Mikulas Patocka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox