* [PATCH v2 0/2] dm verity: fix FEC stuck during lower dm suspend @ 2023-11-22 3:51 Wu Bo 2023-11-22 3:51 ` [PATCH v2 1/2] dm verity: init fec io before cleaning it Wu Bo 2023-11-22 3:51 ` [PATCH v2 2/2] dm verity: don't verity if readahead failed Wu Bo 0 siblings, 2 replies; 6+ messages in thread From: Wu Bo @ 2023-11-22 3:51 UTC (permalink / raw) To: Alasdair Kergon, Mike Snitzer, Mikulas Patocka Cc: dm-devel, linux-kernel, Wu Bo, Eric Biggers, Wu Bo Hi, We found an issue under Android OTA scenario that many readahead BIOs have to do FEC and caused system stuck. These 2 patches try to fix this issue. Thanks Changes from v1: 1. Modify patch1 as Eric suggested 2. Add "Fixes" and "Cc stable" tags Wu Bo (2): dm verity: init fec io before cleaning it dm verity: don't verity if readahead failed drivers/md/dm-verity-target.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 1/2] dm verity: init fec io before cleaning it 2023-11-22 3:51 [PATCH v2 0/2] dm verity: fix FEC stuck during lower dm suspend Wu Bo @ 2023-11-22 3:51 ` Wu Bo 2023-11-29 17:20 ` Mikulas Patocka 2023-11-22 3:51 ` [PATCH v2 2/2] dm verity: don't verity if readahead failed Wu Bo 1 sibling, 1 reply; 6+ messages in thread From: Wu Bo @ 2023-11-22 3:51 UTC (permalink / raw) To: Alasdair Kergon, Mike Snitzer, Mikulas Patocka Cc: dm-devel, linux-kernel, Wu Bo, Eric Biggers, Wu Bo, stable If BIO error, it may goto verity_finish_io() before verity_fec_init_io(). Therefor, the fec_io->rs is not initialized and may crash when doing memory freeing in verity_fec_finish_io(). Crash call stack: die+0x90/0x2b8 __do_kernel_fault+0x260/0x298 do_bad_area+0x2c/0xdc do_translation_fault+0x3c/0x54 do_mem_abort+0x54/0x118 el1_abort+0x38/0x5c el1h_64_sync_handler+0x50/0x90 el1h_64_sync+0x64/0x6c free_rs+0x18/0xac fec_rs_free+0x10/0x24 mempool_free+0x58/0x148 verity_fec_finish_io+0x4c/0xb0 verity_end_io+0xb8/0x150 Cc: stable@vger.kernel.org # v6.0+ Fixes: 5721d4e5a9cd ("dm verity: Add optional "try_verify_in_tasklet" feature") Signed-off-by: Wu Bo <bo.wu@vivo.com> --- drivers/md/dm-verity-target.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c index e115fcfe723c..beec14b6b044 100644 --- a/drivers/md/dm-verity-target.c +++ b/drivers/md/dm-verity-target.c @@ -642,7 +642,6 @@ static void verity_work(struct work_struct *w) io->in_tasklet = false; - verity_fec_init_io(io); verity_finish_io(io, errno_to_blk_status(verity_verify_io(io))); } @@ -792,6 +791,8 @@ static int verity_map(struct dm_target *ti, struct bio *bio) bio->bi_private = io; io->iter = bio->bi_iter; + verity_fec_init_io(io); + verity_submit_prefetch(v, io); submit_bio_noacct(bio); -- 2.25.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/2] dm verity: init fec io before cleaning it 2023-11-22 3:51 ` [PATCH v2 1/2] dm verity: init fec io before cleaning it Wu Bo @ 2023-11-29 17:20 ` Mikulas Patocka 0 siblings, 0 replies; 6+ messages in thread From: Mikulas Patocka @ 2023-11-29 17:20 UTC (permalink / raw) To: Wu Bo Cc: Alasdair Kergon, Mike Snitzer, dm-devel, linux-kernel, Wu Bo, Eric Biggers, stable On Tue, 21 Nov 2023, Wu Bo wrote: > If BIO error, it may goto verity_finish_io() before > verity_fec_init_io(). Therefor, the fec_io->rs is not initialized and > may crash when doing memory freeing in verity_fec_finish_io(). > > Crash call stack: > die+0x90/0x2b8 > __do_kernel_fault+0x260/0x298 > do_bad_area+0x2c/0xdc > do_translation_fault+0x3c/0x54 > do_mem_abort+0x54/0x118 > el1_abort+0x38/0x5c > el1h_64_sync_handler+0x50/0x90 > el1h_64_sync+0x64/0x6c > free_rs+0x18/0xac > fec_rs_free+0x10/0x24 > mempool_free+0x58/0x148 > verity_fec_finish_io+0x4c/0xb0 > verity_end_io+0xb8/0x150 > > Cc: stable@vger.kernel.org # v6.0+ > Fixes: 5721d4e5a9cd ("dm verity: Add optional "try_verify_in_tasklet" feature") > Signed-off-by: Wu Bo <bo.wu@vivo.com> > --- > drivers/md/dm-verity-target.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c > index e115fcfe723c..beec14b6b044 100644 > --- a/drivers/md/dm-verity-target.c > +++ b/drivers/md/dm-verity-target.c > @@ -642,7 +642,6 @@ static void verity_work(struct work_struct *w) > > io->in_tasklet = false; > > - verity_fec_init_io(io); > verity_finish_io(io, errno_to_blk_status(verity_verify_io(io))); > } > > @@ -792,6 +791,8 @@ static int verity_map(struct dm_target *ti, struct bio *bio) > bio->bi_private = io; > io->iter = bio->bi_iter; > > + verity_fec_init_io(io); > + > verity_submit_prefetch(v, io); > > submit_bio_noacct(bio); > -- > 2.25.1 Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 2/2] dm verity: don't verity if readahead failed 2023-11-22 3:51 [PATCH v2 0/2] dm verity: fix FEC stuck during lower dm suspend Wu Bo 2023-11-22 3:51 ` [PATCH v2 1/2] dm verity: init fec io before cleaning it Wu Bo @ 2023-11-22 3:51 ` Wu Bo 2023-11-28 12:40 ` Wu Bo 2023-11-29 17:26 ` Mikulas Patocka 1 sibling, 2 replies; 6+ messages in thread From: Wu Bo @ 2023-11-22 3:51 UTC (permalink / raw) To: Alasdair Kergon, Mike Snitzer, Mikulas Patocka Cc: dm-devel, linux-kernel, Wu Bo, Eric Biggers, Wu Bo, stable We found an issue under Android OTA scenario that many BIOs have to do FEC where the data under dm-verity is 100% complete and no corruption. Android OTA has many dm-block layers, from upper to lower: dm-verity dm-snapshot dm-origin & dm-cow dm-linear ufs Dm tables have to change 2 times during Android OTA merging process. When doing table change, the dm-snapshot will be suspended for a while. During this interval, we found there are many readahead IOs are submitted to dm_verity from filesystem. Then the kverity works are busy doing FEC process which cost too much time to finish dm-verity IO. And cause system stuck. We add some debug log and find that each readahead IO need around 10s to finish when this situation occurred. Because here has a IO amplification: dm-snapshot suspend erofs_readahead // 300+ io is submitted dm_submit_bio (dm_verity) dm_submit_bio (dm_snapshot) bio return EIO bio got nothing, it's empty verity_end_io verity_verify_io forloop range(0, io->n_blocks) // each io->nblocks ~= 20 verity_fec_decode fec_decode_rsb fec_read_bufs forloop range(0, v->fec->rsn) // v->fec->rsn = 253 new_read submit_bio (dm_snapshot) end loop end loop dm-snapshot resume Readahead BIO got nothing during dm-snapshot suspended. So all of them will do FEC. Each readahead BIO need to do io->n_blocks ~= 20 times verify. Each block need to do fec, and every block need to do v->fec->rsn = 253 times read. So during the suspend interval(~200ms), 300 readahead BIO make 300*20*253 IOs on dm-snapshot. As readahead IO is not required by user space, and to fix this issue, I think it would be better to pass it to upper layer to handle it. Cc: stable@vger.kernel.org Fixes: a739ff3f543a ("dm verity: add support for forward error correction") Signed-off-by: Wu Bo <bo.wu@vivo.com> --- drivers/md/dm-verity-target.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c index beec14b6b044..14e58ae70521 100644 --- a/drivers/md/dm-verity-target.c +++ b/drivers/md/dm-verity-target.c @@ -667,7 +667,9 @@ static void verity_end_io(struct bio *bio) struct dm_verity_io *io = bio->bi_private; if (bio->bi_status && - (!verity_fec_is_enabled(io->v) || verity_is_system_shutting_down())) { + (!verity_fec_is_enabled(io->v) || + verity_is_system_shutting_down() || + (bio->bi_opf & REQ_RAHEAD))) { verity_finish_io(io, bio->bi_status); return; } -- 2.25.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] dm verity: don't verity if readahead failed 2023-11-22 3:51 ` [PATCH v2 2/2] dm verity: don't verity if readahead failed Wu Bo @ 2023-11-28 12:40 ` Wu Bo 2023-11-29 17:26 ` Mikulas Patocka 1 sibling, 0 replies; 6+ messages in thread From: Wu Bo @ 2023-11-28 12:40 UTC (permalink / raw) To: Wu Bo, Alasdair Kergon, Mike Snitzer, Mikulas Patocka Cc: dm-devel, linux-kernel, Eric Biggers, stable ping. On 2023/11/22 11:51, Wu Bo wrote: > We found an issue under Android OTA scenario that many BIOs have to do > FEC where the data under dm-verity is 100% complete and no corruption. > > Android OTA has many dm-block layers, from upper to lower: > dm-verity > dm-snapshot > dm-origin & dm-cow > dm-linear > ufs > > Dm tables have to change 2 times during Android OTA merging process. > When doing table change, the dm-snapshot will be suspended for a while. > During this interval, we found there are many readahead IOs are > submitted to dm_verity from filesystem. Then the kverity works are busy > doing FEC process which cost too much time to finish dm-verity IO. And > cause system stuck. > > We add some debug log and find that each readahead IO need around 10s to > finish when this situation occurred. Because here has a IO > amplification: > > dm-snapshot suspend > erofs_readahead // 300+ io is submitted > dm_submit_bio (dm_verity) > dm_submit_bio (dm_snapshot) > bio return EIO > bio got nothing, it's empty > verity_end_io > verity_verify_io > forloop range(0, io->n_blocks) // each io->nblocks ~= 20 > verity_fec_decode > fec_decode_rsb > fec_read_bufs > forloop range(0, v->fec->rsn) // v->fec->rsn = 253 > new_read > submit_bio (dm_snapshot) > end loop > end loop > dm-snapshot resume > > Readahead BIO got nothing during dm-snapshot suspended. So all of them > will do FEC. > Each readahead BIO need to do io->n_blocks ~= 20 times verify. > Each block need to do fec, and every block need to do v->fec->rsn = 253 > times read. > So during the suspend interval(~200ms), 300 readahead BIO make > 300*20*253 IOs on dm-snapshot. > > As readahead IO is not required by user space, and to fix this issue, > I think it would be better to pass it to upper layer to handle it. > > Cc: stable@vger.kernel.org > Fixes: a739ff3f543a ("dm verity: add support for forward error correction") > Signed-off-by: Wu Bo <bo.wu@vivo.com> > --- > drivers/md/dm-verity-target.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c > index beec14b6b044..14e58ae70521 100644 > --- a/drivers/md/dm-verity-target.c > +++ b/drivers/md/dm-verity-target.c > @@ -667,7 +667,9 @@ static void verity_end_io(struct bio *bio) > struct dm_verity_io *io = bio->bi_private; > > if (bio->bi_status && > - (!verity_fec_is_enabled(io->v) || verity_is_system_shutting_down())) { > + (!verity_fec_is_enabled(io->v) || > + verity_is_system_shutting_down() || > + (bio->bi_opf & REQ_RAHEAD))) { > verity_finish_io(io, bio->bi_status); > return; > } ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] dm verity: don't verity if readahead failed 2023-11-22 3:51 ` [PATCH v2 2/2] dm verity: don't verity if readahead failed Wu Bo 2023-11-28 12:40 ` Wu Bo @ 2023-11-29 17:26 ` Mikulas Patocka 1 sibling, 0 replies; 6+ messages in thread From: Mikulas Patocka @ 2023-11-29 17:26 UTC (permalink / raw) To: Wu Bo Cc: Alasdair Kergon, Mike Snitzer, dm-devel, linux-kernel, Wu Bo, Eric Biggers, stable On Tue, 21 Nov 2023, Wu Bo wrote: > We found an issue under Android OTA scenario that many BIOs have to do > FEC where the data under dm-verity is 100% complete and no corruption. > > Android OTA has many dm-block layers, from upper to lower: > dm-verity > dm-snapshot > dm-origin & dm-cow > dm-linear > ufs > > Dm tables have to change 2 times during Android OTA merging process. > When doing table change, the dm-snapshot will be suspended for a while. > During this interval, we found there are many readahead IOs are > submitted to dm_verity from filesystem. Then the kverity works are busy > doing FEC process which cost too much time to finish dm-verity IO. And > cause system stuck. > > We add some debug log and find that each readahead IO need around 10s to > finish when this situation occurred. Because here has a IO > amplification: > > dm-snapshot suspend > erofs_readahead // 300+ io is submitted > dm_submit_bio (dm_verity) > dm_submit_bio (dm_snapshot) > bio return EIO > bio got nothing, it's empty > verity_end_io > verity_verify_io > forloop range(0, io->n_blocks) // each io->nblocks ~= 20 > verity_fec_decode > fec_decode_rsb > fec_read_bufs > forloop range(0, v->fec->rsn) // v->fec->rsn = 253 > new_read > submit_bio (dm_snapshot) > end loop > end loop > dm-snapshot resume > > Readahead BIO got nothing during dm-snapshot suspended. So all of them > will do FEC. > Each readahead BIO need to do io->n_blocks ~= 20 times verify. > Each block need to do fec, and every block need to do v->fec->rsn = 253 > times read. > So during the suspend interval(~200ms), 300 readahead BIO make > 300*20*253 IOs on dm-snapshot. > > As readahead IO is not required by user space, and to fix this issue, > I think it would be better to pass it to upper layer to handle it. > > Cc: stable@vger.kernel.org > Fixes: a739ff3f543a ("dm verity: add support for forward error correction") > Signed-off-by: Wu Bo <bo.wu@vivo.com> > --- > drivers/md/dm-verity-target.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c > index beec14b6b044..14e58ae70521 100644 > --- a/drivers/md/dm-verity-target.c > +++ b/drivers/md/dm-verity-target.c > @@ -667,7 +667,9 @@ static void verity_end_io(struct bio *bio) > struct dm_verity_io *io = bio->bi_private; > > if (bio->bi_status && > - (!verity_fec_is_enabled(io->v) || verity_is_system_shutting_down())) { > + (!verity_fec_is_enabled(io->v) || > + verity_is_system_shutting_down() || > + (bio->bi_opf & REQ_RAHEAD))) { > verity_finish_io(io, bio->bi_status); > return; > } > -- > 2.25.1 > Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-11-29 17:26 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-11-22 3:51 [PATCH v2 0/2] dm verity: fix FEC stuck during lower dm suspend Wu Bo 2023-11-22 3:51 ` [PATCH v2 1/2] dm verity: init fec io before cleaning it Wu Bo 2023-11-29 17:20 ` Mikulas Patocka 2023-11-22 3:51 ` [PATCH v2 2/2] dm verity: don't verity if readahead failed Wu Bo 2023-11-28 12:40 ` Wu Bo 2023-11-29 17:26 ` Mikulas Patocka
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox