* [PATCH] xfs: up(ic_sema) if flushing data device fails
@ 2023-10-23 18:14 Leah Rumancik
2023-10-23 21:22 ` Darrick J. Wong
0 siblings, 1 reply; 4+ messages in thread
From: Leah Rumancik @ 2023-10-23 18:14 UTC (permalink / raw)
To: linux-xfs; +Cc: djwong, Leah Rumancik
We flush the data device cache before we issue external log IO. Since
7d839e325af2, we check the return value of the flush, and if the flush
failed, we shut down the log immediately and return. However, the
iclog->ic_sema is left in a decremented state so let's add an up().
Prior to this patch, xfs/438 would fail consistently when running with
an external log device:
sync
-> xfs_log_force
-> xlog_write_iclog
-> down(&iclog->ic_sema)
-> blkdev_issue_flush (fail causes us to intiate shutdown)
-> xlog_force_shutdown
-> return
unmount
-> xfs_log_umount
-> xlog_wait_iclog_completion
-> down(&iclog->ic_sema) --------> HANG
There is a second early return / shutdown. Add an up() there as well.
Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices")
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
Notes:
Tested auto group for xfs/4k and xfs/logdev configs with no regressions
seen.
fs/xfs/xfs_log.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 51c100c86177..b4a8105299c2 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1926,6 +1926,7 @@ xlog_write_iclog(
*/
if (log->l_targ != log->l_mp->m_ddev_targp &&
blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) {
+ up(&iclog->ic_sema);
xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
return;
}
@@ -1936,6 +1937,7 @@ xlog_write_iclog(
iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA);
if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) {
+ up(&iclog->ic_sema);
xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
return;
}
--
2.42.0.758.gaed0368e0e-goog
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH] xfs: up(ic_sema) if flushing data device fails 2023-10-23 18:14 [PATCH] xfs: up(ic_sema) if flushing data device fails Leah Rumancik @ 2023-10-23 21:22 ` Darrick J. Wong 2023-10-24 0:44 ` Leah Rumancik 0 siblings, 1 reply; 4+ messages in thread From: Darrick J. Wong @ 2023-10-23 21:22 UTC (permalink / raw) To: Leah Rumancik; +Cc: linux-xfs On Mon, Oct 23, 2023 at 11:14:10AM -0700, Leah Rumancik wrote: > We flush the data device cache before we issue external log IO. Since > 7d839e325af2, we check the return value of the flush, and if the flush > failed, we shut down the log immediately and return. However, the > iclog->ic_sema is left in a decremented state so let's add an up(). > Prior to this patch, xfs/438 would fail consistently when running with > an external log device: > > sync > -> xfs_log_force > -> xlog_write_iclog > -> down(&iclog->ic_sema) > -> blkdev_issue_flush (fail causes us to intiate shutdown) > -> xlog_force_shutdown > -> return > > unmount > -> xfs_log_umount > -> xlog_wait_iclog_completion > -> down(&iclog->ic_sema) --------> HANG > > There is a second early return / shutdown. Add an up() there as well. Ow. Yes, I think it's correct that both of those error returns need to drop ic_sema since we don't submit_bio, so there is no xlog_ioend_work to do it for us. > Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") Hmm. This bug was introduced in b5d721eaae47e ("xfs: external logs need to flush data device"), not 7d839. That said, this patch only applies cleanly to 7d839e325af2. b5d721 was introduced in 5.14 and 7d839 came in via 6.0, so ... this is where I would have spat out: Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") Actually-Fixes: b5d721eaae47e ("xfs: external logs need to flush data device") > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> > --- > > Notes: > Tested auto group for xfs/4k and xfs/logdev configs with no regressions > seen. > > fs/xfs/xfs_log.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > index 51c100c86177..b4a8105299c2 100644 > --- a/fs/xfs/xfs_log.c > +++ b/fs/xfs/xfs_log.c > @@ -1926,6 +1926,7 @@ xlog_write_iclog( > */ > if (log->l_targ != log->l_mp->m_ddev_targp && > blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) { > + up(&iclog->ic_sema); > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > return; > } > @@ -1936,6 +1937,7 @@ xlog_write_iclog( > iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA); > > if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) { > + up(&iclog->ic_sema); > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); I wonder if these two should both become a cleanup clause at the end? if (log->l_targ != log->l_mp->m_ddev_targp && blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) goto shutdown; ... if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) goto shutdown; ... submit_bio(&iclog->ic_bio); return; shutdown: up(&iclog->ic_sema); xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); } Seeing as we've screwed this up twice now and not a whole lot of people actually use external logs, and somehow I've never seen this on my test fleet. Anyway the code change looks correct so modulo the stylistic thing, Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > return; > } > -- > 2.42.0.758.gaed0368e0e-goog > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: up(ic_sema) if flushing data device fails 2023-10-23 21:22 ` Darrick J. Wong @ 2023-10-24 0:44 ` Leah Rumancik 2023-10-24 1:41 ` Darrick J. Wong 0 siblings, 1 reply; 4+ messages in thread From: Leah Rumancik @ 2023-10-24 0:44 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs On Mon, Oct 23, 2023 at 02:22:21PM -0700, Darrick J. Wong wrote: > On Mon, Oct 23, 2023 at 11:14:10AM -0700, Leah Rumancik wrote: > > We flush the data device cache before we issue external log IO. Since > > 7d839e325af2, we check the return value of the flush, and if the flush > > failed, we shut down the log immediately and return. However, the > > iclog->ic_sema is left in a decremented state so let's add an up(). > > Prior to this patch, xfs/438 would fail consistently when running with > > an external log device: > > > > sync > > -> xfs_log_force > > -> xlog_write_iclog > > -> down(&iclog->ic_sema) > > -> blkdev_issue_flush (fail causes us to intiate shutdown) > > -> xlog_force_shutdown > > -> return > > > > unmount > > -> xfs_log_umount > > -> xlog_wait_iclog_completion > > -> down(&iclog->ic_sema) --------> HANG > > > > There is a second early return / shutdown. Add an up() there as well. > > Ow. Yes, I think it's correct that both of those error returns need to > drop ic_sema since we don't submit_bio, so there is no xlog_ioend_work > to do it for us. > > > Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") > > Hmm. This bug was introduced in b5d721eaae47e ("xfs: external logs need > to flush data device"), not 7d839. That said, this patch only applies > cleanly to 7d839e325af2. > > b5d721 was introduced in 5.14 and 7d839 came in via 6.0, so ... this is > where I would have spat out: > > Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") > Actually-Fixes: b5d721eaae47e ("xfs: external logs need to flush data device") I'm not sure I follow. Before 7d839e325af2, we didn't return if there was an issue with the flush so there wasn't an issue with ic_sema. 7d839e325af2 was a fix for eef983ffeae7 though. Do you try to keep fixes tags associated with the original commit as opposed to fixes of fixes? > > > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> > > --- > > > > Notes: > > Tested auto group for xfs/4k and xfs/logdev configs with no regressions > > seen. > > > > fs/xfs/xfs_log.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > > index 51c100c86177..b4a8105299c2 100644 > > --- a/fs/xfs/xfs_log.c > > +++ b/fs/xfs/xfs_log.c > > @@ -1926,6 +1926,7 @@ xlog_write_iclog( > > */ > > if (log->l_targ != log->l_mp->m_ddev_targp && > > blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) { > > + up(&iclog->ic_sema); > > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > > return; > > } > > @@ -1936,6 +1937,7 @@ xlog_write_iclog( > > iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA); > > > > if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) { > > + up(&iclog->ic_sema); > > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > > I wonder if these two should both become a cleanup clause at the end? Sure, that sounds good, I'll create a new version. Thanks for the review! :) - Leah > > if (log->l_targ != log->l_mp->m_ddev_targp && > blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) > goto shutdown; > > ... > > if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) > goto shutdown; > > ... > > submit_bio(&iclog->ic_bio); > return; > > shutdown: > up(&iclog->ic_sema); > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > } > > Seeing as we've screwed this up twice now and not a whole lot of people > actually use external logs, and somehow I've never seen this on my test > fleet. > > Anyway the code change looks correct so modulo the stylistic thing, > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > > --D > > > return; > > } > > -- > > 2.42.0.758.gaed0368e0e-goog > > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: up(ic_sema) if flushing data device fails 2023-10-24 0:44 ` Leah Rumancik @ 2023-10-24 1:41 ` Darrick J. Wong 0 siblings, 0 replies; 4+ messages in thread From: Darrick J. Wong @ 2023-10-24 1:41 UTC (permalink / raw) To: Leah Rumancik; +Cc: linux-xfs On Mon, Oct 23, 2023 at 05:44:14PM -0700, Leah Rumancik wrote: > On Mon, Oct 23, 2023 at 02:22:21PM -0700, Darrick J. Wong wrote: > > On Mon, Oct 23, 2023 at 11:14:10AM -0700, Leah Rumancik wrote: > > > We flush the data device cache before we issue external log IO. Since > > > 7d839e325af2, we check the return value of the flush, and if the flush > > > failed, we shut down the log immediately and return. However, the > > > iclog->ic_sema is left in a decremented state so let's add an up(). > > > Prior to this patch, xfs/438 would fail consistently when running with > > > an external log device: > > > > > > sync > > > -> xfs_log_force > > > -> xlog_write_iclog > > > -> down(&iclog->ic_sema) > > > -> blkdev_issue_flush (fail causes us to intiate shutdown) > > > -> xlog_force_shutdown > > > -> return > > > > > > unmount > > > -> xfs_log_umount > > > -> xlog_wait_iclog_completion > > > -> down(&iclog->ic_sema) --------> HANG > > > > > > There is a second early return / shutdown. Add an up() there as well. > > > > Ow. Yes, I think it's correct that both of those error returns need to > > drop ic_sema since we don't submit_bio, so there is no xlog_ioend_work > > to do it for us. > > > > > Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") > > > > Hmm. This bug was introduced in b5d721eaae47e ("xfs: external logs need > > to flush data device"), not 7d839. That said, this patch only applies > > cleanly to 7d839e325af2. > > > > b5d721 was introduced in 5.14 and 7d839 came in via 6.0, so ... this is > > where I would have spat out: > > > > Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") > > Actually-Fixes: b5d721eaae47e ("xfs: external logs need to flush data device") > > I'm not sure I follow. Before 7d839e325af2, we didn't return if there > was an issue with the flush so there wasn't an issue with ic_sema. > 7d839e325af2 was a fix for eef983ffeae7 though. Do you try to keep > fixes tags associated with the original commit as opposed to fixes of > fixes? I think we're both wrong about which commits this patch applies to. The error return for blkdev_issue_flush failing to unlock ic_sema was introduced in b5d721 when we added the bailout there. The error return for xlog_map_iclog_data failing to unlock ic_sema was introduced back in 842a42d126b4 ("xfs: shutdown on failure to add page to log bio") back in 5.7. So that's the original bug. That said, I think the stable rules say that the Fixes tag is supposed to help AUTOSEL, so they want the newest commit. Since you all are doing the backports by hand I suppose the AUTOSEL rules might not matter as much. So, uh... let's tag all three? :) > > > > > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> > > > --- > > > > > > Notes: > > > Tested auto group for xfs/4k and xfs/logdev configs with no regressions > > > seen. > > > > > > fs/xfs/xfs_log.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > > > index 51c100c86177..b4a8105299c2 100644 > > > --- a/fs/xfs/xfs_log.c > > > +++ b/fs/xfs/xfs_log.c > > > @@ -1926,6 +1926,7 @@ xlog_write_iclog( > > > */ > > > if (log->l_targ != log->l_mp->m_ddev_targp && > > > blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) { > > > + up(&iclog->ic_sema); > > > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > > > return; > > > } > > > @@ -1936,6 +1937,7 @@ xlog_write_iclog( > > > iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA); > > > > > > if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) { > > > + up(&iclog->ic_sema); > > > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > > > > I wonder if these two should both become a cleanup clause at the end? > > Sure, that sounds good, I'll create a new version. > > Thanks for the review! :) NP. --D > - Leah > > > > if (log->l_targ != log->l_mp->m_ddev_targp && > > blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) > > goto shutdown; > > > > ... > > > > if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) > > goto shutdown; > > > > ... > > > > submit_bio(&iclog->ic_bio); > > return; > > > > shutdown: > > up(&iclog->ic_sema); > > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); > > } > > > > Seeing as we've screwed this up twice now and not a whole lot of people > > actually use external logs, and somehow I've never seen this on my test > > fleet. > > > > Anyway the code change looks correct so modulo the stylistic thing, > > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > > > > --D > > > > > return; > > > } > > > -- > > > 2.42.0.758.gaed0368e0e-goog > > > ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-10-24 1:41 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-10-23 18:14 [PATCH] xfs: up(ic_sema) if flushing data device fails Leah Rumancik 2023-10-23 21:22 ` Darrick J. Wong 2023-10-24 0:44 ` Leah Rumancik 2023-10-24 1:41 ` Darrick J. Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox