* Re: [PATCH 2/3] ext4: remove incorrect check for inode journal mode in ext4_writepages()
@ 2015-12-01 4:50 Daeho Jeong
From: Daeho Jeong @ 2015-12-01 4:50 UTC
To: Jan Kara
Cc: tytso@mit.edu, linux-ext4@vger.kernel.org,
정대호
> Actually what you do only hides the real problem - that ext4_writepages()
> in non-journalled mode can be still running for an inode which is already
> switched to journalled mode. In theory if we manage to dirty some pages
> after the switch, non-journalled writepages *can* see them and try to write
> them back which will break spectacularly. So to fix this we need something
> like a writeback barrier for the inode - make sure all ext4_writepages()
> calls have completed before switching aops. Now I'd hate to grow struct
> ext4_inode_info only for this extra rare case so we could probably
> implement the barrier on per-filesystem basis - a fs-wide per-cpu rw
> semaphore acquired for reading while ext4_writepages() runs and acquired
> for writing when we switch aops for some inode.
Yes, there is another issue: pages dirtied after the switch can still be
seen by ext4_writepages() running in non-journalled mode. I overlooked that
point. I will revise this patch according to your comments.
Thank you again.
* [PATCH 1/3] ext4: handle unwritten or delalloc buffers before enabling per-file data journaling
@ 2015-11-18 1:34 Daeho Jeong
2015-11-18 1:34 ` [PATCH 2/3] ext4: remove incorrect check for inode journal mode in ext4_writepages() Daeho Jeong
From: Daeho Jeong @ 2015-11-18 1:34 UTC
To: tytso, linux-ext4, daeho.jeong
We already allocate delalloc blocks before switching an inode to
"per-file data journal" mode so that no delalloc blocks are left
unallocated, but a separate issue with the "BH_Unwritten" state
remains. For example, fallocate() leaves several buffers in the
"BH_Unwritten" state, and ext4_alloc_da_blocks() does not process such
buffers. They therefore stay unwritten after per-file data journaling
is enabled and can no longer be converted to the written state. If they
are then journaled and eventually checkpointed, these unwritten buffers
trigger the BUG_ON() below in submit_bh_wbc() when they are submitted
during checkpointing, causing a kernel panic.
static int submit_bh_wbc(int rw, struct buffer_head *bh,...
{
...
BUG_ON(buffer_unwritten(bh));
Moreover, when the "dioread_nolock" option is enabled, a buffer is
marked "BH_Unwritten" after write_begin() completes, and the flag is
cleared only after the I/O is done. Therefore, if a buffer has been
marked unwritten but its I/O has not yet been submitted and completed,
enabling per-file data journaling can cause the same problem. You can
easily reproduce this bug by running the following command.
./kvm-xfstests -C 10000 -m nodelalloc,dioread_nolock generic/269
To resolve these problems and define a clear boundary between the
previous mode and per-file data journaling mode, we need to flush all
of the file's buffers and wait for their I/O to complete before
enabling per-file data journaling on the file.
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
---
fs/ext4/inode.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 612fbcf..1f9458e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5168,9 +5168,14 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
* be allocated any more. even more truncate on delalloc blocks
* could trigger BUG by flushing delalloc blocks in journal.
* There is no delalloc block in non-journal data mode.
+ * We also have to handle unwritten buffers generated by
+ * fallocate() and dioread_nolock option. Once per-file data
+ * journaling is enabled, unwritten buffers will remain in
+ * unwritten status forever and they will be the seeds of
+ * kernel panic when they are checkpointed.
*/
- if (val && test_opt(inode->i_sb, DELALLOC)) {
- err = ext4_alloc_da_blocks(inode);
+ if (val) {
+ err = filemap_write_and_wait(inode->i_mapping);
if (err < 0)
return err;
}
--
1.7.9.5
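
For readers who want to see how userspace reaches the code path changed
above, here is a minimal sketch, assuming a Linux system with
<linux/fs.h>. It preallocates a file, dirties part of it, and then sets
FS_JOURNAL_DATA_FL through FS_IOC_SETFLAGS, which is what "chattr +j"
does. The helper name prealloc_then_enable_journaling() is hypothetical
and is not part of the patch. This only illustrates the sequence from
the commit message; it is not claimed to be a reliable reproducer.

```c
/* Userspace sketch; the helper below is hypothetical, not patch code. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Preallocate and dirty a file, then request per-file data journaling.
 * Returns 0 on success or a negative errno value.
 */
int prealloc_then_enable_journaling(const char *path)
{
	static const char buf[4096] = "data";
	int flags = 0;
	int ret;
	int fd;

	assert(path != NULL);
	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return -errno;

	/* Preallocate 1 MiB; on ext4 this typically yields unwritten extents. */
	ret = posix_fallocate(fd, 0, 1 << 20);
	if (ret != 0) {
		close(fd);
		return -ret;	/* posix_fallocate() returns the error number */
	}

	/* Dirty part of the preallocated range without syncing it. */
	if (pwrite(fd, buf, sizeof(buf), 0) < 0)
		goto fail;

	/* Read the inode flags, then set the data-journaling flag. */
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		goto fail;
	flags |= FS_JOURNAL_DATA_FL;
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		goto fail;

	close(fd);
	return 0;
fail:
	ret = -errno;
	close(fd);
	return ret;
}
```

The call can fail for reasons unrelated to this patch (for example a
filesystem that does not support the flags ioctl, or missing
privileges), so callers should treat a negative return as "not
applicable here" rather than as a bug.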
* [PATCH 2/3] ext4: remove incorrect check for inode journal mode in ext4_writepages()
2015-11-18 1:34 [PATCH 1/3] ext4: handle unwritten or delalloc buffers before enabling per-file data journaling Daeho Jeong
@ 2015-11-18 1:34 ` Daeho Jeong
2015-11-30 14:08 ` Jan Kara
From: Daeho Jeong @ 2015-11-18 1:34 UTC
To: tytso, linux-ext4, daeho.jeong
ext4 now has only one writepages() function, which is shared by all
inode modes. The BUG_ON() in ext4_writepages() that checks for
journaled inode mode is therefore no longer correct: if per-file data
journaling is enabled on a file while ext4_writepages() is running,
this BUG_ON() can trigger an unintended kernel panic, even in
"nodelalloc" mode.
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
---
fs/ext4/inode.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1f9458e..db24348 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2480,13 +2480,11 @@ retry:
}
/*
- * We have two constraints: We find one extent to map and we
+ * We have a constraint: We find one extent to map and we
* must always write out whole page (makes a difference when
* blocksize < pagesize) so that we don't block on IO when we
- * try to write out the rest of the page. Journalled mode is
- * not supported by delalloc.
+ * try to write out the rest of the page.
*/
- BUG_ON(ext4_should_journal_data(inode));
needed_blocks = ext4_da_writepages_trans_blocks(inode);
/* start a new transaction */
--
1.7.9.5
* Re: [PATCH 2/3] ext4: remove incorrect check for inode journal mode in ext4_writepages()
2015-11-18 1:34 ` [PATCH 2/3] ext4: remove incorrect check for inode journal mode in ext4_writepages() Daeho Jeong
@ 2015-11-30 14:08 ` Jan Kara
From: Jan Kara @ 2015-11-30 14:08 UTC
To: Daeho Jeong; +Cc: tytso, linux-ext4
On Wed 18-11-15 10:34:33, Daeho Jeong wrote:
> ext4 now has only one writepages() function, which is shared by all
> inode modes. The BUG_ON() in ext4_writepages() that checks for
> journaled inode mode is therefore no longer correct: if per-file data
> journaling is enabled on a file while ext4_writepages() is running,
> this BUG_ON() can trigger an unintended kernel panic, even in
> "nodelalloc" mode.
>
> Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
> ---
> fs/ext4/inode.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 1f9458e..db24348 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2480,13 +2480,11 @@ retry:
> }
>
> /*
> - * We have two constraints: We find one extent to map and we
> + * We have a constraint: We find one extent to map and we
> * must always write out whole page (makes a difference when
> * blocksize < pagesize) so that we don't block on IO when we
> - * try to write out the rest of the page. Journalled mode is
> - * not supported by delalloc.
> + * try to write out the rest of the page.
> */
Well, it is still true that journalled mode is not supported by delalloc so
I would not delete the comment.
Actually what you do only hides the real problem - that ext4_writepages()
in non-journalled mode can be still running for an inode which is already
switched to journalled mode. In theory if we manage to dirty some pages
after the switch, non-journalled writepages *can* see them and try to write
them back which will break spectacularly. So to fix this we need something
like a writeback barrier for the inode - make sure all ext4_writepages()
calls have completed before switching aops. Now I'd hate to grow struct
ext4_inode_info only for this extra rare case so we could probably
implement the barrier on per-filesystem basis - a fs-wide per-cpu rw
semaphore acquired for reading while ext4_writepages() runs and acquired
for writing when we switch aops for some inode.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR