* [PATCH 1/3] ext4: handle unwritten or delalloc buffers before enabling per-file data journaling
@ 2015-11-18 1:34 Daeho Jeong
2015-11-30 13:45 ` Jan Kara
0 siblings, 1 reply; 4+ messages in thread
From: Daeho Jeong @ 2015-11-18 1:34 UTC (permalink / raw)
To: tytso, linux-ext4, daeho.jeong
We already allocate delalloc blocks before changing the inode mode into
"per-file data journal" mode to prevent delalloc blocks from remaining
not allocated, but another issue concerned with "BH_Unwritten" status
still exists. For example, by fallocate(), several buffers' status
change into "BH_Unwritten", but these buffers cannot be processed by
ext4_alloc_da_blocks(). So, they still remain in unwritten status after
per-file data journaling is enabled and they cannot be changed into
written status any more and, if they are journaled and eventually
checkpointed, these unwritten buffer will cause a kernel panic by the
below BUG_ON() function of submit_bh_wbc() when they are submitted
during checkpointing.
static int submit_bh_wbc(int rw, struct buffer_head *bh,...
{
...
BUG_ON(buffer_unwritten(bh));
Moreover, when "dioread_nolock" option is enabled, the status of a
buffer is changed into "BH_Unwritten" after write_begin() completes and
the "BH_Unwritten" status will be cleared after I/O is done. Therefore,
if a buffer's status is changed into unwrutten but the buffer's I/O is
not submitted and completed, it can cause the same problem after
enabling per-file data journaling. You can easily generate this bug by
executing the following command.
./kvm-xfstests -C 10000 -m nodelalloc,dioread_nolock generic/269
To resolve these problems and define a boundary between the previous
mode and per-file data journaling mode, we need to flush and wait all
the I/O of buffers of a file before enabling per-file data journaling
of the file.
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
---
fs/ext4/inode.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 612fbcf..1f9458e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5168,9 +5168,14 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
* be allocated any more. even more truncate on delalloc blocks
* could trigger BUG by flushing delalloc blocks in journal.
* There is no delalloc block in non-journal data mode.
+ * We also have to handle unwritten buffers generated by
+ * fallocate() and dioread_nolock option. Once per-file data
+ * journaling is enabled, unwritten buffers will remain in
+ * unwritten status forever and they will be the seeds of
+ * kernel panic when they are checkpointed.
*/
- if (val && test_opt(inode->i_sb, DELALLOC)) {
- err = ext4_alloc_da_blocks(inode);
+ if (val) {
+ err = filemap_write_and_wait(inode->i_mapping);
if (err < 0)
return err;
}
--
1.7.9.5
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH 1/3] ext4: handle unwritten or delalloc buffers before enabling per-file data journaling
2015-11-18 1:34 Daeho Jeong
@ 2015-11-30 13:45 ` Jan Kara
2015-11-30 13:55 ` Jan Kara
0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2015-11-30 13:45 UTC (permalink / raw)
To: Daeho Jeong; +Cc: tytso, linux-ext4
On Wed 18-11-15 10:34:32, Daeho Jeong wrote:
> We already allocate delalloc blocks before changing the inode mode into
> "per-file data journal" mode to prevent delalloc blocks from remaining
> not allocated, but another issue concerned with "BH_Unwritten" status
> still exists. For example, by fallocate(), several buffers' status
> change into "BH_Unwritten", but these buffers cannot be processed by
> ext4_alloc_da_blocks(). So, they still remain in unwritten status after
> per-file data journaling is enabled and they cannot be changed into
> written status any more and, if they are journaled and eventually
> checkpointed, these unwritten buffer will cause a kernel panic by the
> below BUG_ON() function of submit_bh_wbc() when they are submitted
> during checkpointing.
>
> static int submit_bh_wbc(int rw, struct buffer_head *bh,...
> {
> ...
> BUG_ON(buffer_unwritten(bh));
>
> Moreover, when "dioread_nolock" option is enabled, the status of a
> buffer is changed into "BH_Unwritten" after write_begin() completes and
> the "BH_Unwritten" status will be cleared after I/O is done. Therefore,
> if a buffer's status is changed into unwrutten but the buffer's I/O is
> not submitted and completed, it can cause the same problem after
> enabling per-file data journaling. You can easily generate this bug by
> executing the following command.
>
> ./kvm-xfstests -C 10000 -m nodelalloc,dioread_nolock generic/269
>
> To resolve these problems and define a boundary between the previous
> mode and per-file data journaling mode, we need to flush and wait all
> the I/O of buffers of a file before enabling per-file data journaling
> of the file.
>
> Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
> ---
> fs/ext4/inode.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 612fbcf..1f9458e 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5168,9 +5168,14 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> * be allocated any more. even more truncate on delalloc blocks
> * could trigger BUG by flushing delalloc blocks in journal.
> * There is no delalloc block in non-journal data mode.
> + * We also have to handle unwritten buffers generated by
> + * fallocate() and dioread_nolock option. Once per-file data
> + * journaling is enabled, unwritten buffers will remain in
> + * unwritten status forever and they will be the seeds of
> + * kernel panic when they are checkpointed.
Can we maybe rephrase the whole comment like:
/*
* Before flushing the journal and switching inode's aops, we have
* to flush all dirty data the inode has. There can be outstanding
* delayed allocations, there can be unwritten extents created by
* fallocate or buffered writes in dioread_nolock mode covered by
* dirty data which can be converted only after flushing the dirty
* data (and journalled aops don't know how to handle these cases).
*/
Otherwise the patch looks good to me. So after updating the comment you can
add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> - if (val && test_opt(inode->i_sb, DELALLOC)) {
> - err = ext4_alloc_da_blocks(inode);
> + if (val) {
> + err = filemap_write_and_wait(inode->i_mapping);
> if (err < 0)
> return err;
> }
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 1/3] ext4: handle unwritten or delalloc buffers before enabling per-file data journaling
2015-11-30 13:45 ` Jan Kara
@ 2015-11-30 13:55 ` Jan Kara
0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2015-11-30 13:55 UTC (permalink / raw)
To: Daeho Jeong; +Cc: tytso, linux-ext4
On Mon 30-11-15 14:45:37, Jan Kara wrote:
> On Wed 18-11-15 10:34:32, Daeho Jeong wrote:
> > We already allocate delalloc blocks before changing the inode mode into
> > "per-file data journal" mode to prevent delalloc blocks from remaining
> > not allocated, but another issue concerned with "BH_Unwritten" status
> > still exists. For example, by fallocate(), several buffers' status
> > change into "BH_Unwritten", but these buffers cannot be processed by
> > ext4_alloc_da_blocks(). So, they still remain in unwritten status after
> > per-file data journaling is enabled and they cannot be changed into
> > written status any more and, if they are journaled and eventually
> > checkpointed, these unwritten buffer will cause a kernel panic by the
> > below BUG_ON() function of submit_bh_wbc() when they are submitted
> > during checkpointing.
> >
> > static int submit_bh_wbc(int rw, struct buffer_head *bh,...
> > {
> > ...
> > BUG_ON(buffer_unwritten(bh));
> >
> > Moreover, when "dioread_nolock" option is enabled, the status of a
> > buffer is changed into "BH_Unwritten" after write_begin() completes and
> > the "BH_Unwritten" status will be cleared after I/O is done. Therefore,
> > if a buffer's status is changed into unwrutten but the buffer's I/O is
> > not submitted and completed, it can cause the same problem after
> > enabling per-file data journaling. You can easily generate this bug by
> > executing the following command.
> >
> > ./kvm-xfstests -C 10000 -m nodelalloc,dioread_nolock generic/269
> >
> > To resolve these problems and define a boundary between the previous
> > mode and per-file data journaling mode, we need to flush and wait all
> > the I/O of buffers of a file before enabling per-file data journaling
> > of the file.
>
>
> >
> > Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
> > ---
> > fs/ext4/inode.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 612fbcf..1f9458e 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -5168,9 +5168,14 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> > * be allocated any more. even more truncate on delalloc blocks
> > * could trigger BUG by flushing delalloc blocks in journal.
> > * There is no delalloc block in non-journal data mode.
> > + * We also have to handle unwritten buffers generated by
> > + * fallocate() and dioread_nolock option. Once per-file data
> > + * journaling is enabled, unwritten buffers will remain in
> > + * unwritten status forever and they will be the seeds of
> > + * kernel panic when they are checkpointed.
>
> Can we maybe rephrase the whole comment like:
>
> /*
> * Before flushing the journal and switching inode's aops, we have
> * to flush all dirty data the inode has. There can be outstanding
> * delayed allocations, there can be unwritten extents created by
> * fallocate or buffered writes in dioread_nolock mode covered by
> * dirty data which can be converted only after flushing the dirty
> * data (and journalled aops don't know how to handle these cases).
> */
>
> Otherwise the patch looks good to me. So after updating the comment you can
> add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
BTW, the code is still racy wrt mmap write faults. Once Ted merges my
patches for hole punching races, we need to grab i_mmap_sem for writing
here as well to avoid write page fault from creating dirty page while we
are switching aops... I'm mostly writing this here so that there's lower
chance this gets lost.
Honza
> > - if (val && test_opt(inode->i_sb, DELALLOC)) {
> > - err = ext4_alloc_da_blocks(inode);
> > + if (val) {
> > + err = filemap_write_and_wait(inode->i_mapping);
> > if (err < 0)
> > return err;
> > }
> > --
> > 1.7.9.5
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 1/3] ext4: handle unwritten or delalloc buffers before enabling per-file data journaling
@ 2015-12-01 4:42 Daeho Jeong
0 siblings, 0 replies; 4+ messages in thread
From: Daeho Jeong @ 2015-12-01 4:42 UTC (permalink / raw)
To: Jan Kara
Cc: tytso@mit.edu, linux-ext4@vger.kernel.org,
정대호
Thank you, Jan.
> Can we maybe rephrase the whole comment like:
>
> /*
> * Before flushing the journal and switching inode's aops, we have
> * to flush all dirty data the inode has. There can be outstanding
> * delayed allocations, there can be unwritten extents created by
> * fallocate or buffered writes in dioread_nolock mode covered by
> * dirty data which can be converted only after flushing the dirty
> * data (and journalled aops don't know how to handle these cases).
> */
Your comment is more understandable, so I will change the comment in
the patch.
> BTW, the code is still racy wrt mmap write faults. Once Ted merges my
> patches for hole punching races, we need to grab i_mmap_sem for writing
> here as well to avoid write page fault from creating dirty page while we
> are switching aops... I'm mostly writing this here so that there's lower
> chance this gets lost.
Ok. I've recognized another racy condition when switching aops of an inode.
I will also consider the point in v2 of this patch.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2015-12-01 4:42 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-12-01 4:42 [PATCH 1/3] ext4: handle unwritten or delalloc buffers before enabling per-file data journaling Daeho Jeong
-- strict thread matches above, loose matches on Subject: below --
2015-11-18 1:34 Daeho Jeong
2015-11-30 13:45 ` Jan Kara
2015-11-30 13:55 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox