From: Zheng Liu <gnehzuil.liu@gmail.com>
To: linux-ext4@vger.kernel.org
Cc: Jan Kara <jack@suse.cz>
Subject: [RFC] call end_page_writeback after converting unwritten extents in ext4_end_io
Date: Thu, 10 Jan 2013 13:56:17 +0800
Message-ID: <20130110055617.GA4415@gmail.com>
Hi all,

I am trying to handle AIO DIO with O_SYNC using the extent status tree in ext4.
After applying Christoph's patch series, O_SYNC semantics in ext4 will be broken.
This problem can be fixed using the extent status tree, but we then get a
deadlock because i_mutex is taken in ext4_sync_file(), which then waits for
i_unwritten==0. So let's consider what happens after applying Christoph's
patches and using the extent status tree to ensure O_SYNC semantics for AIO DIO.

ext4_ind_direct_IO:
  ->ext4_file_write()
  ->mutex_lock(i_mutex)
  ->ext4_ind_direct_IO()
      [if this is an append dio]
  ->mutex_unlock(i_mutex)

ext4_ext_direct_IO:
  ->ext4_file_write()
  ->mutex_lock(i_mutex)
  ->ext4_ext_direct_IO()
  ->mutex_unlock(i_mutex)
  ->generic_write_sync()
  ->ext4_sync_file()
  ->mutex_lock(i_mutex)
  ->ext4_flush_unwritten_io()
  ->ext4_do_flush_complete_IO()
      [there is empty list]
  ->ext4_unwritten_wait()
      [wait on i_unwritten==0 because
       in ext4_ext_direct_IO i_unwritten
       has been increased]

kworkd:
  ->dio_complete()
  ->ext4_end_dio()
  ->ext4_es_convert_unwritten_extents()
      [convert unwritten extents in status
       tree to ensure O_SYNC semantics]
  ->ext4_add_complete_io()
  ->generic_write_sync()
  ->ext4_sync_file()
  ->mutex_lock(i_mutex)
      [*DEADLOCK*]

Thus all we need to do is stop waiting on i_unwritten==0. But, as commit
c278531d describes, that leaves a time window in which integrity is
broken. So we need to call end_page_writeback() after converting
unwritten extents in ext4_end_io(). However, if we call end_page_writeback()
only after the conversion has been done in ext4_end_io(), we get another
deadlock: ext4_convert_unwritten_extents() needs to start a journal handle,
which can trigger a journal commit. If ext4_write_begin() is called at that
time, it also starts a journal handle and then waits on writeback in
grab_cache_page_write_begin().

Now I have an idea to solve this problem. We start the journal handle before
submitting an io request rather than starting it in
ext4_convert_unwritten_extents(). The reason for starting a handle in
ext4_convert_unwritten_extents() is that we need to calculate the journal
credits. But as far as I understand, the credits needed do not increase in this
function because we have already split the extents before submitting the io
request. A 'handle_t *handle' field will be added to ext4_io_end_t and used in
ext4_convert_unwritten_extents(). That way we avoid triggering a journal commit
when starting a handle.

I hope my description is clear. Any comments or feedback are always welcome.

Jan, I don't know whether you have begun to fix this problem or not. If
there is an update, please let me know.

Thanks,
- Zheng