From: Zheng Liu <gnehzuil.liu@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org,
"Darrick J. Wong" <darrick.wong@oracle.com>,
Christoph Hellwig <hch@infradead.org>,
Zheng Liu <wenqing.lz@taobao.com>
Subject: Re: [RFC][PATCH 8/9 v1] ext4: refine unwritten extent conversion
Date: Fri, 4 Jan 2013 12:26:01 +0800 [thread overview]
Message-ID: <20130104042601.GB27833@gmail.com> (raw)
In-Reply-To: <20130103105613.GA30695@quack.suse.cz>
On Thu, Jan 03, 2013 at 11:56:13AM +0100, Jan Kara wrote:
> On Tue 01-01-13 13:24:45, Zheng Liu wrote:
> > On Mon, Dec 31, 2012 at 10:58:15PM +0100, Jan Kara wrote:
> > > On Mon 24-12-12 15:55:41, Zheng Liu wrote:
> > > > From: Zheng Liu <wenqing.lz@taobao.com>
> > > >
> > > > Currently all unwritten extent conversion work is pushed into a workqueue to be
> > > > done because DIO end_io is in a irq context and this conversion needs to take
> > > > i_data_sem locking. But we couldn't take a semaphore in a irq context. After
> > > > tracking all extent status, we can first convert this unwritten extent in extent
> > > > status tree, and call aio_complete() and inode_dio_done() to notify upper level
> > > > that this dio has done. We don't need to be worried about exposing a stale data
> > > > because we first try to lookup in extent status tree. So it makes us to see the
> > > > latest extent status. Meanwhile we queue a work to convert this unwritten
> > > > extent in extent tree. After this improvement, reader also needn't wait this
> > > > conversion to be done when dioread_nolock is enabled.
> > > >
> > > > CC: Jan Kara <jack@suse.cz>
> > > > CC: "Darrick J. Wong" <darrick.wong@oracle.com>
> > > > CC: Christoph Hellwig <hch@infradead.org>
> > > > Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> > > OK, so after reading other patches this should work fine. Just I think we
> > > should somehow mark in the extent status tree that the extent tree is
> > > inconsistent with what's on disk - something like extent dirty bit. It will
> > > be set for UNWRITTEN extents where conversion is pending logically it would
> > > also make sence to have it set for DELAYED extents. Then if we need to
> > > reclaim some extents due to memory pressure we know we have to keep dirty
> > > extents because those cache irreplacible information. What do you think?
> >
> > Dirty bit is a good idea for UNWRITTEN extent because we can feel free
> > to reclaim all WRITTEN extents and all UNWRITTEN extents that are
> > without dirty bit. But we can not reclaim DEALYED extents no matter
> > whether they are dirty or not because they are used to lookup an delayed
> > extent in fiemap, seek_data/hole, and bigalloc. So at least DEALYED
> > extent must be kept in status tree. That is why in step 1 status tree
> > only tracks all DELAYED extents in the tree.
> So I was thinking about this some more and also testing some code and I
> realized that using extent status tree won't be enough (sadly). In case of
> AIO DIO with O_SYNC set, we have to perform extent conversion on *disk*
> before we can complete the AIO. So extent status tree won't help us in any
> way.
Ah, I see. Conversion must be done in disk before aio_complete() is
called if AIO DIO with O_SYNC set. So now we must call aio_complete()
after ext4_convert_unwritten_extents() in ext4_end_io().
>
> Furthermore O_SYNC writes end up calling ->fsync() after IO is finished
> which currently waits for all unwritten extents to convert and that
> effectively deadlocks the conversion thread if there are more DIO
> conversions pending. To fix this we would have to hack around
> ext4_file_fsync() to avoid waiting in case of O_SYNC AIO writes and that
> gets nasty quickly. So I'm currently back to my original plan of completing
> IO only after extent conversion happens... I'll see how that works out.
>
> Eventually we could *optimize* that by doing the extent conversion only in
> the extent status tree if possible but I wouldn't bother with it right now.
> For one thing, I also realized we would probably have to somehow throttle
> writers so that there are not too many outstanding conversions (when we
> complete AIO only after the conversion is finished, writer is naturally
> limited by the amount of AIOs allowed).
Sorry, currently no any idea is in my mind. I will think about it. :-(
Thanks,
- Zheng
next prev parent reply other threads:[~2013-01-04 4:12 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-24 7:55 [RFC][PATCH 0/9 v1] ext4: extent status tree implementation (step2) Zheng Liu
2012-12-24 7:55 ` [RFC][PATCH 1/9 v1] ext4: fixup metadata reserve block warning when bigalloc and delalloc are enabled Zheng Liu
2012-12-24 7:55 ` [RFC][PATCH 2/9 v1] ext4: refine extent status tree Zheng Liu
2012-12-24 7:55 ` [RFC][PATCH 3/9 v1] ext4: add physical block and status member into " Zheng Liu
2012-12-31 21:49 ` Jan Kara
2013-01-01 5:16 ` Zheng Liu
2013-01-02 11:22 ` Jan Kara
2013-01-05 2:44 ` Zheng Liu
2013-01-08 1:27 ` Dave Chinner
2013-01-08 2:25 ` Zheng Liu
2012-12-24 7:55 ` [RFC][PATCH 4/9 v1] ext4: adjust interfaces of " Zheng Liu
2012-12-24 7:55 ` [RFC][PATCH 5/9 v1] ext4: track all extent status in " Zheng Liu
2012-12-24 7:55 ` [RFC][PATCH 6/9 v1] ext4: lookup block mapping " Zheng Liu
2012-12-24 7:55 ` [RFC][PATCH 7/9 v1] ext4: add a new convert function to convert an unwritten extent " Zheng Liu
2012-12-24 7:55 ` [RFC][PATCH 8/9 v1] ext4: refine unwritten extent conversion Zheng Liu
2012-12-31 16:36 ` Jan Kara
2012-12-31 17:04 ` Jan Kara
2012-12-31 21:58 ` Jan Kara
2013-01-01 5:24 ` Zheng Liu
2013-01-03 10:56 ` Jan Kara
2013-01-04 4:26 ` Zheng Liu [this message]
2012-12-24 7:55 ` [RFC][PATCH 9/9 v1] ext4: set dioread_nolock by default for extent-based files Zheng Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130104042601.GB27833@gmail.com \
--to=gnehzuil.liu@gmail.com \
--cc=darrick.wong@oracle.com \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=wenqing.lz@taobao.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.