From: Jan Kara <jack@suse.cz>
To: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
Cc: Jan Kara <jack@suse.cz>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
"Li, Michael" <huayil@qti.qualcomm.com>
Subject: Re: ext4 out of order when use cfq scheduler
Date: Thu, 7 Jan 2016 13:19:07 +0100
Message-ID: <20160107121907.GD8380@quack.suse.cz>
In-Reply-To: <20160107114736.GC8380@quack.suse.cz>
On Thu 07-01-16 12:47:36, Jan Kara wrote:
> On Thu 07-01-16 11:02:29, HUANG Weller (CM/ESW12-CN) wrote:
> > > -----Original Message-----
> > > From: Jan Kara [mailto:jack@suse.cz]
> > > Sent: Thursday, January 07, 2016 6:24 PM
> > > To: HUANG Weller (CM/ESW12-CN) <Weller.Huang@cn.bosch.com>
> > > Cc: Jan Kara <jack@suse.cz>; linux-ext4@vger.kernel.org
> > > Subject: Re: ext4 out of order when use cfq scheduler
> > >
> > > On Thu 07-01-16 06:43:00, HUANG Weller (CM/ESW12-CN) wrote:
> > > > > -----Original Message-----
> > > > > From: Jan Kara [mailto:jack@suse.cz]
> > > > > Sent: Wednesday, January 06, 2016 6:06 PM
> > > > > To: HUANG Weller (CM/ESW12-CN) <Weller.Huang@cn.bosch.com>
> > > > > Subject: Re: ext4 out of order when use cfq scheduler
> > > > >
> > > > > On Wed 06-01-16 02:39:15, HUANG Weller (CM/ESW12-CN) wrote:
> > > > > > > So you are running in 'ws' mode of your tool, am I right? Just
> > > > > > > looking into the sources you've sent me I've noticed that
> > > > > > > although you set O_SYNC in openflg when mode == MODE_WS, you do
> > > > > > > not use openflg at all. So file won't be synced at all. That
> > > > > > > would well explain why you see that not all file contents is
> > > > > > > written. So did you just send me a different version of the
> > > > > > > > source or is your test program really buggy?
> > > > > > >
> > > > > >
> > > > > > > Yes, it is a bug in the test code. So the test tool actually
> > > > > > > creates files without the O_SYNC flag. But even in this case, is
> > > > > > > the out-of-order behavior acceptable? Or is it normal?
> > > > >
> > > > > Without fsync(2) or O_SYNC, it is perfectly possible that some files
> > > > > are written and others are not since nobody guarantees order of
> > > > > writeback of inodes. OTOH you shouldn't ever see uninitialized data
> > > > > in the inode (but so far it isn't clear to me whether you really see
> > > > > > uninitialized data or whether we really wrote zeros to those blocks -
> > > > > > ext4 can sometimes decide to do so). Your traces and disk contents
> > > > > > show that the problematic inode has an extent of length 128 blocks
> > > > > > starting at block 0x12c00 and then an extent of length 1 block
> > > > > > starting at block 0x1268e. What is the block size of the filesystem?
> > > > > > Because inode size is only 0x40010.
> > > > >
> > > > > Some suggestions to try:
> > > > > 1) Print also length of a write request in addition to the starting
> > > > > block so that we can see how much actually got written
> > > >
> > > > Please see below failure analysis.
> > > >
> > > > > 2) Initialize the device to 0xff so that we can distinguish
> > > > > uninitialized blocks from zeroed-out blocks.
> > > >
> > > > Yes, I initialized the device to 0xff this time.
> > > >
> > > > > 3) Report exactly for which 512-byte blocks checksum matches and for
> > > > > which it is wrong.
> > > > The wrong contents are old file contents that were created in the
> > > > previous test round. This is caused by the "wrong" ordering between
> > > > the inode data (in the journal) and the file contents, so the file
> > > > contents are not updated.
> > >
> > > So this confuses me somewhat. You previously said that you always remove files
> > > after each test round and then new ones are created. Is it still the case? So the old
> > > file contents you speak about above is just some random contents that happened
> > > to be in disk blocks we freshly allocated to the file, am I right?
> >
> > Yes. You are right.
> > The "old file contents" are disk blocks whose contents were generated in the last test round and which are allocated to a new file in the new test round.
> >
> >
> > >
> > > OK, so I was looking into the code and indeed, reality is correct and my mental
> > > model was wrong! ;) I thought that inode gets added to the list of inodes for which
> > > we need to wait for data IO completion during transaction commit during block
> > > allocation. And I was wrong. It used to happen in
> > > mpage_da_map_and_submit() until commit f3b59291a69d (ext4: remove calls to
> > > ext4_jbd2_file_inode() from delalloc write path) where it got removed. And that was
> > > wrong because although we submit data writes before dropping handle for
> > > allocating transaction and updating i_size, nobody guarantees that data IO is not
> > > delayed in the block layer until transaction commit.
> > > Which seems to happen in your case. I'll send a fix. Thanks for your report and
> > > persistence!
> > >
> >
> > Thanks a lot for your feedback :-)
> > I am not familiar with the details of the ext4 internal code, so I will try to understand the explanation you gave above and have a look at the related functions.
> > Could you send the fix in this mail?
> > And does kernel 3.14 also have this issue?
>
> The problem is in all kernels starting with 3.8. Attached is a patch which
> should fix the issue. Can you test whether it fixes the problem for you?
Oh, I have realized the patch is on top of the current ext4 development tree
and it won't compile against the current vanilla kernel because of the
EXT4_GET_BLOCKS_ZERO check. Just remove that line if you get a compilation
failure.
> +		if (map->m_flags & EXT4_MAP_NEW &&
> +		    !(map->m_flags & EXT4_MAP_UNWRITTEN) &&
> +		    !(flags & EXT4_GET_BLOCKS_ZERO) &&
Just remove the above line and things should work for older kernels as
well.
> +		    ext4_should_order_data(inode)) {
> +			ret = ext4_jbd2_file_inode(handle, inode);
> +			if (ret)
> +				return ret;
> +		}
>  	}
>  	return retval;
>  }
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
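
The test-tool bug mentioned earlier in the thread (O_SYNC set in openflg but
openflg never passed to open) can be illustrated in userspace. What follows
is only a minimal sketch, not the reporter's actual tool: the helper name
write_file_synced, its parameters and the file mode are hypothetical. It
shows the flag being passed to open(2), with fsync(2) as the fallback when
O_SYNC is not requested.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original test tool): write len bytes
 * from buf to path and make sure they are synced before returning.
 * Returns 0 on success, -1 on any failure.
 */
int write_file_synced(const char *path, const void *buf, size_t len,
                      int use_o_sync)
{
    int openflg = O_WRONLY | O_CREAT | O_TRUNC;
    const char *p = buf;
    size_t left = len;
    int fd;

    if (use_o_sync)
        openflg |= O_SYNC;

    /* The flag only has an effect if openflg is really passed to open(). */
    fd = open(path, openflg, 0644);
    if (fd < 0)
        return -1;

    /* write() may be short, so loop until everything is written. */
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Without O_SYNC, an explicit fsync() is needed to force writeback. */
    if (!use_o_sync && fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

Calling write_file_synced(path, buf, len, 1) relies on O_SYNC, while passing 0
as the last argument relies on the explicit fsync() instead. Without either,
as explained in the quoted text, nobody guarantees the order of writeback of
inodes.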