linux-fsdevel.vger.kernel.org archive mirror
From: Zhengyuan Liu <liuzhengyuang521@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: viro@zeniv.linux.org.uk, tytso@mit.edu,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	mysql@lists.mysql.com, linux-ext4@vger.kernel.org,
	刘云 <liuyun01@kylinos.cn>,
	"Zhengyuan Liu" <liuzhengyuan@kylinos.cn>
Subject: Re: Problem with direct IO
Date: Tue, 19 Oct 2021 11:39:38 +0800
Message-ID: <CAOOPZo4HtGB5MYETpj_q++m+PvomNqasNdaPa65gp2hsQ5H67A@mail.gmail.com>
In-Reply-To: <20211018114349.b80a27af9bfa7f16162b0ec4@linux-foundation.org>

On Tue, Oct 19, 2021 at 2:43 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 18 Oct 2021 09:09:06 +0800 Zhengyuan Liu <liuzhengyuang521@gmail.com> wrote:
>
> > Ping.
> >
> > I think this problem is serious and someone else may also encounter it
> > in the future.
> >
> >
> > On Wed, Oct 13, 2021 at 9:46 AM Zhengyuan Liu
> > <liuzhengyuang521@gmail.com> wrote:
> > >
> > > Hi, all
> > >
> > > we are encountering the following MySQL crash while importing tables:
> > >
> > >     2021-09-26T11:22:17.825250Z 0 [ERROR] [MY-013622] [InnoDB] [FATAL]
> > >     fsync() returned EIO, aborting.
> > >     2021-09-26T11:22:17.825315Z 0 [ERROR] [MY-013183] [InnoDB]
> > >     Assertion failure: ut0ut.cc:555 thread 281472996733168
> > >
> > > At the same time, we found the following message in dmesg:
> > >
> > >     [ 4328.838972] Page cache invalidation failure on direct I/O.
> > >     Possible data corruption due to collision with buffered I/O!
> > >     [ 4328.850234] File: /data/mysql/data/sysbench/sbtest53.ibd PID:
> > >     625 Comm: kworker/42:1
> > >
> > > At first we suspected that MySQL was operating on the file with direct
> > > IO and buffered IO interleaved, but after some checking we found that
> > > it only does direct IO using aio. The problem comes from the direct-IO
> > > interface (__generic_file_write_iter) itself.
> > >
> > > ssize_t __generic_file_write_iter()
> > > {
> > > ...
> > >         if (iocb->ki_flags & IOCB_DIRECT) {
> > >                 loff_t pos, endbyte;
> > >
> > >                 written = generic_file_direct_write(iocb, from);
> > >                 /*
> > >                  * If the write stopped short of completing, fall back to
> > >                  * buffered writes.  Some filesystems do this for writes to
> > >                  * holes, for example.  For DAX files, a buffered write will
> > >                  * not succeed (even if it did, DAX does not handle dirty
> > >                  * page-cache pages correctly).
> > >                  */
> > >                 if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
> > >                         goto out;
> > >
> > >                 status = generic_perform_write(file, from, pos = iocb->ki_pos);
> > > ...
> > > }
> > >
> > > From the above code snippet we can see that direct IO can fall back to
> > > buffered IO under certain conditions, so even though MySQL only does
> > > direct IO, it can interleave with buffered IO when the fallback occurs.
> > > I currently have no idea why the FS (ext3) failed the direct IO, but it
> > > is strange that __generic_file_write_iter makes direct IO fall back to
> > > buffered IO; it seems to break the semantics of direct IO.
>
> That makes sense.
>
> > > The reproduction environment is:
> > > Platform:  Kunpeng 920 (arm64)
> > > Kernel: V5.15-rc
> > > PAGESIZE: 64K
> > > Mysql:  V8.0
> > > Innodb_page_size: default(16K)
>
> This is all fairly mature code, I think.  Do you know if earlier
> kernels were OK, and if so which versions?

We have tested v4.18 and v4.19 and the problem is still there. Kernels
earlier than v4.12 don't support arm64 well, so we can't test them.

I think this problem has something to do with page size: if we change the
kernel page size from 64K to 4K, or just set Innodb_page_size to 64K, we can
no longer reproduce it. Typically the kernel page size and the FS block size
are both 4K, so if the database uses an IO unit larger than 4K, its IOs never
overlap in the page cache, because each one occupies one or more whole pages.
That makes the problem hard to trigger on x86 or other platforms using a 4K
page size. Things change with the 64K page size on arm64: if the database
uses a smaller IO unit (16K DIO in our MySQL case), two IOs can share one
page-cache page, and if one of them falls back to buffered IO it can trigger
the problem.

For example, aio gets two direct IOs that write to the same page-cache page.
It dispatches the first one to storage and begins processing the second one
before the first has completed. If the second one falls back to buffered IO,
its data is copied into the page cache and the page is marked dirty. When the
first one completes, it checks and invalidates its page-cache range; if the
page is dirty, the invalidation fails and the problem occurs.

If my analysis isn't correct please point it out, thanks.
