From: Luis Henriques <lhenriques@suse.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: Theodore Ts'o <tytso@mit.edu>,
fstests@vger.kernel.org, fio@vger.kernel.org
Subject: Re: generic/095 failing in ext4 and xfs
Date: Mon, 4 Oct 2021 13:17:02 +0100
Message-ID: <YVrwvqOiI6CMpq2X@suse.de>
In-Reply-To: <YVrUX+qOJCDKp/Ng@suse.de>

On Mon, Oct 04, 2021 at 11:15:59AM +0100, Luis Henriques wrote:
> On Mon, Oct 04, 2021 at 11:08:29AM +0100, Luis Henriques wrote:
> > On Sat, Oct 02, 2021 at 08:59:57AM -0600, Jens Axboe wrote:
> > > On 10/2/21 4:16 AM, Luis Henriques wrote:
> > > > "Theodore Ts'o" <tytso@mit.edu> writes:
> > > >
> > > >> On Fri, Oct 01, 2021 at 02:46:09PM -0600, Jens Axboe wrote:
> > > >>>
> > > >>> Hmm, do older versions fail? I see Ted suggested that 3.27 doesn't; can
> > > >>> you give that a go? If that does work, it would be great if you could try
> > > >>> and bisect it.
> > > >>
> > > >> I just tried fio 3.28, and it worked for me. So I don't think it's
> > > >> fio.
> > > >
> > > > Awesome, thank you both for checking it out. So, it's definitely
> > > > something in my test environment.
> > > >
> > > >> Luis, could it be related to a kernel config option?
> > > >
> > > > Yeah, it could be. I've tested this on a rolling release (openSUSE TW),
> > > > so it's definitely quite different from Debian 10. It may take me a bit
> > > > to figure out what's going on, but I'll start with this kernel config and
> > > > report back any findings.
> > > >
> > > > Again, thank you both for confirming it's working on your side.
> > >
> > > Do you have a core file from fio? Would be interesting to get a
> > > backtrace from it.
> >
> > Ok, not a lot of progress from my end yet, but here's some info gathered
> > with gdb from the core file:
> >
> > #0 0x000056505966b361 in io_completed (td=0x7f2b0c5437a0, io_u_ptr=0x7ffec2403e48, icd=0x7ffec2403e60) at /usr/src/debug/fio-3.28-1.1.x86_64/io_u.c:2012
> > #1 0x000056505966b922 in ios_completed (icd=0x7ffec2403e60, td=0x7f2b0c5437a0) at /usr/src/debug/fio-3.28-1.1.x86_64/io_u.c:2086
> > #2 io_u_queued_complete (td=0x7f2b0c5437a0, min_evts=<optimized out>) at /usr/src/debug/fio-3.28-1.1.x86_64/io_u.c:2145
> > #3 0x0000565059680e88 in do_io (td=0x7f2b0c5437a0, bytes_done=0x7ffec2404070) at /usr/src/debug/fio-3.28-1.1.x86_64/backend.c:1176
> > #4 0x000056505968a8ee in thread_main (data=data@entry=0x56505ae43510) at /usr/src/debug/fio-3.28-1.1.x86_64/backend.c:1870
> > #5 0x000056505968ca48 in run_threads (sk_out=0x0) at /usr/src/debug/fio-3.28-1.1.x86_64/backend.c:2460
> > #6 0x000056505968cb55 in fio_backend (sk_out=0x0) at /usr/src/debug/fio-3.28-1.1.x86_64/backend.c:2597
> > #7 fio_backend (sk_out=0x0) at /usr/src/debug/fio-3.28-1.1.x86_64/backend.c:2558
> > #8 0x000056505962fd97 in main (argc=4, argv=0x7ffec240c448, envp=<optimized out>) at /usr/src/debug/fio-3.28-1.1.x86_64/fio.c:60
> >
> > And here's the io_completed() code where the crash occurs:
> >
> > 2007 if (io_u->resid) {
> > 2008 io_u->xfer_buflen = io_u->resid;
> > 2009 io_u->xfer_buf += bytes;
> > 2010 io_u->offset += bytes;
> > 2011 td->ts.short_io_u[io_u->ddir]++;
> > 2012 if (io_u->offset < io_u->file->real_file_size) {
> > 2013 requeue_io_u(td, io_u_ptr);
> > 2014 return;
> > 2015 }
> > 2016 }
>
> I forgot to include the kernel log. The page cache error seems relevant,
> and, as I said before, I'm seeing it on both ext4 and xfs:
>
> [ 38.014790] fio[762]: segfault at 30 ip 000056505966b361 sp 00007ffec2403df0 error 4 in fio[56505962e000+84000]
> [ 38.016320] Code: c1 48 85 c0 74 2e 48 89 45 68 48 8b 45 40 48 63 55 2c 4c 01 4d 60 4c 01 c8 48 89 45 40 49 83 84 d4 70 5d 02 00 01 48 8b 55 20 <48> 3b 42 30 0f 82 75 026
> [ 38.016839] Page cache invalidation failure on direct I/O. Possible data corruption due to collision with buffered I/O!
> [ 38.019520] fio[760]: segfault at 30 ip 000056505966b361 sp 00007ffec2403df0 error 4 in fio[56505962e000+84000]
> [ 38.020543] File: /mnt/scratch/file1 PID: 754 Comm: fio
> [ 38.022056] Code: c1 48 85 c0 74 2e 48 89 45 68 48 8b 45 40 48 63 55 2c 4c 01 4d 60 4c 01 c8 48 89 45 40 49 83 84 d4 70 5d 02 00 01 48 8b 55 20 <48> 3b 42 30 0f 82 75 026
> [ 38.052142] fio[761]: segfault at 30 ip 000056505966b361 sp 00007ffec2403df0 error 4 in fio[56505962e000+84000]
> [ 38.053545] Code: c1 48 85 c0 74 2e 48 89 45 68 48 8b 45 40 48 63 55 2c 4c 01 4d 60 4c 01 c8 48 89 45 40 49 83 84 d4 70 5d 02 00 01 48 8b 55 20 <48> 3b 42 30 0f 82 75 026
> [ 38.058111] fio[759]: segfault at 30 ip 000056505966b361 sp 00007ffec2403df0 error 4 in fio[56505962e000+84000]
> [ 38.059511] Code: c1 48 85 c0 74 2e 48 89 45 68 48 8b 45 40 48 63 55 2c 4c 01 4d 60 4c 01 c8 48 89 45 40 49 83 84 d4 70 5d 02 00 01 48 8b 55 20 <48> 3b 42 30 0f 82 75 026
> [ 38.065638] fio[758]: segfault at 30 ip 000056505966b361 sp 00007ffec2403df0 error 4 in fio[56505962e000+84000]
> [ 38.067055] Code: c1 48 85 c0 74 2e 48 89 45 68 48 8b 45 40 48 63 55 2c 4c 01 4d 60 4c 01 c8 48 89 45 40 49 83 84 d4 70 5d 02 00 01 48 8b 55 20 <48> 3b 42 30 0f 82 75 026

Ok, I may have narrowed it down a bit more. The disks I was using in my
testing were zram-based (I know, I should have mentioned that before :-/ ).
If I use file-based disks, the test passes and I see no crashes in fio.

Cheers,
--
Luís