From: Luis Henriques <lhenriques@suse.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: Theodore Ts'o <tytso@mit.edu>,
	fstests@vger.kernel.org, fio@vger.kernel.org
Subject: Re: generic/095 failing in ext4 and xfs
Date: Mon, 4 Oct 2021 13:17:02 +0100
Message-ID: <YVrwvqOiI6CMpq2X@suse.de>
In-Reply-To: <YVrUX+qOJCDKp/Ng@suse.de>

On Mon, Oct 04, 2021 at 11:15:59AM +0100, Luis Henriques wrote:
> On Mon, Oct 04, 2021 at 11:08:29AM +0100, Luis Henriques wrote:
> > On Sat, Oct 02, 2021 at 08:59:57AM -0600, Jens Axboe wrote:
> > > On 10/2/21 4:16 AM, Luis Henriques wrote:
> > > > "Theodore Ts'o" <tytso@mit.edu> writes:
> > > > 
> > > >> On Fri, Oct 01, 2021 at 02:46:09PM -0600, Jens Axboe wrote:
> > > >>>
> > > >>> Hmm, do older versions fail? I see Ted suggested that 3.27 doesn't, can
> > > >>> you give that a go? If that does work, would be great if you could try
> > > >>> and bisect it.
> > > >>
> > > >> I just tried fio 3.28, and it worked for me.  So I don't think it's
> > > >> fio.
> > > > 
> > > > Awesome, thank you both for checking it out.  So, it's definitely
> > > > something in my test environment.
> > > > 
> > > >> Luis, could it be related to a  kernel config option?
> > > > 
> > > > Yeah, it could be.  I've tested this on a rolling release (openSUSE TW),
> > > > so it's definitely quite different from Debian 10.  It may take me a bit
> > > > to figure out what's going on, but I'll start with this kernel config and
> > > > report back any finding.
> > > > 
> > > > Again, thank you both for confirming it's working on your side.
> > > 
> > > Do you have a core file from fio? Would be interesting to get a
> > > backtrace from it.
> > 
> > Ok, not a lot of progress from my end yet, but here's some info gathered
> > with gdb from the core file:
> > 
> > #0  0x000056505966b361 in io_completed (td=0x7f2b0c5437a0, io_u_ptr=0x7ffec2403e48, icd=0x7ffec2403e60) at /usr/src/debug/fio-3.28-1.1.x86_64/io_u.c:2012
> > #1  0x000056505966b922 in ios_completed (icd=0x7ffec2403e60, td=0x7f2b0c5437a0) at /usr/src/debug/fio-3.28-1.1.x86_64/io_u.c:2086
> > #2  io_u_queued_complete (td=0x7f2b0c5437a0, min_evts=<optimized out>) at /usr/src/debug/fio-3.28-1.1.x86_64/io_u.c:2145
> > #3  0x0000565059680e88 in do_io (td=0x7f2b0c5437a0, bytes_done=0x7ffec2404070) at /usr/src/debug/fio-3.28-1.1.x86_64/backend.c:1176
> > #4  0x000056505968a8ee in thread_main (data=data@entry=0x56505ae43510) at /usr/src/debug/fio-3.28-1.1.x86_64/backend.c:1870
> > #5  0x000056505968ca48 in run_threads (sk_out=0x0) at /usr/src/debug/fio-3.28-1.1.x86_64/backend.c:2460
> > #6  0x000056505968cb55 in fio_backend (sk_out=0x0) at /usr/src/debug/fio-3.28-1.1.x86_64/backend.c:2597
> > #7  fio_backend (sk_out=0x0) at /usr/src/debug/fio-3.28-1.1.x86_64/backend.c:2558
> > #8  0x000056505962fd97 in main (argc=4, argv=0x7ffec240c448, envp=<optimized out>) at /usr/src/debug/fio-3.28-1.1.x86_64/fio.c:60
> > 
> > And here's the io_completed() code where the crash occurs:
> > 
> >    2007                 if (io_u->resid) {
> >    2008                         io_u->xfer_buflen = io_u->resid;
> >    2009                         io_u->xfer_buf += bytes;
> >    2010                         io_u->offset += bytes;
> >    2011                         td->ts.short_io_u[io_u->ddir]++;
> >    2012                         if (io_u->offset < io_u->file->real_file_size) {
> >    2013                                 requeue_io_u(td, io_u_ptr);
> >    2014                                 return;
> >    2015                         }
> >    2016                 }
> 
> I forgot to include the kernel log.  The page cache error seems relevant,
> and, as I said before, I'm seeing it both on ext4 and xfs:
> 
> [   38.014790] fio[762]: segfault at 30 ip 000056505966b361 sp 00007ffec2403df0 error 4 in fio[56505962e000+84000]
> [   38.016320] Code: c1 48 85 c0 74 2e 48 89 45 68 48 8b 45 40 48 63 55 2c 4c 01 4d 60 4c 01 c8 48 89 45 40 49 83 84 d4 70 5d 02 00 01 48 8b 55 20 <48> 3b 42 30 0f 82 75 026
> [   38.016839] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
> [   38.019520] fio[760]: segfault at 30 ip 000056505966b361 sp 00007ffec2403df0 error 4 in fio[56505962e000+84000]
> [   38.020543] File: /mnt/scratch/file1 PID: 754 Comm: fio
> [   38.022056] Code: c1 48 85 c0 74 2e 48 89 45 68 48 8b 45 40 48 63 55 2c 4c 01 4d 60 4c 01 c8 48 89 45 40 49 83 84 d4 70 5d 02 00 01 48 8b 55 20 <48> 3b 42 30 0f 82 75 026
> [   38.052142] fio[761]: segfault at 30 ip 000056505966b361 sp 00007ffec2403df0 error 4 in fio[56505962e000+84000]
> [   38.053545] Code: c1 48 85 c0 74 2e 48 89 45 68 48 8b 45 40 48 63 55 2c 4c 01 4d 60 4c 01 c8 48 89 45 40 49 83 84 d4 70 5d 02 00 01 48 8b 55 20 <48> 3b 42 30 0f 82 75 026
> [   38.058111] fio[759]: segfault at 30 ip 000056505966b361 sp 00007ffec2403df0 error 4 in fio[56505962e000+84000]
> [   38.059511] Code: c1 48 85 c0 74 2e 48 89 45 68 48 8b 45 40 48 63 55 2c 4c 01 4d 60 4c 01 c8 48 89 45 40 49 83 84 d4 70 5d 02 00 01 48 8b 55 20 <48> 3b 42 30 0f 82 75 026
> [   38.065638] fio[758]: segfault at 30 ip 000056505966b361 sp 00007ffec2403df0 error 4 in fio[56505962e000+84000]
> [   38.067055] Code: c1 48 85 c0 74 2e 48 89 45 68 48 8b 45 40 48 63 55 2c 4c 01 4d 60 4c 01 c8 48 89 45 40 49 83 84 d4 70 5d 02 00 01 48 8b 55 20 <48> 3b 42 30 0f 82 75 026

Ok, I may have narrowed it down a bit more.  The disks used in my testing
were zram-based (I know, I should have mentioned that earlier :-/ ).  If I
use file-based disks, the test passes and I see no crashes in fio.

Cheers,
--
Luís

Thread overview: 24+ messages
2021-10-01 17:11 generic/095 failing in ext4 and xfs Luis Henriques
2021-10-01 20:07 ` Theodore Ts'o
2021-10-01 20:46 ` Jens Axboe
2021-10-01 21:59   ` Theodore Ts'o
2021-10-02 10:16     ` Luis Henriques
2021-10-02 14:59       ` Jens Axboe
2021-10-04 10:08         ` Luis Henriques
2021-10-04 10:15           ` Luis Henriques
2021-10-04 12:17             ` Luis Henriques [this message]
2021-10-04 16:18               ` Theodore Ts'o
2021-10-06 13:39                 ` Luis Henriques
2021-10-11 10:27         ` [PATCH] fio: make sure io_u->file isn't NULL before using it Luís Henriques
2021-10-11 12:58           ` Jens Axboe
2021-10-11 13:44             ` Luís Henriques
2021-10-11 15:15               ` Jens Axboe
2021-10-11 15:45                 ` Luís Henriques
2021-10-10  8:31 ` generic/095 failing in ext4 and xfs Zorro Lang
2021-10-11  9:09   ` Luís Henriques
2021-10-11  9:31     ` Ming Lei
2021-10-11 10:16       ` Luís Henriques
2021-10-11 11:13         ` Ming Lei
2021-10-11 13:41           ` Luís Henriques
2021-10-11 12:44         ` Theodore Ts'o
2021-10-11 13:41           ` Luís Henriques
