linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220
       [not found]                 ` <CA+55aFxv4jVzoa_NpYOq=+TVpBvbSmJq++dPW5nmWNdxUEc-8Q@mail.gmail.com>
@ 2017-11-13 21:56                   ` Darrick J. Wong
  2017-11-13 22:01                     ` Linus Torvalds
  0 siblings, 1 reply; 3+ messages in thread
From: Darrick J. Wong @ 2017-11-13 21:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Eric Biggers, Eryu Guan, Fengguang Wu, Lukas Czerner,
	Jan Kara, Jeff Moyer, Linux Kernel Mailing List, linux-fsdevel,
	xfs

[cc xfs list]

[We're discussing the WARN_ON in *dio_complete when invalidation fails]

[yes, again]

On Mon, Nov 13, 2017 at 11:21:50AM -0800, Linus Torvalds wrote:
> On Mon, Nov 13, 2017 at 11:16 AM, Jens Axboe <axboe@kernel.dk> wrote:
> >
> > I would tend to agree with you, it's annoying to dump a full stack trace
> > for an expected (even for a rare situation) condition. But it's not the
> > first one, there's also one in XFS that always triggers for test runs. I
> > complained about that one in the past.
> 
> Yeah, we should always consider a WARN_ON() that is triggerable from
> user space to be a kernel bug.
> 
> If it's a "cannot happen", then the bug should be fixed.
> 
> If it's a "can happen, but I want to see the trace", then you just got
> the trace and you're done, and the WARN_ON() should be removed.
> 
> It could possibly be replaced with a "pr_warn()" or something, so that
> it still shows up as a "the user did something dodgy", but honestly,
> even that is questionable. We do that for things like "we just removed
> support for this, we want to see if somebody is using it"
> 
> So in no case is "let's just keep things as is" the right answer.

Wellll... the WARN_ON in question happens when:

a) two programs race to write to the same part of a file, one via the page
   cache and the other via directio
b) the dio write completes, tries to invalidate the page cache, and fails
   because the corresponding page cannot be invalidated

At this point, the page cache contents don't reflect what's on disk, so
I don't think we can quietly ignore the situation.  Clearly, enough
people dislike the WARN to complain repeatedly, so perhaps we should try
to barf evidence of this situation up to userspace?  The dio write
succeeded, which is why we don't turn err into ret; but now that we can
store and forward error codes through the mapping, how about we just:

   errseq_set(dio->inode->i_mapping->wb_error, -EIO);

and then let the writers pick up the EIO the next time they fsync?
Though I can already imagine the complaints about writes that used to
work and suddenly start returning error codes.

--D

>                Linus

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220
  2017-11-13 21:56                   ` [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220 Darrick J. Wong
@ 2017-11-13 22:01                     ` Linus Torvalds
  2017-11-14 17:17                       ` Theodore Ts'o
  0 siblings, 1 reply; 3+ messages in thread
From: Linus Torvalds @ 2017-11-13 22:01 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jens Axboe, Eric Biggers, Eryu Guan, Fengguang Wu, Lukas Czerner,
	Jan Kara, Jeff Moyer, Linux Kernel Mailing List, linux-fsdevel,
	xfs

On Mon, Nov 13, 2017 at 1:56 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> Wellll... the WARN_ON in question happens when:
>
> a) two programs race to write to the same part of a file, one via the page
>    cache and the other via directio
> b) the dio write completes, tries to invalidate the page cache, and fails
>    because the corresponding page cannot be invalidated
>
> At this point, the page cache contents don't reflect what's on disk, so
> I don't think we can quietly ignore the situation.

Direct-IO causing cache coherency issues? Really? I'm shocked and
surptised that that could _possibly_ happen.

But why the hell would you want a back-trace for it?

IOW, if you want to warn about it, use "pr_warn_ratelimited()" or
something. What did the backtrace and "looks like a kernel oops"
really help with?

And in the end, maybe even the warning is pointless. You used
direct-IO and cached IO at the same time, and you got coherency
issues. What did you expect? directio is fundamentally broken.

                 Linus

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220
  2017-11-13 22:01                     ` Linus Torvalds
@ 2017-11-14 17:17                       ` Theodore Ts'o
  0 siblings, 0 replies; 3+ messages in thread
From: Theodore Ts'o @ 2017-11-14 17:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Darrick J. Wong, Jens Axboe, Eric Biggers, Eryu Guan,
	Fengguang Wu, Lukas Czerner, Jan Kara, Jeff Moyer,
	Linux Kernel Mailing List, linux-fsdevel, xfs

On Mon, Nov 13, 2017 at 02:01:36PM -0800, Linus Torvalds wrote:
> 
> And in the end, maybe even the warning is pointless. You used
> direct-IO and cached IO at the same time, and you got coherency
> issues. What did you expect? directio is fundamentally broken.

A single warning per inode, "page cache coherency broken due to direct
I/O; userspace did a dumb thing; oh, well" is probably more than
sufficient.  The reason to have the warning is so that when the user
complains about the file system bug, we can point at the "userspace
did a dumb thing" warning message.

						- Ted

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-11-14 17:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CA+55aFxSJGeN=2X-uX-on1Uq2Nb8+v1aiMDz5H1+tKW_N5Q+6g@mail.gmail.com>
     [not found] ` <20171029225155.qcum5i75awrt5tzm@wfg-t540p.sh.intel.com>
     [not found]   ` <20171030072021.gcgpaolo5m3myuln@wfg-t540p.sh.intel.com>
     [not found]     ` <20171030074429.GC3283@eguan.usersys.redhat.com>
     [not found]       ` <20171031001041.5qjzn5pjertpdc3e@wfg-t540p.sh.intel.com>
     [not found]         ` <20171031065417.GD3283@eguan.usersys.redhat.com>
     [not found]           ` <20171106011306.GB11631@zzz.localdomain>
     [not found]             ` <20171113191322.GA123488@gmail.com>
     [not found]               ` <ff823a95-3dff-0f2b-13ab-aab9bed7b0d3@kernel.dk>
     [not found]                 ` <CA+55aFxv4jVzoa_NpYOq=+TVpBvbSmJq++dPW5nmWNdxUEc-8Q@mail.gmail.com>
2017-11-13 21:56                   ` [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220 Darrick J. Wong
2017-11-13 22:01                     ` Linus Torvalds
2017-11-14 17:17                       ` Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).