* Re: [PATCH AUTOSEL 4.14 25/35] iomap: sub-block dio needs to zeroout beyond EOF
[not found] ` <20181130101441.GA213156@sasha-vm>
@ 2018-11-30 20:35 ` Darrick J. Wong
[not found] ` <20181130215005.GP19305@dastard>
1 sibling, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2018-11-30 20:35 UTC (permalink / raw)
To: Sasha Levin
Cc: Greg KH, Dave Chinner, stable, linux-kernel, Dave Chinner,
linux-fsdevel, xfs
On Fri, Nov 30, 2018 at 05:14:41AM -0500, Sasha Levin wrote:
> On Fri, Nov 30, 2018 at 09:22:03AM +0100, Greg KH wrote:
> > On Fri, Nov 30, 2018 at 09:40:19AM +1100, Dave Chinner wrote:
> > > I stopped my tests at 5 billion ops yesterday (i.e. 20 billion ops
> > > aggregate) to focus on testing the copy_file_range() changes, but
> > > Darrick's tests are still ongoing and have passed 40 billion ops in
> > > aggregate over the past few days.
> > >
> > > The reason we are running these so long is that we've seen fsx data
> > > corruption failures after 12+ hours of runtime and hundreds of
> > > millions of ops. Hence the testing for backported fixes will need to
> > > replicate these test runs across multiple configurations for
> > > multiple days before we have any confidence that we've actually
> > > fixed the data corruptions and not introduced any new ones.
> > >
> > > If you pull only a small subset of the fixes, the fsx will still
> > > fail and we have no real way of actually verifying that there have
> > > been no regression introduced by the backport. IOWs, there's a
> > > /massive/ amount of QA needed for ensuring that these backports work
> > > correctly.
> > >
> > > Right now the XFS developers don't have the time or resources
> > > available to validate stable backports are correct and regression
> > > free because we are focussed on ensuring the upstream fixes we've
> > > already made (and are still writing) are solid and reliable.
I feel the need to contribute my own interpretation of what's been going
on the last four months:
What you're seeing is not the usual level of reluctance to backport
fixes to LTS kernels, it's our own frustrations at the kernel
community's systemic inability to QA new fs features properly.
Four months ago (prior to 4.19) Zorro started digging into periodic test
failures with shared/010, which resulted in some fixes to the btrfs
dedupe and clone range ioctl implementations. He then saw the same
failures on XFS.
Dave and I stared at the btrfs patches for a while, then started looking
at the xfs counterparts, and realized that nobody had ever added those
commands to the fstests stressor programs, nor had anyone ever encoded
into a test the side effects of a file remap (mtime update, removal of
suid). Nor were there any tests to ensure that these ioctls couldn't be
abused to violate system security and stability constraints.
That's why I refactored a whole ton of vfs file remap code for 4.20, and
(with the help of Dave and Brian and others) worked on fixing all the
problems where fsx and fsstress demonstrate file corruption problems.
Then we started asking the same questions of the copy_file_range system
call, and discovered that yes, we have all of the same problems. We
also discovered several failure cases that aren't mentioned in any
documentation, which has complicated the generation of automatable
tests. Worse yet, the stressor programs fell over even sooner with the
fallback splice implementation.
TLDR: New features show up in the vfs without a lot of design
documentation, incomplete userspace interface manuals, and not much
beyond trivial testing.
So the problem I'm facing here is that the XFS team are singlehandedly
trying to pay off years of accumulated technical debt in the vfs. We
definitely had a role in adding to that debt, so we're fixing it.
Dave is now refactoring the copy_file_range backend to implement all the
necessary security and stability checks, and I'm still QAing all the
stuff we've added to 4.20.
We're not finished, where "finished" means that we can get /one/ kernel
tree to go ~100 billion fsxops without burping up failures, and we've
written fstests to check that said kernel can handle correctly all the
weird side cases.
Until all those fstests go upstream, I don't want to spread out into
backporting and testing LTS kernels, even with test automation. By the
time we're done with all our upstream work you ought to be able to
autosel backport the whole mess into the LTS kernels /and/ fstests will
be able to tell you if the autosel has succeeded without causing any
obvious regressions.
> > Ok, that's fine, so users of XFS should wait until the 4.20 release
> > before relying on it? :)
At the rate we're going, we're not going to finish until 4.21, but yes,
let's wait until 4.20 is closer to release to start in on porting all of
its fixes to 4.14/4.19.
> It's getting to the point that with the amount of known issues with XFS
> on LTS kernels it makes sense to mark it as CONFIG_BROKEN.
These aren't all issues specific to XFS; some plague every fs in subtle
weird ways that only show up with extreme testing. We need the extreme
testing to flush out as many bugs as we can before enabling the feature
by default. XFS reflink is not enabled by default and, due to all this,
is not likely to be any time soon.
(That copy_file_range syscall should have been rigorously tested before
it was turned on in the kernel...)
> > I understand your reluctance to want to backport anything, but it really
> > feels like you are not even allowing for fixes that are "obviously
> > right" to be backported either, even after they pass testing. Which
> > isn't ok for your users.
>
> Do the XFS maintainers expect users to always use the latest upstream
> kernel?
For features that are EXPERIMENTAL or aren't enabled by default, yes,
they should be.
--D
>
> --
> Thanks,
> Sasha
* Re: XFS patches for stable
[not found] ` <20181201074909.GC213156@sasha-vm>
@ 2018-12-01 9:09 ` Amir Goldstein
2018-12-02 15:25 ` Sasha Levin
0 siblings, 1 reply; 7+ messages in thread
From: Amir Goldstein @ 2018-12-01 9:09 UTC (permalink / raw)
To: sashal
Cc: Dave Chinner, Greg KH, stable, linux-kernel, Dave Chinner,
Darrick J. Wong, linux-fsdevel, linux-xfs, Luis R. Chamberlain
> >> It's getting to the point that with the amount of known issues with XFS
> >> on LTS kernels it makes sense to mark it as CONFIG_BROKEN.
> >
> >Really? Where are the bug reports?
>
> In 'git log'! You report these every time you fix something in upstream
> xfs but don't backport it to stable trees:
>
> $ git log --oneline v4.18-rc1..v4.18 fs/xfs
> d4a34e165557 xfs: properly handle free inodes in extent hint validators
> 9991274fddb9 xfs: Initialize variables in xfs_alloc_get_rec before using them
> d8cb5e423789 xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation
> e53c4b598372 xfs: ensure post-EOF zeroing happens after zeroing part of a file
> a3a374bf1889 xfs: fix off-by-one error in xfs_rtalloc_query_range
> 232d0a24b0fc xfs: fix uninitialized field in rtbitmap fsmap backend
> 5bd88d153998 xfs: recheck reflink state after grabbing ILOCK_SHARED for a write
> f62cb48e4319 xfs: don't allow insert-range to shift extents past the maximum offset
> aafe12cee0b1 xfs: don't trip over negative free space in xfs_reserve_blocks
> 10ee25268e1f xfs: allow empty transactions while frozen
> e53946dbd31a xfs: xfs_iflush_abort() can be called twice on cluster writeback failure
> 23fcb3340d03 xfs: More robust inode extent count validation
> e2ac836307e3 xfs: simplify xfs_bmap_punch_delalloc_range
>
> Since I'm assuming that at least some of them are based on actual issues
> users hit, and some of those apply to stable kernels, why would users
> want to use an XFS version which is knowingly buggy?
>
Sasha,
There is one more point to consider.
Until v4.16, reflink and rmapbt features were experimental:
76883f7988e6 xfs: remove experimental tag for reverse mapping
1e369b0e199b xfs: remove experimental tag for reflinks
And MANY of the bug fixes flowing in through XFS tree to master
are related to those new XFS features and also to vfs functionality
that depends on them (e.g. clone/dedupe), so there MAY be no
bug reports at all for XFS in stable trees.
IMO users should NOT be expecting XFS to be stable with those
features enabled (they are still disabled by default)
when running on stable kernels below v4.16.
Allow me to act as a self-appointed mediator here and say:
There is obviously some bad blood between xfs developers and stable
tree maintainers.
The conflicts are caused by long standing frustration on both sides.
We would all be better off looking forward to how we can improve the
situation instead of dwelling on past mistakes.
This issue was on the agenda at the XFS team meeting at the last LSF/MM.
The path towards compliance has been laid out by xfs maintainers.
Luis, Sasha and myself have been working to improve the filesystem
test coverage for stable tree candidate patches.
We still have some way to go.
The stable candidate patches that triggered the recent flames
were outside of the fs/xfs subsystem, which AUTOSEL already knows
to stay away from, so nobody had any intention to stir things up.
At the end of the day, most xfs developers work for companies that
ship enterprise distros and need to maintain stable trees, so I would
hope that it is in the best interest of everyone involved to cooperate
on the goal of a better stable-xfs ecosystem.
On my part, I would be happy if AUTOSEL could point me at
candidate patch *series* for review instead of single patches.
For that matter, it sure wouldn't hurt if an xfs developer sending
out a patch series would cc:stable on the cover letter, and if a developer
would be kind enough to add some backporting hints to the cover letter
text, that would be very helpful indeed.
Thanks,
Amir.
* Re: XFS patches for stable
2018-12-01 9:09 ` XFS patches for stable Amir Goldstein
@ 2018-12-02 15:25 ` Sasha Levin
2018-12-02 16:10 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Sasha Levin @ 2018-12-02 15:25 UTC (permalink / raw)
To: Amir Goldstein
Cc: Dave Chinner, Greg KH, stable, linux-kernel, Dave Chinner,
Darrick J. Wong, linux-fsdevel, linux-xfs, Luis R. Chamberlain
On Sat, Dec 01, 2018 at 11:09:05AM +0200, Amir Goldstein wrote:
>> >> It's getting to the point that with the amount of known issues with XFS
>> >> on LTS kernels it makes sense to mark it as CONFIG_BROKEN.
>> >
>> >Really? Where are the bug reports?
>>
>> In 'git log'! You report these every time you fix something in upstream
>> xfs but don't backport it to stable trees:
>>
>> $ git log --oneline v4.18-rc1..v4.18 fs/xfs
>> d4a34e165557 xfs: properly handle free inodes in extent hint validators
>> 9991274fddb9 xfs: Initialize variables in xfs_alloc_get_rec before using them
>> d8cb5e423789 xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation
>> e53c4b598372 xfs: ensure post-EOF zeroing happens after zeroing part of a file
>> a3a374bf1889 xfs: fix off-by-one error in xfs_rtalloc_query_range
>> 232d0a24b0fc xfs: fix uninitialized field in rtbitmap fsmap backend
>> 5bd88d153998 xfs: recheck reflink state after grabbing ILOCK_SHARED for a write
>> f62cb48e4319 xfs: don't allow insert-range to shift extents past the maximum offset
>> aafe12cee0b1 xfs: don't trip over negative free space in xfs_reserve_blocks
>> 10ee25268e1f xfs: allow empty transactions while frozen
>> e53946dbd31a xfs: xfs_iflush_abort() can be called twice on cluster writeback failure
>> 23fcb3340d03 xfs: More robust inode extent count validation
>> e2ac836307e3 xfs: simplify xfs_bmap_punch_delalloc_range
>>
>> Since I'm assuming that at least some of them are based on actual issues
>> users hit, and some of those apply to stable kernels, why would users
>> want to use an XFS version which is knowingly buggy?
>>
>
>Sasha,
>
>There is one more point to consider.
>Until v4.16, reflink and rmapbt features were experimental:
>76883f7988e6 xfs: remove experimental tag for reverse mapping
>1e369b0e199b xfs: remove experimental tag for reflinks
>
>And MANY of the bug fixes flowing in through XFS tree to master
>are related to those new XFS features and also to vfs functionality
>that depends on them (e.g. clone/dedupe), so there MAY be no
>bug reports at all for XFS in stable trees.
>
>IMO users should NOT be expecting XFS to be stable with those
>features enabled (they are still disabled by default)
>when running on stable kernels below v4.16.
>
>Allow me to act as a self-appointed mediator here and say:
>There is obviously some bad blood between xfs developers and stable
>tree maintainers.
>The conflicts are caused by long standing frustration on both sides.
>We would all be better off looking forward to how we can improve the
>situation instead of dwelling on past mistakes.
>This issue was on the agenda at the XFS team meeting at the last LSF/MM.
>The path towards compliance has been laid out by xfs maintainers.
>Luis, Sasha and myself have been working to improve the filesystem
>test coverage for stable tree candidate patches.
>We still have some way to go.
>
>The stable candidate patches that triggered the recent flames
>were outside of the fs/xfs subsystem, which AUTOSEL already knows
>to stay away from, so nobody had any intention to stir things up.
>
>At the end of the day, most xfs developers work for companies that
>ship enterprise distros and need to maintain stable trees, so I would
>hope that it is in the best interest of everyone involved to cooperate
>on the goal of a better stable-xfs ecosystem.
>
>On my part, I would be happy if AUTOSEL could point me at
>candidate patch *series* for review instead of single patches.
I'm afraid it's not smart enough to do that :(
I can grab an entire series if it selects a single patch in a series,
but from my experience it's usually the wrong thing to do.
>For that matter, it sure wouldn't hurt if an xfs developer sending
>out a patch series would cc:stable on the cover letter, and if a developer
>would be kind enough to add some backporting hints to the cover letter
>text, that would be very helpful indeed.
Given that we have folks (Luis, Amir, etc) working on it already, maybe
a step in the right direction would be having the XFS folks tag fixes
some other way ("#wants-a-backport"?) where this would give a hint that
this should be backported after sufficient testing?
We won't pick these commits to stable ourselves, but only after the XFS
maintainers are satisfied that the commit was sufficiently tested on LTS
trees?
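As a rough illustration of the tagging idea above, candidate commits could be
collected by scanning commit messages for the marker. This is only a sketch:
the "#wants-a-backport" string is just the suggestion from this mail, and the
function name and the (commit id, message) input format are hypothetical, not
part of any existing stable tooling.

```python
import re

# Hypothetical marker taken from the suggestion in this mail; it is not an
# established kernel commit tag.
TAG = re.compile(r"^\s*#wants-a-backport\b", re.MULTILINE)


def wants_backport(commits):
    """Return the ids of commits whose message carries the hypothetical tag.

    `commits` is an iterable of (commit_id, message) pairs.
    """
    return [commit_id for commit_id, message in commits if TAG.search(message)]
```

A list produced this way would only be a queue of candidates; under the
proposal nothing is picked until the XFS maintainers are satisfied with the
testing on LTS trees.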
--
Thanks,
Sasha
* Re: XFS patches for stable
2018-12-02 15:25 ` Sasha Levin
@ 2018-12-02 16:10 ` Christoph Hellwig
2018-12-02 20:08 ` Greg KH
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2018-12-02 16:10 UTC (permalink / raw)
To: Sasha Levin
Cc: Amir Goldstein, Dave Chinner, Greg KH, stable, linux-kernel,
Dave Chinner, Darrick J. Wong, linux-fsdevel, linux-xfs,
Luis R. Chamberlain
As someone who has done xfs stable backports for a while I really don't
think the autoselection is helpful at all. Someone who is vaguely
familiar with the code needs to manually select the commits and QA them,
which takes a fair amount of time, but just needs some manual help if it
should work ok.
I think we are about ready to have a new xfs stable maintainer lined up
if everything works well fortunately.
* Re: XFS patches for stable
2018-12-02 16:10 ` Christoph Hellwig
@ 2018-12-02 20:08 ` Greg KH
2018-12-03 14:41 ` Richard Weinberger
0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2018-12-02 20:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Sasha Levin, Amir Goldstein, Dave Chinner, stable, linux-kernel,
Dave Chinner, Darrick J. Wong, linux-fsdevel, linux-xfs,
Luis R. Chamberlain
On Sun, Dec 02, 2018 at 08:10:16AM -0800, Christoph Hellwig wrote:
> As someone who has done xfs stable backports for a while I really don't
> think the autoselection is helpful at all.
autoselection for xfs patches has been turned off for a while, what
triggered this email thread was a core vfs patch that was backported
that was not obvious it was created by the xfs developers due to a
problem they had found.
> Someone who is vaguely familiar with the code needs to manually select
> the commits and QA them, which takes a fair amount of time, but just
> needs some manual help if it should work ok.
>
> I think we are about ready to have a new xfs stable maintainer lined up
> if everything works well fortunately.
That would be wonderful news.
thanks,
greg k-h
* Re: XFS patches for stable
2018-12-02 20:08 ` Greg KH
@ 2018-12-03 14:41 ` Richard Weinberger
2018-12-03 16:56 ` Sasha Levin
0 siblings, 1 reply; 7+ messages in thread
From: Richard Weinberger @ 2018-12-03 14:41 UTC (permalink / raw)
To: Greg KH
Cc: Christoph Hellwig, sashal, amir73il, Dave Chinner, stable, LKML,
dchinner, darrick.wong, linux-fsdevel, linux-xfs, mcgrof,
linux-mtd, boris.brezillon
On Sun, Dec 2, 2018 at 9:09 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Sun, Dec 02, 2018 at 08:10:16AM -0800, Christoph Hellwig wrote:
> > As someone who has done xfs stable backports for a while I really don't
> > think the autoselection is helpful at all.
>
> autoselection for xfs patches has been turned off for a while, what
> triggered this email thread was a core vfs patch that was backported
> that was not obvious it was created by the xfs developers due to a
> problem they had found.
Sorry for hijacking this thread.
Can you please also disable autoselection for MTD, UBI and UBIFS?
fs/ubifs/
drivers/mtd/
include/linux/mtd/
include/uapi/mtd/
--
Thanks,
//richard
* Re: XFS patches for stable
2018-12-03 14:41 ` Richard Weinberger
@ 2018-12-03 16:56 ` Sasha Levin
0 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2018-12-03 16:56 UTC (permalink / raw)
To: Richard Weinberger
Cc: Greg KH, Christoph Hellwig, amir73il, Dave Chinner, stable, LKML,
dchinner, darrick.wong, linux-fsdevel, linux-xfs, mcgrof,
linux-mtd, boris.brezillon
On Mon, Dec 03, 2018 at 03:41:27PM +0100, Richard Weinberger wrote:
>On Sun, Dec 2, 2018 at 9:09 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>>
>> On Sun, Dec 02, 2018 at 08:10:16AM -0800, Christoph Hellwig wrote:
>> > As someone who has done xfs stable backports for a while I really don't
>> > think the autoselection is helpful at all.
>>
>> autoselection for xfs patches has been turned off for a while, what
>> triggered this email thread was a core vfs patch that was backported
>> that was not obvious it was created by the xfs developers due to a
>> problem they had found.
>
>Sorry for hijacking this thread.
>Can you please also disable autoselection for MTD, UBI and UBIFS?
>
>fs/ubifs/
>drivers/mtd/
>include/linux/mtd/
>include/uapi/mtd/
Sure, done!
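For illustration only, a path-based exclusion of this kind could be sketched
as below. The prefixes are the ones named in this thread (fs/xfs plus the four
paths requested above); the filter function itself is an assumption made for
the example and is not the actual AUTOSEL code.

```python
# Path prefixes named in this thread; the filter is an assumed illustration,
# not the real autoselection implementation.
EXCLUDED_PREFIXES = (
    "fs/xfs/",
    "fs/ubifs/",
    "drivers/mtd/",
    "include/linux/mtd/",
    "include/uapi/mtd/",
)


def is_excluded(changed_paths, prefixes=EXCLUDED_PREFIXES):
    """Return True if any file touched by a commit lies under an excluded prefix."""
    return any(path.startswith(prefixes) for path in changed_paths)
```

Note that a filter like this only looks at paths: a core vfs change that
lives outside the excluded directories would not be caught, which matches the
situation that started this thread.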
--
Thanks,
Sasha
Thread overview: 7+ messages
[not found] <20181129060110.159878-1-sashal@kernel.org>
[not found] ` <20181129060110.159878-25-sashal@kernel.org>
[not found] ` <20181129121458.GK19305@dastard>
[not found] ` <20181129124756.GA25945@kroah.com>
[not found] ` <20181129224019.GM19305@dastard>
[not found] ` <20181130082203.GA26830@kroah.com>
[not found] ` <20181130101441.GA213156@sasha-vm>
2018-11-30 20:35 ` [PATCH AUTOSEL 4.14 25/35] iomap: sub-block dio needs to zeroout beyond EOF Darrick J. Wong
[not found] ` <20181130215005.GP19305@dastard>
[not found] ` <20181201074909.GC213156@sasha-vm>
2018-12-01 9:09 ` XFS patches for stable Amir Goldstein
2018-12-02 15:25 ` Sasha Levin
2018-12-02 16:10 ` Christoph Hellwig
2018-12-02 20:08 ` Greg KH
2018-12-03 14:41 ` Richard Weinberger
2018-12-03 16:56 ` Sasha Levin