linux-fsdevel.vger.kernel.org archive mirror
* Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
@ 2023-06-29 16:09 Daniel Dao
  2023-06-29 16:34 ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Dao @ 2023-06-29 16:09 UTC (permalink / raw)
  To: Dave Chinner, djwong; +Cc: kernel-team, linux-fsdevel

Hi Dave and Darrick,

We are tracking down some corruptions on xfs for our rocksdb workload,
running on kernel 6.1.25. The corruptions were detected by rocksdb
block checksums. The workload seems to share some similarities with
the multi-threaded write workload described in
https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/

Can we backport the patch series to stable, since it seemed to fix data
corruptions?

Best,
Daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
  2023-06-29 16:09 Backporting of series xfs/iomap: fix data corruption due to stale cached iomap Daniel Dao
@ 2023-06-29 16:34 ` Matthew Wilcox
  2023-06-29 18:14   ` Darrick J. Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2023-06-29 16:34 UTC (permalink / raw)
  To: Daniel Dao; +Cc: Dave Chinner, djwong, kernel-team, linux-fsdevel

On Thu, Jun 29, 2023 at 05:09:41PM +0100, Daniel Dao wrote:
> Hi Dave and Derrick,
> 
> We are tracking down some corruptions on xfs for our rocksdb workload,
> running on kernel 6.1.25. The corruptions were
> detected by rocksdb block checksum. The workload seems to share some
> similarities
> with the multi-threaded write workload described in
> https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/
> 
> Can we backport the patch series to stable since it seemed to fix data
> corruptions ?

For clarity, are you asking for permission or advice about doing this
yourself, or are you asking somebody else to do the backport for you?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
  2023-06-29 16:34 ` Matthew Wilcox
@ 2023-06-29 18:14   ` Darrick J. Wong
  2023-06-29 19:30     ` Ignat Korchagin
  0 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2023-06-29 18:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Daniel Dao, Dave Chinner, kernel-team, linux-fsdevel,
	Chandan Babu R, Amir Goldstein, Leah Rumancik

[add the xfs lts maintainers]

On Thu, Jun 29, 2023 at 05:34:00PM +0100, Matthew Wilcox wrote:
> On Thu, Jun 29, 2023 at 05:09:41PM +0100, Daniel Dao wrote:
> > Hi Dave and Derrick,
> > 
> > We are tracking down some corruptions on xfs for our rocksdb workload,
> > running on kernel 6.1.25. The corruptions were
> > detected by rocksdb block checksum. The workload seems to share some
> > similarities
> > with the multi-threaded write workload described in
> > https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/
> > 
> > Can we backport the patch series to stable since it seemed to fix data
> > corruptions ?
> 
> For clarity, are you asking for permission or advice about doing this
> yourself, or are you asking somebody else to do the backport for you?

Nobody's officially committed to backporting and testing patches for
6.1; are you (Cloudflare) volunteering?

--D

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
  2023-06-29 18:14   ` Darrick J. Wong
@ 2023-06-29 19:30     ` Ignat Korchagin
  2023-06-30 10:39       ` Amir Goldstein
  0 siblings, 1 reply; 12+ messages in thread
From: Ignat Korchagin @ 2023-06-29 19:30 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Matthew Wilcox, Daniel Dao, Dave Chinner, kernel-team,
	linux-fsdevel, Chandan Babu R, Amir Goldstein, Leah Rumancik

On Thu, Jun 29, 2023 at 7:14 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> [add the xfs lts maintainers]
>
> On Thu, Jun 29, 2023 at 05:34:00PM +0100, Matthew Wilcox wrote:
> > On Thu, Jun 29, 2023 at 05:09:41PM +0100, Daniel Dao wrote:
> > > Hi Dave and Derrick,
> > >
> > > We are tracking down some corruptions on xfs for our rocksdb workload,
> > > running on kernel 6.1.25. The corruptions were
> > > detected by rocksdb block checksum. The workload seems to share some
> > > similarities
> > > with the multi-threaded write workload described in
> > > https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/
> > >
> > > Can we backport the patch series to stable since it seemed to fix data
> > > corruptions ?
> >
> > For clarity, are you asking for permission or advice about doing this
> > yourself, or are you asking somebody else to do the backport for you?
>
> Nobody's officially committed to backporting and testing patches for
> 6.1; are you (Cloudflare) volunteering?

Yes, we have applied them on top of 6.1.36. We will be gradually
releasing this to our servers and will report back if we see the issues
go away.

> --D

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
  2023-06-29 19:30     ` Ignat Korchagin
@ 2023-06-30 10:39       ` Amir Goldstein
  2023-06-30 12:30         ` Ignat Korchagin
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2023-06-30 10:39 UTC (permalink / raw)
  To: Ignat Korchagin
  Cc: Darrick J. Wong, Matthew Wilcox, Daniel Dao, Dave Chinner,
	kernel-team, linux-fsdevel, Chandan Babu R, Leah Rumancik,
	linux-xfs, Luis R. Rodriguez

On Thu, Jun 29, 2023 at 10:31 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
>
> On Thu, Jun 29, 2023 at 7:14 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > [add the xfs lts maintainers]
> >
> > On Thu, Jun 29, 2023 at 05:34:00PM +0100, Matthew Wilcox wrote:
> > > On Thu, Jun 29, 2023 at 05:09:41PM +0100, Daniel Dao wrote:
> > > > Hi Dave and Derrick,
> > > >
> > > > We are tracking down some corruptions on xfs for our rocksdb workload,
> > > > running on kernel 6.1.25. The corruptions were
> > > > detected by rocksdb block checksum. The workload seems to share some
> > > > similarities
> > > > with the multi-threaded write workload described in
> > > > https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/
> > > >
> > > > Can we backport the patch series to stable since it seemed to fix data
> > > > corruptions ?
> > >
> > > For clarity, are you asking for permission or advice about doing this
> > > yourself, or are you asking somebody else to do the backport for you?
> >
> > Nobody's officially committed to backporting and testing patches for
> > 6.1; are you (Cloudflare) volunteering?
>
> Yes, we have applied them on top of 6.1.36, will be gradually
> releasing to our servers and will report back if we see the issues go
> away
>

Getting feedback from Cloudflare production servers is awesome,
but it's not enough.

The standard for getting xfs LTS backports approved is:
1. Test the backports against regressions with several rounds of fstests
    "check -g auto" on the selected xfs configurations [1] (a sample
    invocation is sketched below)
2. Post the backport series to xfs list and get an ACK from upstream
    xfs maintainers
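
For illustration only (the section names here are just examples; the
real ones come from the local.config generated for the configs in [1]),
one round of regression testing looks roughly like:

    cd fstests
    ./check -s xfs_crc -g auto    # repeat for each configured section

with the failures then compared against the expunge lists in [1].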

We have volunteers doing this work for 5.4.y, 5.10.y and 5.15.y.
We do not yet have a volunteer to do that work for 6.1.y.

The question is whether you (or your team) are volunteering to
do that work for 6.1.y xfs backports, to help share the load.

If your employer is interested in running reliable and stable xfs
code with 6.1.y LTS, I recommend that you seriously consider
this option, because for the time being, it doesn't look like any
of us are able to perform this role.

For testing, you could establish your own baseline for 6.1.y, or you
could run kdevops and use the baseline already established by
other testers for the selected xfs configurations [1].

I can help you get up to speed with kdevops if you like.

Thanks,
Amir.

[1]  https://github.com/linux-kdevops/kdevops/tree/master/workflows/fstests/expunges/6.1.0-rc6/xfs/unassigned

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
  2023-06-30 10:39       ` Amir Goldstein
@ 2023-06-30 12:30         ` Ignat Korchagin
  2023-06-30 13:05           ` Amir Goldstein
  0 siblings, 1 reply; 12+ messages in thread
From: Ignat Korchagin @ 2023-06-30 12:30 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Darrick J. Wong, Matthew Wilcox, Daniel Dao, Dave Chinner,
	kernel-team, linux-fsdevel, Chandan Babu R, Leah Rumancik,
	linux-xfs, Luis R. Rodriguez

On Fri, Jun 30, 2023 at 11:39 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Jun 29, 2023 at 10:31 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
> >
> > On Thu, Jun 29, 2023 at 7:14 PM Darrick J. Wong <djwong@kernel.org> wrote:
> > >
> > > [add the xfs lts maintainers]
> > >
> > > On Thu, Jun 29, 2023 at 05:34:00PM +0100, Matthew Wilcox wrote:
> > > > On Thu, Jun 29, 2023 at 05:09:41PM +0100, Daniel Dao wrote:
> > > > > Hi Dave and Derrick,
> > > > >
> > > > > We are tracking down some corruptions on xfs for our rocksdb workload,
> > > > > running on kernel 6.1.25. The corruptions were
> > > > > detected by rocksdb block checksum. The workload seems to share some
> > > > > similarities
> > > > > with the multi-threaded write workload described in
> > > > > https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/
> > > > >
> > > > > Can we backport the patch series to stable since it seemed to fix data
> > > > > corruptions ?
> > > >
> > > > For clarity, are you asking for permission or advice about doing this
> > > > yourself, or are you asking somebody else to do the backport for you?
> > >
> > > Nobody's officially committed to backporting and testing patches for
> > > 6.1; are you (Cloudflare) volunteering?
> >
> > Yes, we have applied them on top of 6.1.36, will be gradually
> > releasing to our servers and will report back if we see the issues go
> > away
> >
>
> Getting feedback back from Cloudflare production servers is awesome
> but it's not enough.
>
> The standard for getting xfs LTS backports approved is:
> 1. Test the backports against regressions with several rounds of fstests
>     check -g auto on selected xfs configurations [1]
> 2. Post the backport series to xfs list and get an ACK from upstream
>     xfs maintainers
>
> We have volunteers doing this work for 5.4.y, 5.10.y and 5.15.y.
> We do not yet have a volunteer to do that work for 6.1.y.
>
> The question is whether you (or your team) are volunteering to
> do that work for 6.1.y xfs backports to help share the load?

We are not a big team, and apart from other internal project work, our
efforts are focused on fixing this issue in production, because it
affects many teams and workloads. If we confirm that these patches fix
the issue in production, we will definitely consider dedicating some
work to ensure they are officially backported. But if not, we would
need to keep searching for a fix before we can commit to any
work.

So, IOW - can we come back to you a bit later on this after we get the
feedback from production?

> If your employer is interested in running reliable and stable xfs
> code with 6.1.y LTS, I recommend that you seriously consider
> this option, because for the time being, it doesn't look like any
> of us are able to perform this role.
>
> For testing, you could establish your own baseline for 6.1.y or, you
> could run kdevops and use the baseline already established by
> other testers for the selected xfs configurations [1].
>
> I can help you get up to speed with kdevops if you like.

This looks interesting (regardless of this project). We will explore
it and come back with questions, if any.

>
> Thanks,
> Amir.
>
> [1]  https://github.com/linux-kdevops/kdevops/tree/master/workflows/fstests/expunges/6.1.0-rc6/xfs/unassigned

Ignat

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
  2023-06-30 12:30         ` Ignat Korchagin
@ 2023-06-30 13:05           ` Amir Goldstein
  2023-06-30 15:16             ` Darrick J. Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2023-06-30 13:05 UTC (permalink / raw)
  To: Ignat Korchagin
  Cc: Darrick J. Wong, Matthew Wilcox, Daniel Dao, Dave Chinner,
	kernel-team, linux-fsdevel, Chandan Babu R, Leah Rumancik,
	linux-xfs, Luis R. Rodriguez

On Fri, Jun 30, 2023 at 3:30 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
>
> On Fri, Jun 30, 2023 at 11:39 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Thu, Jun 29, 2023 at 10:31 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
> > >
> > > On Thu, Jun 29, 2023 at 7:14 PM Darrick J. Wong <djwong@kernel.org> wrote:
> > > >
> > > > [add the xfs lts maintainers]
> > > >
> > > > On Thu, Jun 29, 2023 at 05:34:00PM +0100, Matthew Wilcox wrote:
> > > > > On Thu, Jun 29, 2023 at 05:09:41PM +0100, Daniel Dao wrote:
> > > > > > Hi Dave and Derrick,
> > > > > >
> > > > > > We are tracking down some corruptions on xfs for our rocksdb workload,
> > > > > > running on kernel 6.1.25. The corruptions were
> > > > > > detected by rocksdb block checksum. The workload seems to share some
> > > > > > similarities
> > > > > > with the multi-threaded write workload described in
> > > > > > https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/
> > > > > >
> > > > > > Can we backport the patch series to stable since it seemed to fix data
> > > > > > corruptions ?
> > > > >
> > > > > For clarity, are you asking for permission or advice about doing this
> > > > > yourself, or are you asking somebody else to do the backport for you?
> > > >
> > > > Nobody's officially committed to backporting and testing patches for
> > > > 6.1; are you (Cloudflare) volunteering?
> > >
> > > Yes, we have applied them on top of 6.1.36, will be gradually
> > > releasing to our servers and will report back if we see the issues go
> > > away
> > >
> >
> > Getting feedback back from Cloudflare production servers is awesome
> > but it's not enough.
> >
> > The standard for getting xfs LTS backports approved is:
> > 1. Test the backports against regressions with several rounds of fstests
> >     check -g auto on selected xfs configurations [1]
> > 2. Post the backport series to xfs list and get an ACK from upstream
> >     xfs maintainers
> >
> > We have volunteers doing this work for 5.4.y, 5.10.y and 5.15.y.
> > We do not yet have a volunteer to do that work for 6.1.y.
> >
> > The question is whether you (or your team) are volunteering to
> > do that work for 6.1.y xfs backports to help share the load?
>
> We are not a big team and apart from other internal project work our
> efforts are focused on fixing this issue in production, because it
> affects many teams and workloads. If we confirm that these patches fix
> the issue in production, we will definitely consider dedicating some
> work to ensure they are officially backported. But if not - we would
> be required to search for a fix first before we can commit to any
> work.
>
> So, IOW - can we come back to you a bit later on this after we get the
> feedback from production?
>

Of course.
The volunteering question for 6.1.y is independent.

When you decide that you have a series of backports
that proves to fix a real bug in production,
a way to test the series will be worked out.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
  2023-06-30 13:05           ` Amir Goldstein
@ 2023-06-30 15:16             ` Darrick J. Wong
  2023-07-19 20:37               ` Ignat Korchagin
  0 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2023-06-30 15:16 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Ignat Korchagin, Matthew Wilcox, Daniel Dao, Dave Chinner,
	kernel-team, linux-fsdevel, Chandan Babu R, Leah Rumancik,
	linux-xfs, Luis R. Rodriguez

On Fri, Jun 30, 2023 at 04:05:36PM +0300, Amir Goldstein wrote:
> On Fri, Jun 30, 2023 at 3:30 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
> >
> > On Fri, Jun 30, 2023 at 11:39 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Thu, Jun 29, 2023 at 10:31 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
> > > >
> > > > On Thu, Jun 29, 2023 at 7:14 PM Darrick J. Wong <djwong@kernel.org> wrote:
> > > > >
> > > > > [add the xfs lts maintainers]
> > > > >
> > > > > On Thu, Jun 29, 2023 at 05:34:00PM +0100, Matthew Wilcox wrote:
> > > > > > On Thu, Jun 29, 2023 at 05:09:41PM +0100, Daniel Dao wrote:
> > > > > > > Hi Dave and Derrick,
> > > > > > >
> > > > > > > We are tracking down some corruptions on xfs for our rocksdb workload,
> > > > > > > running on kernel 6.1.25. The corruptions were
> > > > > > > detected by rocksdb block checksum. The workload seems to share some
> > > > > > > similarities
> > > > > > > with the multi-threaded write workload described in
> > > > > > > https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/
> > > > > > >
> > > > > > > Can we backport the patch series to stable since it seemed to fix data
> > > > > > > corruptions ?
> > > > > >
> > > > > > For clarity, are you asking for permission or advice about doing this
> > > > > > yourself, or are you asking somebody else to do the backport for you?
> > > > >
> > > > > Nobody's officially committed to backporting and testing patches for
> > > > > 6.1; are you (Cloudflare) volunteering?
> > > >
> > > > Yes, we have applied them on top of 6.1.36, will be gradually
> > > > releasing to our servers and will report back if we see the issues go
> > > > away
> > > >
> > >
> > > Getting feedback back from Cloudflare production servers is awesome
> > > but it's not enough.
> > >
> > > The standard for getting xfs LTS backports approved is:
> > > 1. Test the backports against regressions with several rounds of fstests
> > >     check -g auto on selected xfs configurations [1]
> > > 2. Post the backport series to xfs list and get an ACK from upstream
> > >     xfs maintainers
> > >
> > > We have volunteers doing this work for 5.4.y, 5.10.y and 5.15.y.
> > > We do not yet have a volunteer to do that work for 6.1.y.
> > >
> > > The question is whether you (or your team) are volunteering to
> > > do that work for 6.1.y xfs backports to help share the load?
> >
> > We are not a big team and apart from other internal project work our
> > efforts are focused on fixing this issue in production, because it
> > affects many teams and workloads. If we confirm that these patches fix
> > the issue in production, we will definitely consider dedicating some
> > work to ensure they are officially backported. But if not - we would
> > be required to search for a fix first before we can commit to any
> > work.
> >
> > So, IOW - can we come back to you a bit later on this after we get the
> > feedback from production?
> >
> 
> Of course.
> The volunteering question for 6.1.y is independent.
> 
> When you decide that you have a series of backports
> that proves to fix a real bug in production,
> a way to test the series will be worked out.

/me notes that xfs/558 and xfs/559 (in fstests) are the functional tests
for these patches that you're backporting; it would be useful to have a
third party (i.e. not just the reporter and the author) confirm that the
two fstests pass when real workloads are fixed.
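
A minimal sketch, assuming an fstests checkout already configured with
a local.config for the kernel under test, would be to run just those
two tests directly on the patched 6.1.y kernel and confirm they pass:

    cd fstests
    ./check xfs/558 xfs/559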

--D

> Thanks,
> Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
  2023-06-30 15:16             ` Darrick J. Wong
@ 2023-07-19 20:37               ` Ignat Korchagin
  2023-07-20  6:45                 ` Amir Goldstein
  0 siblings, 1 reply; 12+ messages in thread
From: Ignat Korchagin @ 2023-07-19 20:37 UTC (permalink / raw)
  To: Darrick J. Wong, Amir Goldstein
  Cc: Matthew Wilcox, Daniel Dao, Dave Chinner, kernel-team,
	linux-fsdevel, Chandan Babu R, Leah Rumancik, linux-xfs,
	Luis R. Rodriguez, Fred Lawler

On Fri, Jun 30, 2023 at 4:17 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Fri, Jun 30, 2023 at 04:05:36PM +0300, Amir Goldstein wrote:
> > On Fri, Jun 30, 2023 at 3:30 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
> > >
> > > On Fri, Jun 30, 2023 at 11:39 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > On Thu, Jun 29, 2023 at 10:31 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
> > > > >
> > > > > On Thu, Jun 29, 2023 at 7:14 PM Darrick J. Wong <djwong@kernel.org> wrote:
> > > > > >
> > > > > > [add the xfs lts maintainers]
> > > > > >
> > > > > > On Thu, Jun 29, 2023 at 05:34:00PM +0100, Matthew Wilcox wrote:
> > > > > > > On Thu, Jun 29, 2023 at 05:09:41PM +0100, Daniel Dao wrote:
> > > > > > > > Hi Dave and Derrick,
> > > > > > > >
> > > > > > > > We are tracking down some corruptions on xfs for our rocksdb workload,
> > > > > > > > running on kernel 6.1.25. The corruptions were
> > > > > > > > detected by rocksdb block checksum. The workload seems to share some
> > > > > > > > similarities
> > > > > > > > with the multi-threaded write workload described in
> > > > > > > > https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/
> > > > > > > >
> > > > > > > > Can we backport the patch series to stable since it seemed to fix data
> > > > > > > > corruptions ?
> > > > > > >
> > > > > > > For clarity, are you asking for permission or advice about doing this
> > > > > > > yourself, or are you asking somebody else to do the backport for you?
> > > > > >
> > > > > > Nobody's officially committed to backporting and testing patches for
> > > > > > 6.1; are you (Cloudflare) volunteering?
> > > > >
> > > > > Yes, we have applied them on top of 6.1.36, will be gradually
> > > > > releasing to our servers and will report back if we see the issues go
> > > > > away
> > > > >
> > > >
> > > > Getting feedback back from Cloudflare production servers is awesome
> > > > but it's not enough.
> > > >
> > > > The standard for getting xfs LTS backports approved is:
> > > > 1. Test the backports against regressions with several rounds of fstests
> > > >     check -g auto on selected xfs configurations [1]
> > > > 2. Post the backport series to xfs list and get an ACK from upstream
> > > >     xfs maintainers
> > > >
> > > > We have volunteers doing this work for 5.4.y, 5.10.y and 5.15.y.
> > > > We do not yet have a volunteer to do that work for 6.1.y.
> > > >
> > > > The question is whether you (or your team) are volunteering to
> > > > do that work for 6.1.y xfs backports to help share the load?

Circling back on this. So far it seems that the patchset in question
does fix the rocksdb corruption issues, as we haven't seen them for
some time on our test group. We're happy to dedicate some effort now
to get the patches officially backported to 6.1 according to the
process. We did try basic things with kdevops and would like to learn
more. Fred (cc-ed here) is happy to drive the effort and be the primary
contact on this. Could you please guide us/him through the process?

> > > We are not a big team and apart from other internal project work our
> > > efforts are focused on fixing this issue in production, because it
> > > affects many teams and workloads. If we confirm that these patches fix
> > > the issue in production, we will definitely consider dedicating some
> > > work to ensure they are officially backported. But if not - we would
> > > be required to search for a fix first before we can commit to any
> > > work.
> > >
> > > So, IOW - can we come back to you a bit later on this after we get the
> > > feedback from production?
> > >
> >
> > Of course.
> > The volunteering question for 6.1.y is independent.
> >
> > When you decide that you have a series of backports
> > that proves to fix a real bug in production,
> > a way to test the series will be worked out.
>
> /me notes that xfs/558 and xfs/559 (in fstests) are the functional tests
> for these patches that you're backporting; it would be useful to have a
> third party (i.e. not just the reporter and the author) confirm that the
> two fstests pass when real workloads are fixed.
>
> --D
>
> > Thanks,
> > Amir.

Ignat

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
  2023-07-19 20:37               ` Ignat Korchagin
@ 2023-07-20  6:45                 ` Amir Goldstein
  2023-07-20 18:30                   ` Luis Chamberlain
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2023-07-20  6:45 UTC (permalink / raw)
  To: Ignat Korchagin
  Cc: Darrick J. Wong, Matthew Wilcox, Daniel Dao, Dave Chinner,
	kernel-team, linux-fsdevel, Chandan Babu R, Leah Rumancik,
	linux-xfs, Luis R. Rodriguez, Fred Lawler

[-- Attachment #1: Type: text/plain, Size: 6047 bytes --]

On Wed, Jul 19, 2023 at 11:37 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
>
> On Fri, Jun 30, 2023 at 4:17 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Fri, Jun 30, 2023 at 04:05:36PM +0300, Amir Goldstein wrote:
> > > On Fri, Jun 30, 2023 at 3:30 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
> > > >
> > > > On Fri, Jun 30, 2023 at 11:39 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jun 29, 2023 at 10:31 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
> > > > > >
> > > > > > On Thu, Jun 29, 2023 at 7:14 PM Darrick J. Wong <djwong@kernel.org> wrote:
> > > > > > >
> > > > > > > [add the xfs lts maintainers]
> > > > > > >
> > > > > > > On Thu, Jun 29, 2023 at 05:34:00PM +0100, Matthew Wilcox wrote:
> > > > > > > > On Thu, Jun 29, 2023 at 05:09:41PM +0100, Daniel Dao wrote:
> > > > > > > > > Hi Dave and Derrick,
> > > > > > > > >
> > > > > > > > > We are tracking down some corruptions on xfs for our rocksdb workload,
> > > > > > > > > running on kernel 6.1.25. The corruptions were
> > > > > > > > > detected by rocksdb block checksum. The workload seems to share some
> > > > > > > > > similarities
> > > > > > > > > with the multi-threaded write workload described in
> > > > > > > > > https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/
> > > > > > > > >
> > > > > > > > > Can we backport the patch series to stable since it seemed to fix data
> > > > > > > > > corruptions ?
> > > > > > > >
> > > > > > > > For clarity, are you asking for permission or advice about doing this
> > > > > > > > yourself, or are you asking somebody else to do the backport for you?
> > > > > > >
> > > > > > > Nobody's officially committed to backporting and testing patches for
> > > > > > > 6.1; are you (Cloudflare) volunteering?
> > > > > >
> > > > > > Yes, we have applied them on top of 6.1.36, will be gradually
> > > > > > releasing to our servers and will report back if we see the issues go
> > > > > > away
> > > > > >
> > > > >
> > > > > Getting feedback back from Cloudflare production servers is awesome
> > > > > but it's not enough.
> > > > >
> > > > > The standard for getting xfs LTS backports approved is:
> > > > > 1. Test the backports against regressions with several rounds of fstests
> > > > >     check -g auto on selected xfs configurations [1]
> > > > > 2. Post the backport series to xfs list and get an ACK from upstream
> > > > >     xfs maintainers
> > > > >
> > > > > We have volunteers doing this work for 5.4.y, 5.10.y and 5.15.y.
> > > > > We do not yet have a volunteer to do that work for 6.1.y.
> > > > >
> > > > > The question is whether you (or your team) are volunteering to
> > > > > do that work for 6.1.y xfs backports to help share the load?
>
> Circling back on this. So far it seems that the patchset in question
> does fix the issues of rocksdb corruption as we haven't seen them for
> some time on our test group. We're happy to dedicate some efforts now
> to get them officially backported to 6.1 according to the process. We
> did try basic things with kdevops and would like to learn more. Fred
> (cc-ed here) is happy to drive the effort and be the primary contact
> on this. Could you, please, guide us/him on the process?
>

Hi Fred,

I'd love to help you get started with kdevops and xfs testing.
However, I am going on vacation tomorrow for three weeks,
so I'll just drop a few pointers and let the others help you out.

Luis (@mcgrof) is your best point of contact for kdevops.
Chandan should be able to help you with xfs backporting questions.

Better yet, use the discord channel:
  https://bit.ly/linux-kdevops-chat

Someone is almost always available to answer questions there.

BACKPORT PATCHES:
-------------------------------
Please make sure to:
1. Prefix the subject with [PATCH 6.1] (a skeleton example is sketched
    after this list)
2. Specify the upstream commit at the head of the commit message body
3. Add specific backport notes at the bottom of the commit message if needed
4. Add your Signed-off-by at the end
5. Check whether any later upstream commit references this one in a Fixes: tag
5.a. If there are later fix commits, you will need to backport those as well
5.b. If the later fix commits are applicable to 6.4.y, you will need to backport
       them to 6.4.y first
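
A rough skeleton of such a backport patch (everything in <> is a
placeholder, not a real value):

    Subject: [PATCH 6.1] xfs: <upstream patch subject>

    commit <upstream commit id> upstream.

    <upstream commit message body, unchanged>

    [ 6.1: note any conflict resolution or context adjustments here ]
    Signed-off-by: <Your Name> <you@example.com>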

TESTING:
--------------
The most challenging part of running fstests with kdevops is
establishing the baseline (which tests pass in the current 6.1.y per
xfs config), but that baseline has already been established and
committed in the kdevops repo.

There is a little quirk: the baseline is associated only with an exact
kernel version, hence commits like:
* c4e3de1 bootlinux: add expunge link for v6.1.39
* d6b5ea4 bootlinux: add expunge link for v6.1.38

Make sure that you test your patches against one of those tags,
or add new symlinks for other tags (see the sketch after this paragraph).
Start by running a sanity test without your patches, because different
running environments and kdevops configs may disagree on the baseline.
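
A purely hypothetical sketch of adding such a link (check how the
existing v6.1.38/v6.1.39 links were added in the kdevops history for
the exact convention):

    cd kdevops/workflows/fstests/expunges
    ln -s 6.1.39 <your 6.1.y version>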

You can use kdevops to either run local VMs with libvirt or launch
cloud VMs with terraform - you need to configure this and more
during the 'make menuconfig' step.
Attaching my kdevops config (for libvirt guests) as a reference.

REVIEW:
------------
Once you are done verifying no regressions over several kdevops run loops,
please post the backport patches for review with a [PATCH 6.1 CANDIDATE]
prefix to the xfs list (and not to the stable list), like [1].
Include the bug reports from your production environment, all the relevant
information regarding testing, and any special backport considerations.

Once the candidate backports have been ACKed, add the Acked-by trailer
to the patches, remove the CANDIDATE prefix, and post them to the stable
list, like [2].
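
A hedged sketch of the mechanics, with placeholder branch names and
output paths (the list addresses are the ones already used in this
thread):

    git format-patch --cover-letter --subject-prefix="PATCH 6.1 CANDIDATE" \
        -o outgoing/ <base>..<your-backport-branch>
    git send-email --to=linux-xfs@vger.kernel.org outgoing/*.patch

For the final stable submission the prefix becomes just "PATCH 6.1" and
the series goes to the stable list, as in [2].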

Good luck and thank you for your contribution!
Amir.

[1] https://lore.kernel.org/linux-xfs/20230712094733.1265038-1-amir73il@gmail.com/
[2] https://lore.kernel.org/linux-xfs/20230715063114.1485841-1-amir73il@gmail.com/

[-- Attachment #2: config.kdevops.xfs-6.1.y.txt --]
[-- Type: text/plain, Size: 13440 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# kdevops 5.0.2-00163-ga181072
#
CONFIG_NEEDS_LOCAL_DEVELOPMENT_PATH=y
# CONFIG_KDEVOPS_FIRST_RUN is not set
CONFIG_DISTRO_DEBIAN=y

#
# Target architecture
#
CONFIG_TARGET_ARCH_X86_64=y
# CONFIG_TARGET_ARCH_ARM64 is not set
# CONFIG_TARGET_ARCH_PPC64LE is not set
CONFIG_TARGET_ARCH="x86_64"
# end of Target architecture

#
# SSH update configuration
#
CONFIG_KDEVOPS_SSH_CONFIG_UPDATE=y
CONFIG_KDEVOPS_SSH_CONFIG="~/.ssh/config"
CONFIG_KDEVOPS_SSH_CONFIG_UPDATE_STRICT=y
CONFIG_KDEVOPS_SSH_CONFIG_UPDATE_BACKUP=y
# end of SSH update configuration

CONFIG_GIT_ALTERNATIVES=y
CONFIG_GIT_LINUX_KDEVOPS_GITHUB=y
# CONFIG_GIT_LINUX_KDEVOPS_GITLAB is not set
# CONFIG_SETUP_POSTFIX_EMAIL_RELAY is not set
# CONFIG_HYPERVISOR_TUNING is not set
# CONFIG_ENABLE_LOCAL_LINUX_MIRROR is not set
CONFIG_LOCAL_DEVELOPMENT_PATH="/home/xfs/devel/"

#
# Bring up methods
#
CONFIG_VAGRANT=y
# CONFIG_TERRAFORM is not set
# CONFIG_SKIP_BRINGUP is not set
CONFIG_VAGRANT_LIBVIRT=y
# CONFIG_VAGRANT_VIRTUALBOX is not set
# CONFIG_VAGRANT_LARGE_CPU is not set
# CONFIG_VAGRANT_VCPUS_2 is not set
# CONFIG_VAGRANT_VCPUS_4 is not set
CONFIG_VAGRANT_VCPUS_8=y
# CONFIG_VAGRANT_VCPUS_16 is not set
# CONFIG_VAGRANT_VCPUS_32 is not set
# CONFIG_VAGRANT_VCPUS_64 is not set
# CONFIG_VAGRANT_VCPUS_128 is not set
# CONFIG_VAGRANT_VCPUS_255 is not set
CONFIG_VAGRANT_VCPUS_COUNT=8
# CONFIG_VAGRANT_MEM_2G is not set
# CONFIG_VAGRANT_MEM_3G is not set
CONFIG_VAGRANT_MEM_4G=y
# CONFIG_VAGRANT_MEM_8G is not set
# CONFIG_VAGRANT_MEM_16G is not set
# CONFIG_VAGRANT_MEM_32G is not set
CONFIG_VAGRANT_MEM_MB=4096
# CONFIG_LIBVIRT_MACHINE_TYPE_DEFAULT is not set
CONFIG_LIBVIRT_MACHINE_TYPE_Q35=y
CONFIG_LIBVIRT_HOST_PASSTHROUGH=y
CONFIG_QEMU_BUILD=y
CONFIG_QEMU_USE_DEVELOPMENT_VERSION=y
CONFIG_QEMU_BUILD_UPSTREAM=y
# CONFIG_QEMU_BUILD_JIC23 is not set
# CONFIG_QEMU_BUILD_MANUAL is not set
CONFIG_QEMU_BUILD_GIT="https://github.com/qemu/qemu.git"
CONFIG_QEMU_BUILD_GIT_DATA_PATH="{{local_dev_path}}/qemu"
CONFIG_QEMU_BUILD_GIT_VERSION="v7.2.0-rc4"
CONFIG_QEMU_BIN_PATH_LIBVIRT="/usr/local/bin/qemu-system-x86_64"
CONFIG_QEMU_INSTALL_DIR_LIBVIRT="/usr/local/bin"
# CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME is not set
CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_VIRTIO=y
# CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_IDE is not set
CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_512=y
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_1K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_2K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_4K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_8K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_16K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_32K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_64K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_128K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_256K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_512K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_1M is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_PHYSICAL_BLOCK_SIZE_2M is not set
CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_512=y
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_1K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_2K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_4K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_8K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_16K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_32K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_64K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_128K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_256K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_512K is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_1M is not set
# CONFIG_LIBVIRT_EXTRA_STORAGE_VIRTIO_LOGICAL_BLOCK_SIZE_2M is not set
CONFIG_LIBVIRT_VIRTIO_AIO_MODE_NATIVE=y
# CONFIG_LIBVIRT_VIRTIO_AIO_MODE_THREADS is not set
CONFIG_LIBVIRT_VIRTIO_AIO_MODE="native"
CONFIG_LIBVIRT_VIRTIO_AIO_CACHE_MODE_NONE=y
# CONFIG_LIBVIRT_VIRTIO_AIO_CACHE_MODE_WRITETHROUGH is not set
# CONFIG_LIBVIRT_VIRTIO_AIO_CACHE_MODE_WRITEBACK is not set
# CONFIG_LIBVIRT_VIRTIO_AIO_CACHE_MODE_DIRECTSYNC is not set
# CONFIG_LIBVIRT_VIRTIO_AIO_CACHE_MODE_UNSAFE is not set
CONFIG_LIBVIRT_VIRTIO_AIO_CACHE_MODE="none"
CONFIG_LIBVIRT_STORAGE_POOL_PATH_INFER_ADVANCED=y
# CONFIG_LIBVIRT_STORAGE_POOL_PATH_CUSTOM_CWD is not set
# CONFIG_LIBVIRT_STORAGE_POOL_PATH_CUSTOM_DEFAULT_DISTRO is not set
# CONFIG_LIBVIRT_STORAGE_POOL_PATH_CUSTOM_MANUAL is not set
CONFIG_LIBVIRT_STORAGE_POOL_CREATE=y
CONFIG_LIBVIRT_STORAGE_POOL_NAME="data"
CONFIG_LIBVIRT_STORAGE_POOL_PATH_CUSTOM="/data/libvirt/images"
CONFIG_LIBVIRT_URI_SYSTEM=y
# CONFIG_LIBVIRT_URI_SESSION is not set
# CONFIG_LIBVIRT_URI_CUSTOM is not set
CONFIG_LIBVIRT_URI_PATH="qemu:///system"
CONFIG_LIBVIRT_SYSTEM_URI_PATH="qemu:///system"
CONFIG_LIBVIRT_QEMU_GROUP="libvirt-qemu"
CONFIG_KDEVOPS_STORAGE_POOL_PATH="/data/libvirt/images"
CONFIG_QEMU_BIN_PATH="/usr/local/bin/qemu-system-x86_64"
CONFIG_LIBVIRT_URI="qemu:///system"
CONFIG_LIBVIRT_SYSTEM_URI="qemu:///system"
CONFIG_VAGRANT_DEBIAN=y
# CONFIG_VAGRANT_OPENSUSE is not set
# CONFIG_VAGRANT_FEDORA is not set
# CONFIG_VAGRANT_REDHAT_GENERIC is not set
# CONFIG_VAGRANT_KDEVOPS is not set
# CONFIG_VAGRANT_DEBIAN_BUSTER64 is not set
# CONFIG_VAGRANT_DEBIAN_BULLSEYE64 is not set
CONFIG_VAGRANT_DEBIAN_TESTING64=y
CONFIG_VAGRANT_DEBIAN_BOX_SHORT="testing64"
CONFIG_VAGRANT_BOX="debian/testing64"
CONFIG_VAGRANT_BOX_UPDATE_ON_BRINGUP=y
CONFIG_VAGRANT_BOX_VERSION=""
# CONFIG_VAGRANT_LIBVIRT_INSTALL is not set
# CONFIG_VAGRANT_LIBVIRT_CONFIGURE is not set
# CONFIG_VAGRANT_LIBVIRT_VERIFY is not set
CONFIG_VAGRANT_INSTALL_PRIVATE_BOXES=y
# CONFIG_LIBVIRT_NVME_DRIVE_FORMAT_QCOW2 is not set
CONFIG_LIBVIRT_NVME_DRIVE_FORMAT_RAW=y
CONFIG_QEMU_NVME_ZONE_DRIVE_SIZE=102400
CONFIG_QEMU_NVME_ZONE_ZASL=0
CONFIG_QEMU_NVME_ZONE_SIZE="128M"
CONFIG_QEMU_NVME_ZONE_CAPACITY="0M"
CONFIG_QEMU_NVME_ZONE_MAX_ACTIVE=0
CONFIG_QEMU_NVME_ZONE_MAX_OPEN=0
CONFIG_QEMU_NVME_ZONE_PHYSICAL_BLOCK_SIZE=4096
CONFIG_QEMU_NVME_ZONE_LOGICAL_BLOCK_SIZE=4096
# CONFIG_QEMU_ENABLE_EXTRA_DRIVE_LARGEIO is not set
CONFIG_QEMU_LARGEIO_DRIVE_BASE_SIZE=10240
CONFIG_QEMU_LARGEIO_COMPAT_SIZE=512
CONFIG_QEMU_LARGEIO_MAX_POW_LIMIT=12
# CONFIG_QEMU_ENABLE_CXL is not set
# end of Bring up methods

#
# Bring up goals
#
# CONFIG_KDEVOPS_TRY_REFRESH_REPOS is not set
# CONFIG_KDEVOPS_SETUP_NFSD is not set
# end of Bring up goals

#
# Node sysctl configuration
#
# CONFIG_SYSCTL_TUNING is not set
# end of Node sysctl configuration

#
# Target workflows
#
CONFIG_WORKFLOWS=y

#
# Shared workflow configuration
#

#
# Shared workflow data partition
#
CONFIG_WORKFLOW_DATA_DEVICE="/dev/disk/by-id/virtio-kdevops0"
CONFIG_WORKFLOW_DATA_PATH="/data"
CONFIG_WORKFLOW_INFER_USER_AND_GROUP=y
# CONFIG_WORKFLOW_DATA_FSTYPE_XFS is not set
# CONFIG_WORKFLOW_DATA_FSTYPE_EXT4 is not set
CONFIG_WORKFLOW_DATA_FSTYPE_BTRFS=y
CONFIG_WORKFLOW_DATA_FSTYPE="btrfs"
CONFIG_WORKFLOW_DATA_LABEL="data"
# end of Shared workflow data partition

# CONFIG_WORKFLOW_MAKE_CMD_OVERRIDE is not set
CONFIG_WORKFLOW_KDEVOPS_GIT="https://github.com/linux-kdevops/kdevops.git"
CONFIG_WORKFLOW_KDEVOPS_GIT_DATA="{{data_path}}/kdevops"
CONFIG_WORKFLOW_KDEVOPS_DIR="{{data_path}}/kdevops"
# end of Shared workflow configuration

# CONFIG_WORKFLOW_LINUX_DISTRO is not set
CONFIG_WORKFLOW_LINUX_CUSTOM=y

#
# Get and install Linux from git
#
CONFIG_BOOTLINUX=y
CONFIG_BOOTLINUX_9P=y

#
# Modify default 9p configuration
#
CONFIG_BOOTLINUX_9P_HOST_PATH="/data/xfs/xfs-6.1.y/kdevops/linux"
CONFIG_BOOTLINUX_9P_MSIZE=131072
CONFIG_BOOTLINUX_9P_FSDEV="kdevops_9p_fsdev"
CONFIG_BOOTLINUX_9P_SECURITY_MODEL="none"
CONFIG_BOOTLINUX_9P_DRIVER="virtio-9p-pci"
CONFIG_BOOTLINUX_9P_MOUNT_TAG="kdevops_9p_bootlinux"
# end of Modify default 9p configuration

CONFIG_BOOTLINUX_STABLE=y
# CONFIG_BOOTLINUX_DEV is not set
# CONFIG_BOOTLINUX_TREE_LINUS is not set
CONFIG_BOOTLINUX_TREE_STABLE=y
# CONFIG_BOOTLINUX_STABLE_V419 is not set
# CONFIG_BOOTLINUX_STABLE_V54 is not set
# CONFIG_BOOTLINUX_STABLE_V510 is not set
# CONFIG_BOOTLINUX_STABLE_V514 is not set
# CONFIG_BOOTLINUX_STABLE_V517 is not set
# CONFIG_BOOTLINUX_STABLE_V519 is not set
# CONFIG_BOOTLINUX_STABLE_V60 is not set
CONFIG_BOOTLINUX_STABLE_V61=y
CONFIG_BOOTLINUX_TREE_NAME="linux"
CONFIG_BOOTLINUX_TREE="https://github.com/amir73il/linux.git"
CONFIG_BOOTLINUX_TREE_TAG="xfs-6.1.y-for-testing"
CONFIG_BOOTLINUX_TREE_LOCALVERSION=""
CONFIG_BOOTLINUX_SHALLOW_CLONE=y
CONFIG_BOOTLINUX_SHALLOW_CLONE_DEPTH=1
# end of Get and install Linux from git

CONFIG_WORKFLOWS_TESTS=y
# CONFIG_WORKFLOWS_TESTS_DEMOS is not set
CONFIG_WORKFLOWS_LINUX_TESTS=y
CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
CONFIG_KDEVOPS_WORKFLOW_DEDICATE_FSTESTS=y
# CONFIG_KDEVOPS_WORKFLOW_DEDICATE_BLKTESTS is not set
# CONFIG_KDEVOPS_WORKFLOW_DEDICATE_CXL is not set
# CONFIG_KDEVOPS_WORKFLOW_DEDICATE_PYNFS is not set
# CONFIG_KDEVOPS_WORKFLOW_DEDICATE_SELFTESTS is not set
CONFIG_KDEVOPS_WORKFLOW_ENABLE_FSTESTS=y

#
# Configure and run fstests
#
CONFIG_HAVE_DISTRO_PREFERS_FSTESTS_WATCHDOG=y
CONFIG_HAVE_DISTRO_PREFERS_FSTESTS_WATCHDOG_KILL=y
CONFIG_FSTESTS_XFS=y
# CONFIG_FSTESTS_BTRFS is not set
# CONFIG_FSTESTS_EXT4 is not set
CONFIG_FSTESTS_FSTYP="xfs"
CONFIG_FSTESTS_WATCHDOG=y
CONFIG_FSTESTS_WATCHDOG_CHECK_TIME=5
CONFIG_FSTESTS_WATCHDOG_MAX_NEW_TEST_TIME=60
CONFIG_FSTESTS_WATCHDOG_HUNG_MULTIPLIER_LONG_TESTS=10
CONFIG_FSTESTS_WATCHDOG_HUNG_FAST_TEST_MAX_TIME=5
CONFIG_FSTESTS_WATCHDOG_KILL_TASKS_ON_HANG=y
# CONFIG_FSTESTS_WATCHDOG_RESET_HUNG_SYSTEMS is not set

#
# Configure how to test XFS
#
CONFIG_HAVE_DISTRO_XFS_PREFERS_MANUAL=y
CONFIG_HAVE_DISTRO_XFS_SUPPORTS_CRC=y
CONFIG_HAVE_DISTRO_XFS_SUPPORTS_REFLINKS=y
CONFIG_HAVE_DISTRO_XFS_SUPPORTS_BIGBLOCKS=y
CONFIG_HAVE_DISTRO_XFS_SUPPORTS_EXTERNAL_LOG=y
# CONFIG_FSTESTS_XFS_MANUAL_COVERAGE is not set
CONFIG_FSTESTS_XFS_SECTION_CRC=y
CONFIG_FSTESTS_XFS_SECTION_NOCRC=y
CONFIG_FSTESTS_XFS_SECTION_NOCRC_512=y
CONFIG_FSTESTS_XFS_SECTION_REFLINK=y
CONFIG_FSTESTS_XFS_SECTION_REFLINK_1024=y
CONFIG_FSTESTS_XFS_SECTION_REFLINK_NORMAPBT=y
CONFIG_FSTESTS_XFS_SECTION_LOGDEV=y
# end of Configure how to test XFS

CONFIG_FSTESTS_GIT="https://github.com/linux-kdevops/fstests.git"
CONFIG_FSTESTS_DATA="{{data_path}}/fstests"
CONFIG_FSTESTS_DATA_TARGET="/var/lib/xfstests"
CONFIG_FSTESTS_TESTDEV_SPARSEFILE_GENERATION=y
CONFIG_FSTESTS_SPARSE_DEV="/dev/disk/by-id/virtio-kdevops1"
# CONFIG_FSTESTS_SPARSE_XFS is not set
# CONFIG_FSTESTS_SPARSE_BTRFS is not set
CONFIG_FSTESTS_SPARSE_EXT4=y
CONFIG_FSTESTS_SPARSE_FSTYPE="ext4"
CONFIG_FSTESTS_SPARSE_LABEL="sparsefiles"
CONFIG_FSTESTS_SPARSE_FILE_PATH="/media/sparsefiles"
CONFIG_FSTESTS_SPARSE_FILE_SIZE="20G"
CONFIG_FSTESTS_SPARSE_FILENAME_PREFIX="sparse-disk"
CONFIG_FSTESTS_TEST_DEV="/dev/loop16"
CONFIG_FSTESTS_TEST_LOGDEV="/dev/loop13"
CONFIG_FSTESTS_TEST_LOGDEV_MKFS_OPTS="-lsize=1g"
CONFIG_FSTESTS_TEST_DIR="/media/test"
CONFIG_FSTESTS_SCRATCH_DEV_POOL="/dev/loop5 /dev/loop6 /dev/loop7 /dev/loop8 /dev/loop9 /dev/loop10 /dev/loop11 /dev/loop12"
CONFIG_FSTESTS_SCRATCH_MNT="/media/scratch"
CONFIG_FSTESTS_LOGWRITES_DEV="/dev/loop15"
CONFIG_FSTESTS_SCRATCH_LOGDEV="/dev/loop15"
CONFIG_FSTESTS_SETUP_SYSTEM=y
CONFIG_FSTESTS_RUN_TESTS=y
# CONFIG_FSTESTS_RUN_AUTO_GROUP_TESTS is not set
CONFIG_FSTESTS_RUN_CUSTOM_GROUP_TESTS="auto"
# CONFIG_FSTESTS_ENABLE_RUN_CUSTOM_TESTS is not set
# CONFIG_FSTESTS_RUN_LARGE_DISK_TESTS is not set
# end of Configure and run fstests

CONFIG_KDEVOPS_WORKFLOW_GIT_CLONES_KDEVOPS_GIT=y
# end of Target workflows

#
# Kdevops configuration
#
CONFIG_HAVE_CUSTOM_DISTRO_HOST_PREFIX=y
CONFIG_HAVE_DISTRO_PREFERS_CUSTOM_HOST_PREFIX=y
CONFIG_CUSTOM_DISTRO_HOST_PREFIX="testing64"
CONFIG_KDEVOPS_USE_DISTRO_HOSTS_PREFIX=y
CONFIG_KDEVOPS_HOSTS_PREFIX="xfs61"
# CONFIG_KDEVOPS_BASELINE_AND_DEV is not set

#
# Ansible post-bring up provisioning configuration
#
CONFIG_KDEVOPS_PLAYBOOK_DIR="playbooks"
CONFIG_KDEVOPS_ANSIBLE_PROVISION_ENABLE=y
CONFIG_KDEVOPS_ANSIBLE_PROVISION_PLAYBOOK="devconfig.yml"
CONFIG_KDEVOPS_DEVCONFIG_ENABLE=y
CONFIG_KDEVOPS_DEVCONFIG_ENABLE_CONSOLE=y
CONFIG_KDEVOPS_DEVCONFIG_KERNEL_CONSOLE_SETTINGS="console=tty0 console=tty1 console=ttyS0,115200n8"
CONFIG_KDEVOPS_DEVCONFIG_GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --parity=no --stop=1"
CONFIG_KDEVOPS_GRUB_TIMEOUT=2
CONFIG_KDEVOPS_DEVCONFIG_ENABLE_SYSTEMD_WATCHDOG=y
CONFIG_KDEVOPS_DEVCONFIG_SYSTEMD_WATCHDOG_TIMEOUT_RUNTIME="5min"
CONFIG_KDEVOPS_DEVCONFIG_SYSTEMD_WATCHDOG_TIMEOUT_REBOOT="10min"
CONFIG_KDEVOPS_DEVCONFIG_SYSTEMD_WATCHDOG_TIMEOUT_KEXEC="5min"
CONFIG_KDEVOPS_ANSIBLE_INVENTORY_FILE="hosts"
CONFIG_KDEVOPS_PYTHON_INTERPRETER="/usr/bin/python3"
CONFIG_KDEVOPS_PYTHON_OLD_INTERPRETER="/usr/bin/python2"
# end of Ansible post-bring up provisioning configuration

#
# Kernel continous integration configuration
#
CONFIG_KERNEL_CI_DEFAULT_STEADY_STATE_GOAL=100
CONFIG_KERNEL_CI=y
CONFIG_KERNEL_CI_ENABLE_STEADY_STATE=y
CONFIG_KERNEL_CI_STEADY_STATE_GOAL=10
# end of Kernel continous integration configuration
# end of Kdevops configuration

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
  2023-07-20  6:45                 ` Amir Goldstein
@ 2023-07-20 18:30                   ` Luis Chamberlain
  2023-07-20 18:38                     ` Frederick Lawler
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2023-07-20 18:30 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Ignat Korchagin, Darrick J. Wong, Matthew Wilcox, Daniel Dao,
	Dave Chinner, kernel-team, linux-fsdevel, Chandan Babu R,
	Leah Rumancik, linux-xfs, Fred Lawler

On Thu, Jul 20, 2023 at 09:45:14AM +0300, Amir Goldstein wrote:
> On Wed, Jul 19, 2023 at 11:37 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
> >
> > Circling back on this. So far it seems that the patchset in question
> > does fix the issues of rocksdb corruption as we haven't seen them for
> > some time on our test group. We're happy to dedicate some efforts now
> > to get them officially backported to 6.1 according to the process. We
> > did try basic things with kdevops and would like to learn more. Fred
> > (cc-ed here) is happy to drive the effort and be the primary contact
> > on this. Could you, please, guide us/him on the process?
> >
> 
> Hi Fred,
> 
> I'd love to help you get started with kdevops and xfs testing.
> However, I am going on vacation tomorrow for three weeks,
> so I'll just drop a few pointers and let the others help you out.
> 
> Luis (@mcgrof) is your best point of contact for kdevops.

I'm happy to help.

> Chandan should be able to help you with xfs backporting questions.
> 
> Better yet, use the discord channel:
>   https://bit.ly/linux-kdevops-chat
> 
> Someone is almost always available to answer questions there.

Indeed, and also in #kdevops on irc.oftc.net if you prefer IRC,
but Discord seems to be more active for kdevops these days.

> TESTING:
> --------------
> The most challenging part of running fstests with kdevops is
> establishing the baseline (which tests pass in current 6.1.y per xfs config),
> but the baseline for that has already been established and committed
> in kdevops repo.
> 
> There is a little quirk, that the baseline is associated only with exact
> kernel version, hence commits like:
> * c4e3de1 bootlinux: add expunge link for v6.1.39

Indeed, so our latest baseline is in

workflows/fstests/expunges/6.1.39/xfs/unassigned/
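
Listing that directory should show the per-section expunge files that
make up the baseline, e.g.:

    ls workflows/fstests/expunges/6.1.39/xfs/unassigned/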

> Make sure that you test your patches against one of those tags
> or add new symlinks to other tags.
> Start by running a sanity test without your patches, because different
> running environments and kdevops configs may disagree on the baseline.

You want to first run at least one loop to confirm that your setup is
fine and that you don't hit any failures other than the ones above.
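
Roughly, and assuming a config along the lines of Amir's attached
reference, one loop looks something like the below; the exact make
target names may differ, so treat this as a sketch and follow the docs
listed further down:

    make menuconfig          # pick libvirt or terraform, 6.1.y, fstests/xfs
    make
    make bringup             # provision the guests
    make linux               # build and boot the kernel under test
    make fstests
    make fstests-baseline    # run the configured sections
    make fstests-results     # collect results to compare against the expunges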

> You can use kdevops to either run local VMs with libvirt or launch
> cloud VMs with terraform - you need to configure this and more
> during the 'make menuconfig' step.
> Attaching my kdevops config (for libvirt guests) as a reference.

Please read:

https://github.com/linux-kdevops/kdevops
https://github.com/linux-kdevops/kdevops/blob/master/docs/requirements.md
https://github.com/linux-kdevops/kdevops/blob/master/docs/kdevops-first-run.md
https://github.com/linux-kdevops/kdevops/blob/master/docs/kdevops-mirror.md

And the video demonstrations. Then I'm happy to schedule some time to
cover anything the docs didn't cover, in particular to help you test new
patches you wish to backport for a stable kernel and the testing
criteria for that.

  Luis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
  2023-07-20 18:30                   ` Luis Chamberlain
@ 2023-07-20 18:38                     ` Frederick Lawler
  0 siblings, 0 replies; 12+ messages in thread
From: Frederick Lawler @ 2023-07-20 18:38 UTC (permalink / raw)
  To: Luis Chamberlain, Amir Goldstein
  Cc: Ignat Korchagin, Darrick J. Wong, Matthew Wilcox, Daniel Dao,
	Dave Chinner, kernel-team, linux-fsdevel, Chandan Babu R,
	Leah Rumancik, linux-xfs

Hi Amir and Luis

On 7/20/23 1:30 PM, Luis Chamberlain wrote:
> On Thu, Jul 20, 2023 at 09:45:14AM +0300, Amir Goldstein wrote:
>> On Wed, Jul 19, 2023 at 11:37 PM Ignat Korchagin <ignat@cloudflare.com> wrote:
>>>
>>> Circling back on this. So far it seems that the patchset in question
>>> does fix the issues of rocksdb corruption as we haven't seen them for
>>> some time on our test group. We're happy to dedicate some efforts now
>>> to get them officially backported to 6.1 according to the process. We
>>> did try basic things with kdevops and would like to learn more. Fred
>>> (cc-ed here) is happy to drive the effort and be the primary contact
>>> on this. Could you, please, guide us/him on the process?
>>>
>>
>> Hi Fred,
>>
>> I'd love to help you get started with kdevops and xfs testing.
>> However, I am going on vacation tomorrow for three weeks,
>> so I'll just drop a few pointers and let the others help you out.
>>
>> Luis (@mcgrof) is your best point of contact for kdevops.
> 
> I'm happy to help.
> 
>> Chandan should be able to help you with xfs backporting questions.
>>
>> Better yet, use the discord channel:
>>    https://bit.ly/linux-kdevops-chat
>>
>> Someone is almost always available to answer questions there.
> 
> Indeed and also on irc.oftc.net on #kdevops too if you prefer IRC.
> But discord seems to be more happening for kdevops these days.
> 
>> TESTING:
>> --------------
>> The most challenging part of running fstests with kdevops is
>> establishing the baseline (which tests pass in current 6.1.y per xfs config),
>> but the baseline for that has already been established and committed
>> in kdevops repo.
>>
>> There is a little quirk, that the baseline is associated only with exact
>> kernel version, hence commits like:
>> * c4e3de1 bootlinux: add expunge link for v6.1.39
> 
> Indeed so our latest baseline is in
> 
> workflows/fstests/expunges/6.1.39/xfs/unassigned/
> 
>> Make sure that you test your patches against one of those tags
>> or add new symlinks to other tags.
>> Start by running a sanity test without your patches, because different
>> running environments and kdevops configs may disagree on the baseline.
> 
> You want to first run at least one loop to confirm your setup is fine
> and that you don't find any other failures other than the ones above.
> 
>> You can use kdevops to either run local VMs with libvirt or launch
>> cloud VMs with terraform - you need to configure this and more
>> during the 'make menuconfig' step.
>> Attaching my kdevops config (for libvirt guests) as a reference.
> 
> Please read:
> 
> https://github.com/linux-kdevops/kdevops
> https://github.com/linux-kdevops/kdevops/blob/master/docs/requirements.md
> https://github.com/linux-kdevops/kdevops/blob/master/docs/kdevops-first-run.md
> https://github.com/linux-kdevops/kdevops/blob/master/docs/kdevops-mirror.md
> 
> And the video demonstrations. Then I'm happy to schedule some time to
> cover anything the docs didn't cover, in particular to help you test new
> patches you wish to backport for a stable kernel and the testing
> criteria for that.
> 
>    Luis

This is all fantastic! I just joined the Discord and will likely begin 
work on this tomorrow. I've already set up kdevops and run through some 
selftests earlier this week. I still need to watch the videos, however. 
I'll reach out on Discord after I take a crack at what's presented so far.

Fred

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread (newest message: 2023-07-20 18:38 UTC)

Thread overview: 12+ messages
2023-06-29 16:09 Backporting of series xfs/iomap: fix data corruption due to stale cached iomap Daniel Dao
2023-06-29 16:34 ` Matthew Wilcox
2023-06-29 18:14   ` Darrick J. Wong
2023-06-29 19:30     ` Ignat Korchagin
2023-06-30 10:39       ` Amir Goldstein
2023-06-30 12:30         ` Ignat Korchagin
2023-06-30 13:05           ` Amir Goldstein
2023-06-30 15:16             ` Darrick J. Wong
2023-07-19 20:37               ` Ignat Korchagin
2023-07-20  6:45                 ` Amir Goldstein
2023-07-20 18:30                   ` Luis Chamberlain
2023-07-20 18:38                     ` Frederick Lawler
