[REGRESSION] Corruption on cifs / smb write on ARM, kernels 6.3-6.9

public inbox for linux-kernel@vger.kernel.org

* [REGRESSION] Corruption on cifs / smb write on ARM, kernels 6.3-6.9
@ 2024-09-20 20:07 James Young
  2024-09-22 23:55 ` Wang Yugui
  0 siblings, 1 reply; 5+ messages in thread
From: James Young @ 2024-09-20 20:07 UTC
  To: pronoiac+kernel
  Cc: stable, regressions, linux-cifs, David Howells, linux-kernel,
	Steve French

I was benchmarking some compressors, piping to and from a network share on a NAS, and some consistently wrote corrupted data.

First, apologies in advance:
* if I'm not in the right place. I tried to follow the directions from the Regressions guide - https://www.kernel.org/doc/html/latest/admin-guide/reporting-regressions.html
* I know there's a ton of context I don't know
* I’m trying a different mail app, because the first one looked concussed with plain text. This might be worse.

The detailed description:
I was benchmarking some compressors on Debian on a Raspberry Pi, piping to and from a network share on a NAS, and found that some consistently had issues writing to my NAS. Specifically:
* lzop
* pigz - parallel gzip
* pbzip2 - parallel bzip2

This is dependent on kernel version. I've done a survey, below.

While I tripped over the issue on a Debian port (Debian 12, bookworm, kernel v6.6), I compiled my own vanilla / mainline kernels for testing and reporting this.

Even more details:
The Pi and the Synology NAS are directly connected by Gigabit Ethernet. Both sides are using self-assigned IP addresses. I'll note that at boot, getting the Pi to see the NAS requires some nudging of avahi-autoipd; while I think it's stable before testing, I'm not positive, and reconnection issues might be in play.

The files in question are tars of sparse file systems, about 270 gig, compressing down to 10-30 gig.

Compression seems to work, without complaint; decompression crashes the process, usually within the first gig of the compressed file. The output of the stream doesn't match what ends up written to disk.

Trying decompression during compression gets further along than it does after compression finishes; this might point toward something with writes and caches.

A previous attempt involved rpi-update, which:
* good: let me install kernels without building myself
* bad: updated the bootloader and firmware, to bleeding edge, with possible regressions; it definitely muddied the results of my tests
I started over with a fresh install, and no results involving rpi-update are included in this email.

A survey of major branches:
* 5.15.167, LTS - good
* 6.1.109, LTS - good
* 6.2.16 - good
* 6.3.13 - bad
* 6.4.16 - bad
* 6.5.13 - bad
* 6.6.50, LTS - bad
* 6.7.12 - bad
* 6.8.12 - bad
* 6.9.12 - bad
* 6.10.9 - good
* 6.11.0 - good

I tried, but couldn't fully build 4.19.322 or 6.0.19, due to issues with modules.

Important commits:
It looked like both the breakage and the fix came in during rc1 releases.

Breakage, v6.3-rc1:
I manually bisected commits in fs/smb* and fs/cifs.

3d78fe73fa12 cifs: Build the RDMA SGE list directly from an iterator
> lzop and pigz worked. last working. test in progress: pbzip2

607aea3cc2a8 cifs: Remove unused code
> lzop didn't work. first broken

Fix, v6.10-rc1:
I manually bisected commits in fs/smb.

69c3c023af25 cifs: Implement netfslib hooks
> lzop didn't work. last broken one

3ee1a1fc3981 cifs: Cut over to using netfslib
> lzop, pigz, pbzip2, all worked. first fixed one
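
For anyone retracing a manual, path-limited bisection like the two above, the candidate commits can be listed with plain git. This is a minimal sketch, not the exact procedure used for this report: the helper name list_candidates is made up for illustration, and the refs and paths are passed in by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the commits between two refs that touch the
# given paths, oldest first, as candidates for a manual bisection.
list_candidates() {
    local good="$1" bad="$2"
    shift 2
    # The remaining arguments ("$@") are used as pathspecs. The "--"
    # keeps git from complaining about paths that do not exist in the
    # current checkout.
    git log --oneline --reverse "$good..$bad" -- "$@"
}
```

For the breakage range this would be something like "list_candidates v6.2 v6.3-rc1 fs/cifs fs/smb". git bisect start also accepts pathspecs after "--", which narrows an automated bisection the same way.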

To test / reproduce:
It looks like this, on a mounted network share, with extra pv for progress meters:

cat 1tb-rust-ext4.img.tar.gz | \
  gzip -d | \
  lzop -1 > \
  1tb-rust-ext4.img.tar.lzop
  # wait 40 minutes

cat 1tb-rust-ext4.img.tar.lzop | \
  lzop -d | \
  sha1sum
  # either it works, and shows the right checksum
  # or it crashes early, due to a corrupt file, and shows an incorrect checksum

As I re-read this, I realize it might look like the compressor behaves differently. I added a "tee $output | sha1sum; sha1sum $output" and ran it on a broken version. The checksums from the pipe and for the file on disk are different.
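
That check can be sketched roughly as follows. This is an illustration under assumptions, not the exact command line that was run: the function name compare_stream_to_file is made up, and it assumes bash with coreutils tee and sha1sum.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write stdin to the file named by "$1" while
# hashing the same bytes in flight, then hash the file as read back
# and compare the two checksums.
compare_stream_to_file() {
    local output="$1" stream_sum file_sum

    # tee writes the stream to $output and passes an identical copy on
    # to sha1sum, so stream_sum reflects what the compressor emitted.
    stream_sum=$(tee "$output" | sha1sum | cut -d' ' -f1)

    # Hash the bytes that come back when the file is read.
    file_sum=$(sha1sum < "$output" | cut -d' ' -f1)

    if [ "$stream_sum" = "$file_sum" ]; then
        echo "match $stream_sum"
    else
        echo "MISMATCH stream=$stream_sum file=$file_sum"
        return 1
    fi
}
```

Usage would look like "gzip -d < 1tb-rust-ext4.img.tar.gz | lzop -1 | compare_stream_to_file 1tb-rust-ext4.img.tar.lzop". Reading the file back right away may be served from the client's cache, so remounting the share before trusting a "match" is a reasonable precaution.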

Assorted info:
This is a Raspberry Pi 4, with 4 GiB RAM, running Debian 12, bookworm, or a port.

mount.cifs version: 7.0

# cat /proc/sys/kernel/tainted
1024
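
For readers decoding that taint value: it is a bitmask, and 1024 is 2^10, so only bit 10 is set. The kernel's tainted-kernels documentation lists bit 10 as "staging driver was loaded"; which driver that was on this system is not stated here. A small sketch of the decoding, with a made-up helper name taint_bits:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the bit positions that are set in a
# kernel taint value, lowest bit first, separated by spaces.
taint_bits() {
    local value="$1" bit=0 bits=""
    while [ "$value" -gt 0 ]; do
        if [ $(( value & 1 )) -eq 1 ]; then
            # Append this bit number, with a space if not the first.
            bits="${bits:+$bits }$bit"
        fi
        value=$(( value >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$bits"
}

taint_bits 1024   # prints "10"
```

On a live system the argument would come from "$(cat /proc/sys/kernel/tainted)".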

# cat /proc/version
Linux version 6.2.0-3d78fe73f-v8-pronoiac+ (pronoiac@bisect) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #21 SMP PREEMPT Thu Sep 19 16:51:22 PDT 2024

DebugData: 
/proc/fs/cifs/DebugData
Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 2.41
Features: DFS,FSCACHE,STATS2,DEBUG,ALLOW_INSECURE_LEGACY,CIFS_POSIX,UPCALL(SPNEGO),XATTR,ACL
CIFSMaxBufSize: 16384
Active VFS Requests: 1

Servers:
1) ConnectionId: 0x1 Hostname: drums.local
Number of credits: 8062 Dialect 0x300
TCP status: 1 Instance: 1
Local Users To Server: 1 SecMode: 0x1 Req On Wire: 2
In Send: 1 In MaxReq Wait: 0

        Sessions:
        1) Address: 169.254.132.219 Uses: 1 Capability: 0x300047        Session Status: 1
        Security type: RawNTLMSSP  SessionId: 0x4969841e
        User: 1000 Cred User: 0

        Shares:
        0) IPC: \\drums.local\IPC$ Mounts: 1 DevInfo: 0x0 Attributes: 0x0
        PathComponentMax: 0 Status: 1 type: 0 Serial Number: 0x0
        Share Capabilities: None        Share Flags: 0x0
        tid: 0xeb093f0b Maximal Access: 0x1f00a9

        1) \\drums.local\billions Mounts: 1 DevInfo: 0x20 Attributes: 0x5007f
        PathComponentMax: 255 Status: 1 type: DISK Serial Number: 0x735a9af5
        Share Capabilities: None Aligned, Partition Aligned,    Share Flags: 0x0
        tid: 0x5e6832e6 Optimal sector size: 0x200      Maximal Access: 0x1f01ff

        MIDs:
        State: 2 com: 9 pid: 3117 cbdata: 00000000e003293e mid 962892

        State: 2 com: 9 pid: 3117 cbdata: 000000002610602a mid 962956

--

Let me know how I can help.
The process of iterating can take hours, and it's not automated, so my resources are limited.

#regzbot introduced: 607aea3cc2a8
#regzbot fix: 3ee1a1fc3981

-James

* Re: [REGRESSION] Corruption on cifs / smb write on ARM, kernels 6.3-6.9
  2024-09-20 20:07 [REGRESSION] Corruption on cifs / smb write on ARM, kernels 6.3-6.9 James Young
@ 2024-09-22 23:55 ` Wang Yugui
  2024-09-23 19:36   ` james young
  0 siblings, 1 reply; 5+ messages in thread
From: Wang Yugui @ 2024-09-22 23:55 UTC
  To: James Young
  Cc: pronoiac+kernel, stable, regressions, linux-cifs, David Howells,
	linux-kernel, Steve French

Hi,

> #regzbot introduced: 607aea3cc2a8
> #regzbot fix: 3ee1a1fc3981

I checked 607aea3cc2a8; it only removed code that was already inside
#if 0 ... #endif. So this regression was not introduced by 607aea3cc2a8,
but the reproduction frequency changed there.


Another issue in 6.6.y may be related:
https://lore.kernel.org/linux-fsdevel/9e8f8872-f51b-4a09-a92c-49218748dd62@meta.com/T/

Does this regression still happen after the following patches are applied?

a60cc288a1a2 :Luis Chamberlain: test_xarray: add tests for advanced multi-index use
a08c7193e4f1 :Sidhartha Kumar: mm/filemap: remove hugetlb special casing in filemap.c
6212eb4d7a63 :Hongbo Li: mm/filemap: avoid type conversion

de60fd8ddeda :Kairui Song: mm/filemap: return early if failed to allocate memory for split
b2ebcf9d3d5a :Kairui Song: mm/filemap: clean up hugetlb exclusion code
a4864671ca0b :Kairui Song: lib/xarray: introduce a new helper xas_get_order
6758c1128ceb :Kairui Song: mm/filemap: optimize filemap folio adding


Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2024/09/23


* Re: [REGRESSION] Corruption on cifs / smb write on ARM, kernels 6.3-6.9
  2024-09-22 23:55 ` Wang Yugui
@ 2024-09-23 19:36   ` james young
  2024-09-25  4:35     ` james young
  0 siblings, 1 reply; 5+ messages in thread
From: james young @ 2024-09-23 19:36 UTC
  To: Wang Yugui
  Cc: pronoiac+kernel, stable, regressions, linux-cifs, David Howells,
	linux-kernel, Steve French

Hey there -

On Sun, Sep 22, 2024 at 4:55 PM Wang Yugui <wangyugui@e16-tech.com> wrote:
>
> Hi,
>
> > I was benchmarking some compressors, piping to and from a network share on a NAS, and some consistently wrote corrupted data.

> > Important commits:
> > It looked like both the breakage and the fix came in during rc1 releases.
> >
> > Breakage, v6.3-rc1:
> > I manually bisected commits in fs/smb* and fs/cifs.
> >
> > 3d78fe73fa12 cifs: Build the RDMA SGE list directly from an iterator
> > > lzop and pigz worked. last working. test in progress: pbzip2

This is a first for me: lzop was fine, but pbzip2 still had issues,
roughly a clock hour into compression. (When lzop has issues, it's
usually within a minute or two.)


> > 607aea3cc2a8 cifs: Remove unused code
> > > lzop didn't work. first broken
> >
> >
> > Fix, v6.10-rc1:
> > I manually bisected commits in fs/smb.
> >
> > 69c3c023af25 cifs: Implement netfslib hooks
> > > lzop didn't work. last broken one
> >
> > 3ee1a1fc3981 cifs: Cut over to using netfslib
> > > lzop, pigz, pbzip2, all worked. first fixed one

> I checked 607aea3cc2a8; it only removed code that was already inside
> #if 0 ... #endif. So this regression was not introduced by 607aea3cc2a8,
> but the reproduction frequency changed there.

I agree. The pbzip2 results above, regarding the break bisection I
landed on: they mark when it became more of an issue, but not when it
started.

I could re-run tests and dig into possible false negatives. It'll be
slower going, though.


> Another issue in 6.6.y may be related:
> https://lore.kernel.org/linux-fsdevel/9e8f8872-f51b-4a09-a92c-49218748dd62@meta.com/T/

In comparison: I'm relieved that my issue is something that can be
tested within hours, on one device.


> Does this regression still happen after the following patches are applied?
>
> a60cc288a1a2 :Luis Chamberlain: test_xarray: add tests for advanced multi-index use
> a08c7193e4f1 :Sidhartha Kumar: mm/filemap: remove hugetlb special casing in filemap.c
> 6212eb4d7a63 :Hongbo Li: mm/filemap: avoid type conversion
>
> de60fd8ddeda :Kairui Song: mm/filemap: return early if failed to allocate memory for split
> b2ebcf9d3d5a :Kairui Song: mm/filemap: clean up hugetlb exclusion code
> a4864671ca0b :Kairui Song: lib/xarray: introduce a new helper xas_get_order
> 6758c1128ceb :Kairui Song: mm/filemap: optimize filemap folio adding

No luck: I cherry-picked those commits into 6.6.52, and upon testing
lzop, the file didn't match the stream, and decompression failed.

Thank you for investigating, and giving me something to try!

-James

* Re: [REGRESSION] Corruption on cifs / smb write on ARM, kernels 6.3-6.9
  2024-09-23 19:36   ` james young
@ 2024-09-25  4:35     ` james young
  2024-09-28 16:37       ` james young
  0 siblings, 1 reply; 5+ messages in thread
From: james young @ 2024-09-25  4:35 UTC
  To: Wang Yugui
  Cc: pronoiac+kernel, stable, regressions, linux-cifs, David Howells,
	linux-kernel, Steve French, smfrench

On request:
* adding another cc for Steven
* I tested 6.6.52, without any extra commits: it was bad.

-James

* Re: [REGRESSION] Corruption on cifs / smb write on ARM, kernels 6.3-6.9
  2024-09-25  4:35     ` james young
@ 2024-09-28 16:37       ` james young
  0 siblings, 0 replies; 5+ messages in thread
From: james young @ 2024-09-28 16:37 UTC
  To: Wang Yugui
  Cc: pronoiac+kernel, stable, regressions, linux-cifs, David Howells,
	linux-kernel, Steve French, smfrench

I retraced my steps:
* looking for the breaking commit, between 6.2 and 6.3-rc1
* I switched to checksumming the stream and the written file; this can
save time, compared to decompression
* I checked for lzop, pigz, and pbzip2

So, breakage. I landed on different commits:
last working commit. ok: lzop, pigz, pbzip2.
16541195c6d9 cifs: Add a function to read into an iter from a socket

first broken commit. lzop failed.
d08089f649a0 cifs: Change the I/O paths to use an iterator rather than
a page list

That broken commit is right before my previous "last good" and "break".

I'm seeing some inconsistencies. I'd *thought* I was careful with dtb
files and .config; I might have dropped the ball occasionally, or
there's something else, I don't know what, that I'm stumbling over.

To check for marginal hardware, I tried another Raspberry Pi 4. I
verified baseline 6.6.52 didn't work there, and stopped there. It
doesn't have any cooling; it *almost certainly* would throttle for
thermal reasons, but I didn't want to push it.

-James

