* 6.6.y: cifs broken since 6.6.23 writing big files with vers=1.0 and 2.0
@ 2024-06-11 7:20 Thomas Voegtle
2024-06-12 12:50 ` Greg KH
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Voegtle @ 2024-06-11 7:20 UTC (permalink / raw)
To: stable, David Howells, Steve French
Hello,
a machine booted with Linux 6.6.23 up to 6.6.32:

writing /dev/zero with dd to a mounted cifs share with vers=1.0 or
vers=2.0 slows down drastically in my setup after writing approx. 46GB
of data.

The whole machine gets unresponsive as if it were under very high IO
load. It pings, but opening a new ssh session takes too much time. I can
stop the dd (ctrl-c) and after a few minutes the machine is fine again.
cifs with vers=3.1.1 seems to be fine with 6.6.32.
Linux 6.10-rc3 is fine with vers=1.0 and vers=2.0.
Bisected down to:
cifs-fix-writeback-data-corruption.patch
which is:
Upstream commit f3dc1bdb6b0b0693562c7c54a6c28bafa608ba3c
and
linux-stable commit e45deec35bf7f1f4f992a707b2d04a8c162f2240
Reverting this patch on 6.6.32 fixes the problem for me.
Thomas
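The reported workload boils down to two commands. A sketch follows (printed rather than executed, since it needs a real cifs server; the //server/share path and /mnt/cifs mount point are placeholders, not taken from the report):

```shell
# Sketch of the failing workload from the report. The share, mount point
# and credentials are assumptions; only vers= and the dd shape come from
# the thread. Printed instead of run, since it needs a live SMB server.
share='//server/share'
mnt='/mnt/cifs'
vers='1.0'   # the report says vers=1.0 and vers=2.0 both hit the bug

printf 'mount -t cifs %s %s -o vers=%s\n' "$share" "$mnt" "$vers"
printf 'dd if=/dev/zero of=%s/bigfile status=progress\n' "$mnt"
```

The dd invocation matches the one Thomas gives later in the thread; only the output path is a placeholder here.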
^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 6.6.y: cifs broken since 6.6.23 writing big files with vers=1.0 and 2.0
  2024-06-11  7:20 6.6.y: cifs broken since 6.6.23 writing big files with vers=1.0 and 2.0 Thomas Voegtle
@ 2024-06-12 12:50 ` Greg KH
  2024-06-12 14:44   ` Thomas Voegtle
  0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2024-06-12 12:50 UTC (permalink / raw)
  To: Thomas Voegtle; +Cc: stable, David Howells, Steve French

On Tue, Jun 11, 2024 at 09:20:33AM +0200, Thomas Voegtle wrote:
>
> Hello,
>
> a machine booted with Linux 6.6.23 up to 6.6.32:
>
> writing /dev/zero with dd to a mounted cifs share with vers=1.0 or
> vers=2.0 slows down drastically in my setup after writing approx. 46GB
> of data.
>
> The whole machine gets unresponsive as if it were under very high IO
> load. It pings, but opening a new ssh session takes too much time. I
> can stop the dd (ctrl-c) and after a few minutes the machine is fine
> again.
>
> cifs with vers=3.1.1 seems to be fine with 6.6.32.
> Linux 6.10-rc3 is fine with vers=1.0 and vers=2.0.
>
> Bisected down to:
>
> cifs-fix-writeback-data-corruption.patch
> which is:
> Upstream commit f3dc1bdb6b0b0693562c7c54a6c28bafa608ba3c
> and
> linux-stable commit e45deec35bf7f1f4f992a707b2d04a8c162f2240
>
> Reverting this patch on 6.6.32 fixes the problem for me.

Odd, that commit is kind of needed :(

Is there some later commit that resolves the issue here that we should
pick up for the stable trees?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: 6.6.y: cifs broken since 6.6.23 writing big files with vers=1.0 and 2.0
  2024-06-12 12:50 ` Greg KH
@ 2024-06-12 14:44   ` Thomas Voegtle
  2024-06-12 14:53     ` Greg KH
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Voegtle @ 2024-06-12 14:44 UTC (permalink / raw)
  To: Greg KH; +Cc: Thomas Voegtle, stable, David Howells, Steve French

On Wed, 12 Jun 2024, Greg KH wrote:

> On Tue, Jun 11, 2024 at 09:20:33AM +0200, Thomas Voegtle wrote:
>>
>> Hello,
>>
>> a machine booted with Linux 6.6.23 up to 6.6.32:
>>
>> writing /dev/zero with dd to a mounted cifs share with vers=1.0 or
>> vers=2.0 slows down drastically in my setup after writing approx. 46GB
>> of data.
>>
>> The whole machine gets unresponsive as if it were under very high IO
>> load. It pings, but opening a new ssh session takes too much time. I
>> can stop the dd (ctrl-c) and after a few minutes the machine is fine
>> again.
>>
>> cifs with vers=3.1.1 seems to be fine with 6.6.32.
>> Linux 6.10-rc3 is fine with vers=1.0 and vers=2.0.
>>
>> Bisected down to:
>>
>> cifs-fix-writeback-data-corruption.patch
>> which is:
>> Upstream commit f3dc1bdb6b0b0693562c7c54a6c28bafa608ba3c
>> and
>> linux-stable commit e45deec35bf7f1f4f992a707b2d04a8c162f2240
>>
>> Reverting this patch on 6.6.32 fixes the problem for me.
>
> Odd, that commit is kind of needed :(
>
> Is there some later commit that resolves the issue here that we should
> pick up for the stable trees?
>

Hope this helps:

Linux 6.9.4 is broken in the same way and so is 6.9.0.

Thomas

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: 6.6.y: cifs broken since 6.6.23 writing big files with vers=1.0 and 2.0
  2024-06-12 14:44 ` Thomas Voegtle
@ 2024-06-12 14:53   ` Greg KH
  2024-06-12 17:38     ` [EXTERNAL] " Steven French
  0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2024-06-12 14:53 UTC (permalink / raw)
  To: Thomas Voegtle; +Cc: stable, David Howells, Steve French

On Wed, Jun 12, 2024 at 04:44:27PM +0200, Thomas Voegtle wrote:
> On Wed, 12 Jun 2024, Greg KH wrote:
>
> > On Tue, Jun 11, 2024 at 09:20:33AM +0200, Thomas Voegtle wrote:
> > >
> > > Hello,
> > >
> > > a machine booted with Linux 6.6.23 up to 6.6.32:
> > >
> > > writing /dev/zero with dd to a mounted cifs share with vers=1.0 or
> > > vers=2.0 slows down drastically in my setup after writing approx.
> > > 46GB of data.
> > >
> > > The whole machine gets unresponsive as if it were under very high
> > > IO load. It pings, but opening a new ssh session takes too much
> > > time. I can stop the dd (ctrl-c) and after a few minutes the
> > > machine is fine again.
> > >
> > > cifs with vers=3.1.1 seems to be fine with 6.6.32.
> > > Linux 6.10-rc3 is fine with vers=1.0 and vers=2.0.
> > >
> > > Bisected down to:
> > >
> > > cifs-fix-writeback-data-corruption.patch
> > > which is:
> > > Upstream commit f3dc1bdb6b0b0693562c7c54a6c28bafa608ba3c
> > > and
> > > linux-stable commit e45deec35bf7f1f4f992a707b2d04a8c162f2240
> > >
> > > Reverting this patch on 6.6.32 fixes the problem for me.
> >
> > Odd, that commit is kind of needed :(
> >
> > Is there some later commit that resolves the issue here that we
> > should pick up for the stable trees?
> >
>
> Hope this helps:
>
> Linux 6.9.4 is broken in the same way and so is 6.9.0.

How about Linus's tree?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 9+ messages in thread
* RE: [EXTERNAL] Re: 6.6.y: cifs broken since 6.6.23 writing big files with vers=1.0 and 2.0
  2024-06-12 14:53 ` Greg KH
@ 2024-06-12 17:38   ` Steven French
  2024-06-12 19:21     ` Thomas Voegtle
  0 siblings, 1 reply; 9+ messages in thread
From: Steven French @ 2024-06-12 17:38 UTC (permalink / raw)
  To: Greg KH, Thomas Voegtle
  Cc: stable@vger.kernel.org, David Howells, smfrench@gmail.com

Thanks for catching this - I found at least one case (even if we don't
want to ever encourage anyone to mount with these old dialects) where I
was able to repro a dd hang.

I tried some experiments with both 6.10-rc2 and with 6.8 and don't see a
performance degradation with this, but there are some cases with SMB1
where a performance hit might be expected (if rsize or wsize is
negotiated to a very small size; modern dialects support larger default
wsize and rsize). I just did try an experiment with vers=1.0 and 6.6.33
and did reproduce a problem, though, so am looking into that now (I see
the session disconnected part way through the copy in
/proc/fs/cifs/DebugData - do you see the same thing). I am not seeing an
issue with normal modern dialects, but I will take a look and see if we
can narrow down what is happening in this old smb1 path.

Can you check two things:
1) what is the wsize and rsize that was negotiated? ("mount | grep cifs"
   will show this)
2) what is the server type?

The repro I tried was "dd if=/dev/zero of=/mnt1/48GB bs=4MB count=12000"
and so far vers=1.0 to 6.6.33 to Samba (ksmbd does not support the
older, less secure dialects) was the only repro

^ permalink raw reply	[flat|nested] 9+ messages in thread
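The two checks asked for above can be scripted. A sketch (the sample mount line below is invented for illustration; a real run would pipe the output of `mount | grep cifs` instead, and the repro dd is shown only as a comment since it needs a live cifs mount):

```shell
# 1) Extract the negotiated rsize/wsize from the cifs mount options.
#    A canned sample line stands in for `mount | grep cifs` so the
#    pipeline can be shown end to end (values are made up).
sample='//server/share on /mnt1 type cifs (rw,vers=2.0,rsize=65536,wsize=65536)'
printf '%s\n' "$sample" | grep -oE '(r|w)size=[0-9]+'

# 2) The repro workload itself (not executed here: it needs a real cifs
#    mount and writes ~48GB):
#    dd if=/dev/zero of=/mnt1/48GB bs=4M count=12000 status=progress
```

On the sample line this prints `rsize=65536` and `wsize=65536`, one per line.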
* RE: [EXTERNAL] Re: 6.6.y: cifs broken since 6.6.23 writing big files with vers=1.0 and 2.0
  2024-06-12 17:38 ` [EXTERNAL] " Steven French
@ 2024-06-12 19:21   ` Thomas Voegtle
  2024-06-13 18:38     ` Steven French
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Voegtle @ 2024-06-12 19:21 UTC (permalink / raw)
  To: Steven French
  Cc: Greg KH, stable@vger.kernel.org, David Howells, smfrench@gmail.com

On Wed, 12 Jun 2024, Steven French wrote:

> Thanks for catching this - I found at least one case (even if we don't
> want to ever encourage anyone to mount with these old dialects) where I
> was able to repro a dd hang.
>
> I tried some experiments with both 6.10-rc2 and with 6.8 and don't see
> a performance degradation with this, but there are some cases with SMB1
> where a performance hit might be expected (if rsize or wsize is
> negotiated to a very small size; modern dialects support larger default
> wsize and rsize). I just did try an experiment with vers=1.0 and 6.6.33
> and did reproduce a problem, though, so am looking into that now (I see
> the session disconnected part way through the copy in
> /proc/fs/cifs/DebugData - do you see the same thing). I am not seeing
> an issue with normal modern

You mean this stuff:

MIDs:
Server ConnectionId: 0x6
	State: 2 com: 9 pid: 10 cbdata: 00000000c583976f mid 309943
	State: 2 com: 9 pid: 10 cbdata: 0000000085b5bf16 mid 309944
	State: 2 com: 9 pid: 10 cbdata: 000000008b353163 mid 309945
	State: 2 com: 9 pid: 10 cbdata: 00000000898b6503 mid 309946
...

Yes, can see that.

> dialects, but I will take a look and see if we can narrow down what is
> happening in this old smb1 path.
>
> Can you check two things:
> 1) what is the wsize and rsize that was negotiated? ("mount | grep
>    cifs" will show this)

rsize=65536,wsize=65536 with vers=2.0

rsize=1048576,wsize=65536 with vers=1.0

> 2) what is the server type?

That is an older Samba Server 4.9.18 with a bunch of patches (Debian?).
I can test with several Windows Server versions if you like.

> The repro I tried was "dd if=/dev/zero of=/mnt1/48GB bs=4MB count=12000"
> and so far vers=1.0 to 6.6.33 to Samba (ksmbd does not support the
> older, less secure dialects) was the only repro

For vers=2.0 it needs a few GB more to hit the problem. In my setup it
is 58GB with Linux 6.9.0. I know. It's weird.

Thomas

--
Thomas V

^ permalink raw reply	[flat|nested] 9+ messages in thread
* RE: [EXTERNAL] Re: 6.6.y: cifs broken since 6.6.23 writing big files with vers=1.0 and 2.0
  2024-06-12 19:21 ` Thomas Voegtle
@ 2024-06-13 18:38   ` Steven French
  2024-06-13 19:21     ` Thomas Voegtle
  0 siblings, 1 reply; 9+ messages in thread
From: Steven French @ 2024-06-13 18:38 UTC (permalink / raw)
  To: Thomas Voegtle
  Cc: Greg KH, stable@vger.kernel.org, David Howells, smfrench@gmail.com

I haven't been able to repro the problem today with vers=1.0 (with
6.6.33 or 6.9.2) mounted to Samba, so was wondering.

For the "vers=1.0" and "vers=2.0" cases where you saw a failure, can you
"cat /proc/fs/cifs/Stats | grep reconnect" to see if there were network
disconnects/reconnects during the copy.

And also for the "vers=2.0" failure case you reported (which I have been
unable to reproduce) could you do "cat /proc/fs/cifs/Stats | grep
Writes" so we can see if there were any failed writes in that scenario.

And can you paste the exact dd command you are running (I have been
trying the copy various ways with dd and bs=1MB or bs=4M) in case that
is why I am having trouble reproducing it.

^ permalink raw reply	[flat|nested] 9+ messages in thread
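The Stats greps requested here can be exercised against a canned excerpt (the field names mimic /proc/fs/cifs/Stats but the counter values are invented; a real run would just cat the proc file):

```shell
# Filter reconnect counters and write totals out of a Stats-style dump.
# A here-doc stands in for `cat /proc/fs/cifs/Stats`; numbers are fake.
stats_filter() { grep -E 'Writes|reconnect'; }

stats_filter <<'EOF'
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
0 session 0 share reconnects
Writes: 887694 Bytes: 58072834560
Reads: 3 Bytes: 196608
EOF
```

Only the reconnect line and the Writes line survive the filter, which is exactly what the two requested greps report separately.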
* RE: [EXTERNAL] Re: 6.6.y: cifs broken since 6.6.23 writing big files with vers=1.0 and 2.0
  2024-06-13 18:38 ` Steven French
@ 2024-06-13 19:21   ` Thomas Voegtle
  2024-06-13 20:07     ` Steven French
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Voegtle @ 2024-06-13 19:21 UTC (permalink / raw)
  To: Steven French
  Cc: Greg KH, stable@vger.kernel.org, David Howells, smfrench@gmail.com

On Thu, 13 Jun 2024, Steven French wrote:

> I haven't been able to repro the problem today with vers=1.0 (with
> 6.6.33 or 6.9.2) mounted to Samba, so was wondering.
>
> For the "vers=1.0" and "vers=2.0" cases where you saw a failure, can
> you "cat /proc/fs/cifs/Stats | grep reconnect" to see if there were
> network disconnects/reconnects during the copy.
>
> And also for the "vers=2.0" failure case you reported (which I have
> been unable to reproduce) could you do "cat /proc/fs/cifs/Stats | grep
> Writes" so we can see if there were any failed writes in that scenario.

On Linux 6.9.0 and vers=2.0 while hitting the bug and getting slower:

cat /proc/fs/cifs/Stats | grep -E 'Writes|reconnect' ; uptime
0 session 0 share reconnects
Writes: 887694 Bytes: 58072834560
 21:15:27 up 6 min,  2 users,  load average: 9.36, 2.84, 1.06

The last one:

cat /proc/fs/cifs/Stats | grep -E 'Writes|reconnect' ; uptime
0 session 0 share reconnects
Writes: 901903 Bytes: 58985102336
 21:20:22 up 11 min,  2 users,  load average: 28.16, 17.01, 7.49

> And can you paste the exact dd command you are running (I have been
> trying the copy various ways with dd and bs=1MB or bs=4M) in case that
> is why I am having trouble reproducing it.

Strange. I just do this:

dd if=/dev/zero of=bigfile status=progress

Something over 70G is good. Everything else freezes and you can hardly
interrupt the dd. Maybe with more memory or a faster target it is
different?

It is so nicely reproducible for me that I did a bisect search for the
fix, and it is:

commit 3ee1a1fc39819906f04d6c62c180e760cd3a689d (refs/bisect/fixed)
Author: David Howells <dhowells@redhat.com>
Date:   Fri Oct 6 18:29:59 2023 +0100

    cifs: Cut over to using netfslib

And that's bad? As I see it, there are many preparation commits for the
switch and a few fixes. Too many for stable?

Thomas

--
Thomas V

^ permalink raw reply	[flat|nested] 9+ messages in thread
* RE: [EXTERNAL] Re: 6.6.y: cifs broken since 6.6.23 writing big files with vers=1.0 and 2.0
  2024-06-13 19:21 ` Thomas Voegtle
@ 2024-06-13 20:07   ` Steven French
  0 siblings, 0 replies; 9+ messages in thread
From: Steven French @ 2024-06-13 20:07 UTC (permalink / raw)
  To: Thomas Voegtle
  Cc: Greg KH, stable@vger.kernel.org, David Howells, smfrench@gmail.com

> so I did a bisect search for the fix, and it is:
> cifs: Cut over to using netfslib
> And that's bad? As I see it, there are many preparation commits for
> the switch and a few fixes. Too many for stable?

Yes - that changeset is part of a larger series (for the folios/netfs
conversion) that would not make sense to backport for 6.6.

If we have some better luck it may not be hard to reproduce - in the
case where I was able to reproduce it briefly yesterday, I didn't see
any i/o in flight but traffic was hung, so I had assumed it was a
reconnect issue (a slow or stuck response, perhaps due to more i/o in
flight now than before), but your Stats/DebugData rules that idea out.

^ permalink raw reply	[flat|nested] 9+ messages in thread
end of thread, other threads: [~2024-06-13 20:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-11  7:20 6.6.y: cifs broken since 6.6.23 writing big files with vers=1.0 and 2.0 Thomas Voegtle
2024-06-12 12:50 ` Greg KH
2024-06-12 14:44   ` Thomas Voegtle
2024-06-12 14:53     ` Greg KH
2024-06-12 17:38       ` [EXTERNAL] " Steven French
2024-06-12 19:21         ` Thomas Voegtle
2024-06-13 18:38           ` Steven French
2024-06-13 19:21             ` Thomas Voegtle
2024-06-13 20:07               ` Steven French