Linux CIFS filesystem development
 help / color / mirror / Atom feed
* [PATCH v3] cifs: fix pagecache leak when do writepages
@ 2025-09-11  3:01 Yang Erkun
  2025-09-11  3:22 ` yangerkun
  0 siblings, 1 reply; 15+ messages in thread
From: Yang Erkun @ 2025-09-11  3:01 UTC (permalink / raw)
  To: sfrench, gregkh, pc, lsahlber, sprasad, tom, dhowells, linux-cifs,
	samba-technical, stable, nspmangalore, ematsumiya
  Cc: yangerkun, yangerkun

After commit f3dc1bdb6b0b("cifs: Fix writeback data corruption"), the
writepages for cifs will find all folio needed writepage with two phase.
The first folio will be found in cifs_writepages_begin, and the latter
various folios will be found in cifs_extend_writeback.

All those will first get folio, and for normal case, once we set page
writeback and after do really write, we should put the reference, folio
found in cifs_extend_writeback do this with folio_batch_release. But the
folio found in cifs_writepages_begin never get the chance do it. And
every writepages call, we will leak a folio(found this problem while do
xfstests over cifs, the latter show that we will leak about 600M+ every
we run generic/074).

echo 3 > /proc/sys/vm/drop_caches ; cat /proc/meminfo | grep file
Active(file):      34092 kB
Inactive(file):   176192 kB
./check generic/074 (smb v1)
...
generic/074 50s ...  53s
Ran: generic/074
Passed all 1 tests

echo 3 > /proc/sys/vm/drop_caches ; cat /proc/meminfo | grep file
Active(file):      35036 kB
Inactive(file):   854708 kB

Besides, the exist path seem never handle this folio correctly, fix it too
with this patch.

The problem does not exist in mainline since writepages path for cifs
has changed to netfs(3ee1a1fc3981 ("cifs: Cut over to using netfslib")).
It's had to backport all related change, so try fix this problem with this
single patch.

Fixes: f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
Cc: stable@kernel.org # v6.6~v6.9
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
---
 fs/smb/client/file.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

v2->v3:

rebase on top 6.6 stable since some new patches has been backported

diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 7a2b81fbd9cf..f97478626e9d 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2884,17 +2884,21 @@ static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping,
 	rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile);
 	if (rc) {
 		cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", rc);
+		folio_unlock(folio);
 		goto err_xid;
 	}
 
 	rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize,
 					   &wsize, credits);
-	if (rc != 0)
+	if (rc != 0) {
+		folio_unlock(folio);
 		goto err_close;
+	}
 
 	wdata = cifs_writedata_alloc(cifs_writev_complete);
 	if (!wdata) {
 		rc = -ENOMEM;
+		folio_unlock(folio);
 		goto err_uncredit;
 	}
 
@@ -3041,16 +3045,21 @@ static ssize_t cifs_writepages_begin(struct address_space *mapping,
 lock_again:
 	if (wbc->sync_mode != WB_SYNC_NONE) {
 		ret = folio_lock_killable(folio);
-		if (ret < 0)
+		if (ret < 0) {
+			folio_put(folio);
 			return ret;
+		}
 	} else {
-		if (!folio_trylock(folio))
+		if (!folio_trylock(folio)) {
+			folio_put(folio);
 			goto search_again;
+		}
 	}
 
 	if (folio->mapping != mapping ||
 	    !folio_test_dirty(folio)) {
 		start += folio_size(folio);
+		folio_put(folio);
 		folio_unlock(folio);
 		goto search_again;
 	}
@@ -3081,6 +3090,7 @@ static ssize_t cifs_writepages_begin(struct address_space *mapping,
 out:
 	if (ret > 0)
 		*_start = start + ret;
+	folio_put(folio);
 	return ret;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11  3:01 [PATCH v3] cifs: fix pagecache leak when do writepages Yang Erkun
@ 2025-09-11  3:22 ` yangerkun
  2025-09-11 10:53   ` Greg KH
  2025-09-11 16:25   ` David Howells
  0 siblings, 2 replies; 15+ messages in thread
From: yangerkun @ 2025-09-11  3:22 UTC (permalink / raw)
  To: sfrench, gregkh, pc, lsahlber, sprasad, tom, dhowells, linux-cifs,
	samba-technical, stable, nspmangalore, ematsumiya
  Cc: yangerkun

Hello,

In stable version 6.6, IO operations for CIFS cause system memory leaks 
shortly after starting; our test case triggers this issue, and other 
users have reported it as well [1].

This problem does not occur in the mainline kernel after commit 
3ee1a1fc3981 ("cifs: Cut over to using netfslib") (v6.10-rc1), but 
backporting this fix to stable versions 6.6 through 6.9 is challenging. 
Therefore, I have decided to address the issue with a separate patch.

Hi Greg,

I have reviewed [2] to understand the process for submitting patches to 
stable branches. However, this patch may not fit their criteria since it 
is not a backport from mainline. Is there anything else I should do to 
make this patch appear more formal?


[1]. 
https://lore.kernel.org/all/CANT5p=r+Ce0FD7yjzJU4kq3v0UyzFNjgTz0eZ_=54fTe_H6_BQ@mail.gmail.com/
[2]. https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

在 2025/9/11 11:01, Yang Erkun 写道:
> After commit f3dc1bdb6b0b("cifs: Fix writeback data corruption"), the
> writepages for cifs will find all folio needed writepage with two phase.
> The first folio will be found in cifs_writepages_begin, and the latter
> various folios will be found in cifs_extend_writeback.
> 
> All those will first get folio, and for normal case, once we set page
> writeback and after do really write, we should put the reference, folio
> found in cifs_extend_writeback do this with folio_batch_release. But the
> folio found in cifs_writepages_begin never get the chance do it. And
> every writepages call, we will leak a folio(found this problem while do
> xfstests over cifs, the latter show that we will leak about 600M+ every
> we run generic/074).
> 
> echo 3 > /proc/sys/vm/drop_caches ; cat /proc/meminfo | grep file
> Active(file):      34092 kB
> Inactive(file):   176192 kB
> ./check generic/074 (smb v1)
> ...
> generic/074 50s ...  53s
> Ran: generic/074
> Passed all 1 tests
> 
> echo 3 > /proc/sys/vm/drop_caches ; cat /proc/meminfo | grep file
> Active(file):      35036 kB
> Inactive(file):   854708 kB
> 
> Besides, the exist path seem never handle this folio correctly, fix it too
> with this patch.
> 
> The problem does not exist in mainline since writepages path for cifs
> has changed to netfs(3ee1a1fc3981 ("cifs: Cut over to using netfslib")).
> It's had to backport all related change, so try fix this problem with this
> single patch.
> 
> Fixes: f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
> Cc: stable@kernel.org # v6.6~v6.9
> Signed-off-by: Yang Erkun <yangerkun@huawei.com>
> ---
>   fs/smb/client/file.c | 16 +++++++++++++---
>   1 file changed, 13 insertions(+), 3 deletions(-)
> 
> v2->v3:
> 
> rebase on top 6.6 stable since some new patches has been backported
> 
> diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
> index 7a2b81fbd9cf..f97478626e9d 100644
> --- a/fs/smb/client/file.c
> +++ b/fs/smb/client/file.c
> @@ -2884,17 +2884,21 @@ static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping,
>   	rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile);
>   	if (rc) {
>   		cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", rc);
> +		folio_unlock(folio);
>   		goto err_xid;
>   	}
>   
>   	rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize,
>   					   &wsize, credits);
> -	if (rc != 0)
> +	if (rc != 0) {
> +		folio_unlock(folio);
>   		goto err_close;
> +	}
>   
>   	wdata = cifs_writedata_alloc(cifs_writev_complete);
>   	if (!wdata) {
>   		rc = -ENOMEM;
> +		folio_unlock(folio);
>   		goto err_uncredit;
>   	}
>   
> @@ -3041,16 +3045,21 @@ static ssize_t cifs_writepages_begin(struct address_space *mapping,
>   lock_again:
>   	if (wbc->sync_mode != WB_SYNC_NONE) {
>   		ret = folio_lock_killable(folio);
> -		if (ret < 0)
> +		if (ret < 0) {
> +			folio_put(folio);
>   			return ret;
> +		}
>   	} else {
> -		if (!folio_trylock(folio))
> +		if (!folio_trylock(folio)) {
> +			folio_put(folio);
>   			goto search_again;
> +		}
>   	}
>   
>   	if (folio->mapping != mapping ||
>   	    !folio_test_dirty(folio)) {
>   		start += folio_size(folio);
> +		folio_put(folio);
>   		folio_unlock(folio);
>   		goto search_again;
>   	}
> @@ -3081,6 +3090,7 @@ static ssize_t cifs_writepages_begin(struct address_space *mapping,
>   out:
>   	if (ret > 0)
>   		*_start = start + ret;
> +	folio_put(folio);
>   	return ret;
>   }
>   

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11  3:22 ` yangerkun
@ 2025-09-11 10:53   ` Greg KH
  2025-09-11 11:09     ` yangerkun
  2025-09-11 16:31     ` David Howells
  2025-09-11 16:25   ` David Howells
  1 sibling, 2 replies; 15+ messages in thread
From: Greg KH @ 2025-09-11 10:53 UTC (permalink / raw)
  To: yangerkun
  Cc: sfrench, pc, lsahlber, sprasad, tom, dhowells, linux-cifs,
	samba-technical, stable, nspmangalore, ematsumiya, yangerkun

On Thu, Sep 11, 2025 at 11:22:57AM +0800, yangerkun wrote:
> Hello,
> 
> In stable version 6.6, IO operations for CIFS cause system memory leaks
> shortly after starting; our test case triggers this issue, and other users
> have reported it as well [1].
> 
> This problem does not occur in the mainline kernel after commit 3ee1a1fc3981
> ("cifs: Cut over to using netfslib") (v6.10-rc1), but backporting this fix
> to stable versions 6.6 through 6.9 is challenging. Therefore, I have decided
> to address the issue with a separate patch.
> 
> Hi Greg,
> 
> I have reviewed [2] to understand the process for submitting patches to
> stable branches. However, this patch may not fit their criteria since it is
> not a backport from mainline. Is there anything else I should do to make
> this patch appear more formal?

Yes, please include the info as to why this is not a backport from
upstream, and why it can only go into this one specific tree and get the
developers involved to agree with this.

But why not submit the upstream changes instead?  That should be much
simpler and is always preferred as that way the code can be maintained
easier over time.  Whenever we have these one-off changes, they are
almost always wrong and incur additional development efforts for future
changes in the same area.

So please, do the backports first.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11 10:53   ` Greg KH
@ 2025-09-11 11:09     ` yangerkun
  2025-09-11 11:17       ` Greg KH
  2025-09-11 16:31     ` David Howells
  1 sibling, 1 reply; 15+ messages in thread
From: yangerkun @ 2025-09-11 11:09 UTC (permalink / raw)
  To: Greg KH
  Cc: sfrench, pc, lsahlber, sprasad, tom, dhowells, linux-cifs,
	samba-technical, stable, nspmangalore, ematsumiya, yangerkun



在 2025/9/11 18:53, Greg KH 写道:
> On Thu, Sep 11, 2025 at 11:22:57AM +0800, yangerkun wrote:
>> Hello,
>>
>> In stable version 6.6, IO operations for CIFS cause system memory leaks
>> shortly after starting; our test case triggers this issue, and other users
>> have reported it as well [1].
>>
>> This problem does not occur in the mainline kernel after commit 3ee1a1fc3981
>> ("cifs: Cut over to using netfslib") (v6.10-rc1), but backporting this fix
>> to stable versions 6.6 through 6.9 is challenging. Therefore, I have decided
>> to address the issue with a separate patch.
>>
>> Hi Greg,
>>
>> I have reviewed [2] to understand the process for submitting patches to
>> stable branches. However, this patch may not fit their criteria since it is
>> not a backport from mainline. Is there anything else I should do to make
>> this patch appear more formal?
> 
> Yes, please include the info as to why this is not a backport from
> upstream, and why it can only go into this one specific tree and get the
> developers involved to agree with this.

Alright, the reason I favor this single patch is that the mainline 
solution involves a major refactor [1] to change the I/O path to 
netfslib. Backporting it would cause many conflicts, and such a large 
patch set would introduce numerous KABI changes. Therefore, this single 
patch is provided here instead...

[1]. 
https://lore.kernel.org/all/20240328165845.2782259-1-dhowells@redhat.com/

> 
> But why not submit the upstream changes instead?  That should be much
> simpler and is always preferred as that way the code can be maintained
> easier over time.  Whenever we have these one-off changes, they are
> almost always wrong and incur additional development efforts for future
> changes in the same area.
> 
> So please, do the backports first.
> 
> thanks,
> 
> greg k-h
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11 11:09     ` yangerkun
@ 2025-09-11 11:17       ` Greg KH
  2025-09-11 11:25         ` yangerkun
  2025-09-11 16:40         ` Steve French
  0 siblings, 2 replies; 15+ messages in thread
From: Greg KH @ 2025-09-11 11:17 UTC (permalink / raw)
  To: yangerkun
  Cc: sfrench, pc, lsahlber, sprasad, tom, dhowells, linux-cifs,
	samba-technical, stable, nspmangalore, ematsumiya, yangerkun

On Thu, Sep 11, 2025 at 07:09:30PM +0800, yangerkun wrote:
> 
> 
> 在 2025/9/11 18:53, Greg KH 写道:
> > On Thu, Sep 11, 2025 at 11:22:57AM +0800, yangerkun wrote:
> > > Hello,
> > > 
> > > In stable version 6.6, IO operations for CIFS cause system memory leaks
> > > shortly after starting; our test case triggers this issue, and other users
> > > have reported it as well [1].
> > > 
> > > This problem does not occur in the mainline kernel after commit 3ee1a1fc3981
> > > ("cifs: Cut over to using netfslib") (v6.10-rc1), but backporting this fix
> > > to stable versions 6.6 through 6.9 is challenging. Therefore, I have decided
> > > to address the issue with a separate patch.
> > > 
> > > Hi Greg,
> > > 
> > > I have reviewed [2] to understand the process for submitting patches to
> > > stable branches. However, this patch may not fit their criteria since it is
> > > not a backport from mainline. Is there anything else I should do to make
> > > this patch appear more formal?
> > 
> > Yes, please include the info as to why this is not a backport from
> > upstream, and why it can only go into this one specific tree and get the
> > developers involved to agree with this.
> 
> Alright, the reason I favor this single patch is that the mainline solution
> involves a major refactor [1] to change the I/O path to netfslib.
> Backporting it would cause many conflicts, and such a large patch set would
> introduce numerous KABI changes. Therefore, this single patch is provided
> here instead...

There is no stable kernel api, sorry, that is not a valid reason.  And
we've taken large patch sets in the past.

But if you can get the maintainers of the code to agree that this is the
best solution, we'll be glad to take it.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11 11:17       ` Greg KH
@ 2025-09-11 11:25         ` yangerkun
  2025-09-11 11:40           ` Shyam Prasad N
  2025-09-11 16:40         ` Steve French
  1 sibling, 1 reply; 15+ messages in thread
From: yangerkun @ 2025-09-11 11:25 UTC (permalink / raw)
  To: Greg KH
  Cc: sfrench, pc, lsahlber, sprasad, tom, dhowells, linux-cifs,
	samba-technical, stable, nspmangalore, ematsumiya, yangerkun



在 2025/9/11 19:17, Greg KH 写道:
> On Thu, Sep 11, 2025 at 07:09:30PM +0800, yangerkun wrote:
>>
>>
>> 在 2025/9/11 18:53, Greg KH 写道:
>>> On Thu, Sep 11, 2025 at 11:22:57AM +0800, yangerkun wrote:
>>>> Hello,
>>>>
>>>> In stable version 6.6, IO operations for CIFS cause system memory leaks
>>>> shortly after starting; our test case triggers this issue, and other users
>>>> have reported it as well [1].
>>>>
>>>> This problem does not occur in the mainline kernel after commit 3ee1a1fc3981
>>>> ("cifs: Cut over to using netfslib") (v6.10-rc1), but backporting this fix
>>>> to stable versions 6.6 through 6.9 is challenging. Therefore, I have decided
>>>> to address the issue with a separate patch.
>>>>
>>>> Hi Greg,
>>>>
>>>> I have reviewed [2] to understand the process for submitting patches to
>>>> stable branches. However, this patch may not fit their criteria since it is
>>>> not a backport from mainline. Is there anything else I should do to make
>>>> this patch appear more formal?
>>>
>>> Yes, please include the info as to why this is not a backport from
>>> upstream, and why it can only go into this one specific tree and get the
>>> developers involved to agree with this.
>>
>> Alright, the reason I favor this single patch is that the mainline solution
>> involves a major refactor [1] to change the I/O path to netfslib.
>> Backporting it would cause many conflicts, and such a large patch set would
>> introduce numerous KABI changes. Therefore, this single patch is provided
>> here instead...
> 
> There is no stable kernel api, sorry, that is not a valid reason.  And
> we've taken large patch sets in the past.
> 
> But if you can get the maintainers of the code to agree that this is the
> best solution, we'll be glad to take it.

OK, Steve, can you help give a feedback for this patch?

Thanks,
Yang Erkun.

> 
> thanks,
> 
> greg k-h
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11 11:25         ` yangerkun
@ 2025-09-11 11:40           ` Shyam Prasad N
  2025-09-11 15:45             ` Enzo Matsumiya
  0 siblings, 1 reply; 15+ messages in thread
From: Shyam Prasad N @ 2025-09-11 11:40 UTC (permalink / raw)
  To: yangerkun
  Cc: Greg KH, sfrench, pc, lsahlber, sprasad, tom, dhowells,
	linux-cifs, samba-technical, stable, ematsumiya, yangerkun

On Thu, Sep 11, 2025 at 4:55 PM yangerkun <yangerkun@huawei.com> wrote:
>
>
>
> 在 2025/9/11 19:17, Greg KH 写道:
> > On Thu, Sep 11, 2025 at 07:09:30PM +0800, yangerkun wrote:
> >>
> >>
> >> 在 2025/9/11 18:53, Greg KH 写道:
> >>> On Thu, Sep 11, 2025 at 11:22:57AM +0800, yangerkun wrote:
> >>>> Hello,
> >>>>
> >>>> In stable version 6.6, IO operations for CIFS cause system memory leaks
> >>>> shortly after starting; our test case triggers this issue, and other users
> >>>> have reported it as well [1].
> >>>>
> >>>> This problem does not occur in the mainline kernel after commit 3ee1a1fc3981
> >>>> ("cifs: Cut over to using netfslib") (v6.10-rc1), but backporting this fix
> >>>> to stable versions 6.6 through 6.9 is challenging. Therefore, I have decided
> >>>> to address the issue with a separate patch.
> >>>>
> >>>> Hi Greg,
> >>>>
> >>>> I have reviewed [2] to understand the process for submitting patches to
> >>>> stable branches. However, this patch may not fit their criteria since it is
> >>>> not a backport from mainline. Is there anything else I should do to make
> >>>> this patch appear more formal?
> >>>
> >>> Yes, please include the info as to why this is not a backport from
> >>> upstream, and why it can only go into this one specific tree and get the
> >>> developers involved to agree with this.
> >>
> >> Alright, the reason I favor this single patch is that the mainline solution
> >> involves a major refactor [1] to change the I/O path to netfslib.
> >> Backporting it would cause many conflicts, and such a large patch set would
> >> introduce numerous KABI changes. Therefore, this single patch is provided
> >> here instead...
> >
> > There is no stable kernel api, sorry, that is not a valid reason.  And
> > we've taken large patch sets in the past.
> >
> > But if you can get the maintainers of the code to agree that this is the
> > best solution, we'll be glad to take it.
>
> OK, Steve, can you help give a feedback for this patch?
>
> Thanks,
> Yang Erkun.
>
> >
> > thanks,
> >
> > greg k-h
> >

Hi Greg,

Steve can give you the final confirmation, but I can add some context here.

This bug was never fixed upstream since the write/read code path was
entirely refactored (with most of the folio maintenance
responsibilities offloaded to netfs).
We've recently had at least a couple of customers complaining about
this in Microsoft, following which we've been able to repro the
growing memory usage with a certain type of application workload.
We've also been able to verify that the issue does not reproduce when
cifs.ko was built with this patch against the 6.6 kernel of Azure
Linux (and that kernel is mostly equivalent to stable 6.6). If you
need a confirmation that this patch fixes the issue even on stable
6.6, we can do that check.

Additionally @Enzo Matsumiya also mentioned that SLES had to backport
this change to their v6.4 kernel to fix this folio leak.

-- 
Regards,
Shyam

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11 11:40           ` Shyam Prasad N
@ 2025-09-11 15:45             ` Enzo Matsumiya
  0 siblings, 0 replies; 15+ messages in thread
From: Enzo Matsumiya @ 2025-09-11 15:45 UTC (permalink / raw)
  To: yangerkun
  Cc: Shyam Prasad N, Greg KH, sfrench, pc, lsahlber, sprasad, tom,
	dhowells, linux-cifs, samba-technical, stable, yangerkun

On 09/11, Shyam Prasad N wrote:
>On Thu, Sep 11, 2025 at 4:55 PM yangerkun <yangerkun@huawei.com> wrote:
>>
>>
>>
>> 在 2025/9/11 19:17, Greg KH 写道:
>> > On Thu, Sep 11, 2025 at 07:09:30PM +0800, yangerkun wrote:
>> >>
>> >>
>> >> 在 2025/9/11 18:53, Greg KH 写道:
>> >>> On Thu, Sep 11, 2025 at 11:22:57AM +0800, yangerkun wrote:
>> >>>> Hello,
>> >>>>
>> >>>> In stable version 6.6, IO operations for CIFS cause system memory leaks
>> >>>> shortly after starting; our test case triggers this issue, and other users
>> >>>> have reported it as well [1].
>> >>>>
>> >>>> This problem does not occur in the mainline kernel after commit 3ee1a1fc3981
>> >>>> ("cifs: Cut over to using netfslib") (v6.10-rc1), but backporting this fix
>> >>>> to stable versions 6.6 through 6.9 is challenging. Therefore, I have decided
>> >>>> to address the issue with a separate patch.
>> >>>>
>> >>>> Hi Greg,
>> >>>>
>> >>>> I have reviewed [2] to understand the process for submitting patches to
>> >>>> stable branches. However, this patch may not fit their criteria since it is
>> >>>> not a backport from mainline. Is there anything else I should do to make
>> >>>> this patch appear more formal?
>> >>>
>> >>> Yes, please include the info as to why this is not a backport from
>> >>> upstream, and why it can only go into this one specific tree and get the
>> >>> developers involved to agree with this.
>> >>
>> >> Alright, the reason I favor this single patch is that the mainline solution
>> >> involves a major refactor [1] to change the I/O path to netfslib.
>> >> Backporting it would cause many conflicts, and such a large patch set would
>> >> introduce numerous KABI changes. Therefore, this single patch is provided
>> >> here instead...
>> >
>> > There is no stable kernel api, sorry, that is not a valid reason.  And
>> > we've taken large patch sets in the past.
>> >
>> > But if you can get the maintainers of the code to agree that this is the
>> > best solution, we'll be glad to take it.
>>
>> OK, Steve, can you help give a feedback for this patch?
>>
>> Thanks,
>> Yang Erkun.
>>
>> >
>> > thanks,
>> >
>> > greg k-h
>> >
>
>Hi Greg,
>
>Steve can give you the final confirmation, but I can add some context here.
>
>This bug was never fixed upstream since the write/read code path was
>entirely refactored (with most of the folio maintenance
>responsibilities offloaded to netfs).
>We've recently had at least a couple of customers complaining about
>this in Microsoft, following which we've been able to repro the
>growing memory usage with a certain type of application workload.
>We've also been able to verify that the issue does not reproduce when
>cifs.ko was built with this patch against the 6.6 kernel of Azure
>Linux (and that kernel is mostly equivalent to stable 6.6). If you
>need a confirmation that this patch fixes the issue even on stable
>6.6, we can do that check.
>
>Additionally @Enzo Matsumiya also mentioned that SLES had to backport
>this change to their v6.4 kernel to fix this folio leak.

That's right.

Just a note that we did it for v6.4 because we had backported the fixed
commit (f3dc1bdb6b0) to begin with.
But I haven't checked if stable-v6.4 (i.e. without cifs_writepages_begin())
is affected.


Cheers,

Enzo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11  3:22 ` yangerkun
  2025-09-11 10:53   ` Greg KH
@ 2025-09-11 16:25   ` David Howells
  2025-09-11 18:17     ` Matthew Wilcox
  1 sibling, 1 reply; 15+ messages in thread
From: David Howells @ 2025-09-11 16:25 UTC (permalink / raw)
  To: yangerkun
  Cc: dhowells, sfrench, gregkh, pc, lsahlber, sprasad, tom, willy,
	linux-cifs, samba-technical, stable, nspmangalore, ematsumiya,
	yangerkun

yangerkun <yangerkun@huawei.com> wrote:

> >     	if (folio->mapping != mapping ||
> >   	    !folio_test_dirty(folio)) {
> >   		start += folio_size(folio);
> > +		folio_put(folio);
> >   		folio_unlock(folio);
> >   		goto search_again;

I wonder if the put should be prior to the unlock.  It probably doesn't matter
as we keep control of the folio until both have happened.

David


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11 10:53   ` Greg KH
  2025-09-11 11:09     ` yangerkun
@ 2025-09-11 16:31     ` David Howells
  2025-09-11 16:38       ` Greg KH
  1 sibling, 1 reply; 15+ messages in thread
From: David Howells @ 2025-09-11 16:31 UTC (permalink / raw)
  To: Greg KH
  Cc: dhowells, yangerkun, sfrench, pc, lsahlber, sprasad, tom,
	linux-cifs, samba-technical, stable, nspmangalore, ematsumiya,
	yangerkun

Greg KH <gregkh@linuxfoundation.org> wrote:

> Yes, please include the info as to why this is not a backport from
> upstream, and why it can only go into this one specific tree and get the
> developers involved to agree with this.

Backporting the massive amount of changes to netfslib, fscache, cifs, afs, 9p,
ceph and nfs would kind of diminish the notion that this is a stable
kernel;-).

David


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11 16:31     ` David Howells
@ 2025-09-11 16:38       ` Greg KH
  0 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2025-09-11 16:38 UTC (permalink / raw)
  To: David Howells
  Cc: yangerkun, sfrench, pc, lsahlber, sprasad, tom, linux-cifs,
	samba-technical, stable, nspmangalore, ematsumiya, yangerkun

On Thu, Sep 11, 2025 at 05:31:56PM +0100, David Howells wrote:
> Greg KH <gregkh@linuxfoundation.org> wrote:
> 
> > Yes, please include the info as to why this is not a backport from
> > upstream, and why it can only go into this one specific tree and get the
> > developers involved to agree with this.
> 
> Backporting the massive amount of changes to netfslib, fscache, cifs, afs, 9p,
> ceph and nfs would kind of diminish the notion that this is a stable
> kernel;-).

Ok, then that's a good reason why, let's document that!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11 11:17       ` Greg KH
  2025-09-11 11:25         ` yangerkun
@ 2025-09-11 16:40         ` Steve French
  2025-09-11 17:10           ` Greg KH
  1 sibling, 1 reply; 15+ messages in thread
From: Steve French @ 2025-09-11 16:40 UTC (permalink / raw)
  To: Greg KH
  Cc: yangerkun, sfrench, pc, lsahlber, sprasad, tom, dhowells,
	linux-cifs, samba-technical, stable, nspmangalore, ematsumiya,
	yangerkun

David and I discussed this today and this patch is MUCH safer than
backporting the later (6.10) netfs changes which would be much larger
and riskier to include (and presumably could affect code outside
cifs.ko as well where this patch is narrowly targeted).

I am fine with this patch.from Yang for 6.6 stable


On Thu, Sep 11, 2025 at 6:45 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Sep 11, 2025 at 07:09:30PM +0800, yangerkun wrote:
> >
> >
> > 在 2025/9/11 18:53, Greg KH 写道:
> > > On Thu, Sep 11, 2025 at 11:22:57AM +0800, yangerkun wrote:
> > > > Hello,
> > > >
> > > > In stable version 6.6, IO operations for CIFS cause system memory leaks
> > > > shortly after starting; our test case triggers this issue, and other users
> > > > have reported it as well [1].
> > > >
> > > > This problem does not occur in the mainline kernel after commit 3ee1a1fc3981
> > > > ("cifs: Cut over to using netfslib") (v6.10-rc1), but backporting this fix
> > > > to stable versions 6.6 through 6.9 is challenging. Therefore, I have decided
> > > > to address the issue with a separate patch.
> > > >
> > > > Hi Greg,
> > > >
> > > > I have reviewed [2] to understand the process for submitting patches to
> > > > stable branches. However, this patch may not fit their criteria since it is
> > > > not a backport from mainline. Is there anything else I should do to make
> > > > this patch appear more formal?
> > >
> > > Yes, please include the info as to why this is not a backport from
> > > upstream, and why it can only go into this one specific tree and get the
> > > developers involved to agree with this.
> >
> > Alright, the reason I favor this single patch is that the mainline solution
> > involves a major refactor [1] to change the I/O path to netfslib.
> > Backporting it would cause many conflicts, and such a large patch set would
> > introduce numerous KABI changes. Therefore, this single patch is provided
> > here instead...
>
> There is no stable kernel api, sorry, that is not a valid reason.  And
> we've taken large patch sets in the past.
>
> But if you can get the maintainers of the code to agree that this is the
> best solution, we'll be glad to take it.
>
> thanks,
>
> greg k-h
>


-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11 16:40         ` Steve French
@ 2025-09-11 17:10           ` Greg KH
  0 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2025-09-11 17:10 UTC (permalink / raw)
  To: Steve French
  Cc: yangerkun, sfrench, pc, lsahlber, sprasad, tom, dhowells,
	linux-cifs, samba-technical, stable, nspmangalore, ematsumiya,
	yangerkun

On Thu, Sep 11, 2025 at 11:40:08AM -0500, Steve French wrote:
> David and I discussed this today and this patch is MUCH safer than
> backporting the later (6.10) netfs changes which would be much larger
> and riskier to include (and presumably could affect code outside
> cifs.ko as well where this patch is narrowly targeted).
> 
> I am fine with this patch.from Yang for 6.6 stable

Great, thanks for reviewing this.  Yang, can you resubmit this with the
additional information so we can apply it?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11 16:25   ` David Howells
@ 2025-09-11 18:17     ` Matthew Wilcox
  2025-09-12  1:12       ` yangerkun
  0 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2025-09-11 18:17 UTC (permalink / raw)
  To: David Howells
  Cc: yangerkun, sfrench, gregkh, pc, lsahlber, sprasad, tom,
	linux-cifs, samba-technical, stable, nspmangalore, ematsumiya,
	yangerkun

On Thu, Sep 11, 2025 at 05:25:06PM +0100, David Howells wrote:
> yangerkun <yangerkun@huawei.com> wrote:
> 
> > >     	if (folio->mapping != mapping ||
> > >   	    !folio_test_dirty(folio)) {
> > >   		start += folio_size(folio);
> > > +		folio_put(folio);
> > >   		folio_unlock(folio);
> > >   		goto search_again;
> 
> I wonder if the put should be prior to the unlock.  It probably doesn't matter
> as we keep control of the folio until both have happened.

Well, folio->mapping != mapping is the condition for 'this folio has
been truncated', so this folio_put() may well be the last one.  I'd
put it after the folio_unlock() for safety.
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] cifs: fix pagecache leak when do writepages
  2025-09-11 18:17     ` Matthew Wilcox
@ 2025-09-12  1:12       ` yangerkun
  0 siblings, 0 replies; 15+ messages in thread
From: yangerkun @ 2025-09-12  1:12 UTC (permalink / raw)
  To: Matthew Wilcox, David Howells
  Cc: sfrench, gregkh, pc, sprasad, tom, linux-cifs, samba-technical,
	stable, nspmangalore, ematsumiya, yangerkun



在 2025/9/12 2:17, Matthew Wilcox 写道:
> On Thu, Sep 11, 2025 at 05:25:06PM +0100, David Howells wrote:
>> yangerkun <yangerkun@huawei.com> wrote:
>>
>>>>      	if (folio->mapping != mapping ||
>>>>    	    !folio_test_dirty(folio)) {
>>>>    		start += folio_size(folio);
>>>> +		folio_put(folio);
>>>>    		folio_unlock(folio);
>>>>    		goto search_again;
>>
>> I wonder if the put should be prior to the unlock.  It probably doesn't matter
>> as we keep control of the folio until both have happened.
> 
> Well, folio->mapping != mapping is the condition for 'this folio has
> been truncated', so this folio_put() may well be the last one.  I'd
> put it after the folio_unlock() for safety.
>>
> 

Thanks for pointing this out. Yes, I have check usage from other file
systems; using folio_put after folio_unlock is a better approach.

Thanks,
Erkun.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-09-12  1:12 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-11  3:01 [PATCH v3] cifs: fix pagecache leak when do writepages Yang Erkun
2025-09-11  3:22 ` yangerkun
2025-09-11 10:53   ` Greg KH
2025-09-11 11:09     ` yangerkun
2025-09-11 11:17       ` Greg KH
2025-09-11 11:25         ` yangerkun
2025-09-11 11:40           ` Shyam Prasad N
2025-09-11 15:45             ` Enzo Matsumiya
2025-09-11 16:40         ` Steve French
2025-09-11 17:10           ` Greg KH
2025-09-11 16:31     ` David Howells
2025-09-11 16:38       ` Greg KH
2025-09-11 16:25   ` David Howells
2025-09-11 18:17     ` Matthew Wilcox
2025-09-12  1:12       ` yangerkun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox