linux-kernel.vger.kernel.org archive mirror
* Excessive page cache occupies DMA32 memory
@ 2025-07-21 15:03 Muhammad Usama Anjum
  2025-07-21 17:13 ` Matthew Wilcox
  0 siblings, 1 reply; 8+ messages in thread
From: Muhammad Usama Anjum @ 2025-07-21 15:03 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-kernel, gregkh, usama.anjum, Andrew Morton, kernel,
	linux-mm, linux-fsdevel

Hello,

When 10-12 GB out of the total 16 GB of RAM is being used as page cache
(active_file + inactive_file) at suspend time, drivers fail to allocate
DMA memory at resume, as DMA memory is either occupied by the page cache
or fragmented. Example:

kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 1 UID: 0 PID: 7693 Comm: kworker/u33:5 Not tainted 6.11.11-valve17-1-neptune-611-g027868a0ac03 #1 3843143b92e9da0fa2d3d5f21f51beaed15c7d59
Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
Call Trace:
 <TASK>
 dump_stack_lvl+0x4e/0x70
 warn_alloc+0x164/0x190
 ? srso_return_thunk+0x5/0x5f
 ? __alloc_pages_direct_compact+0xaf/0x360
 __alloc_pages_slowpath.constprop.0+0xc75/0xd70
 __alloc_pages_noprof+0x321/0x350
 __dma_direct_alloc_pages.isra.0+0x14a/0x290
 dma_direct_alloc+0x70/0x270
 mhi_fw_load_handler+0x126/0x340 [mhi a96cb91daba500cc77f86bad60c1f332dc3babdf]
 mhi_pm_st_worker+0x5e8/0xac0 [mhi a96cb91daba500cc77f86bad60c1f332dc3babdf]
 ? srso_return_thunk+0x5/0x5f
 process_one_work+0x17e/0x330
 worker_thread+0x2ce/0x3f0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xd2/0x100
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x34/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
Mem-Info:
active_anon:513809 inactive_anon:152 isolated_anon:0
active_file:359315 inactive_file:2487001 isolated_file:0
unevictable:637 dirty:19 writeback:0
slab_reclaimable:160391 slab_unreclaimable:39729
mapped:175836 shmem:51039 pagetables:4415
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:125666 free_pcp:0 free_cma:0
Node 0 active_anon:2055236kB inactive_anon:608kB active_file:1437260kB inactive_file:9948004kB unevictable:2548kB isolated(anon):0kB isolated(file):0kB mapped:703344kB dirty:76kB writeback:0kB shmem:204156kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:495616kB writeback_tmp:0kB kernel_stack:9440kB pagetables:17660kB sec_pagetables:0kB all_unreclaimable? no
Node 0 DMA free:68kB boost:0kB min:68kB low:84kB high:100kB reserved_highatomic:0KB active_anon:8kB inactive_anon:0kB active_file:0kB inactive_file:13232kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 1808 14772 0 0
Node 0 DMA32 free:9796kB boost:0kB min:8264kB low:10328kB high:12392kB reserved_highatomic:0KB active_anon:14148kB inactive_anon:88kB active_file:128kB inactive_file:1757192kB unevictable:0kB writepending:0kB present:1935736kB managed:1867440kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0 12964 0 0
Node 0 DMA: 5*4kB (U) 0*8kB 1*16kB (U) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 68kB
Node 0 DMA32: 103*4kB (UME) 52*8kB (UME) 43*16kB (UME) 58*32kB (UME) 35*64kB (UME) 23*128kB (UME) 5*256kB (ME) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 9836kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
2897795 total pagecache pages
0 pages in swap cache
Free swap  = 8630724kB
Total swap = 8630776kB
3892604 pages RAM
0 pages HighMem/MovableOnly
101363 pages reserved
0 pages cma reserved
0 pages hwpoisoned

As you can see above, the ~11 GB of page cache has consumed DMA32 pages,
leaving only ~9.8 MB free in ZONE_DMA32, and that is heavily fragmented,
with no contiguous blocks of 512 KB (order 7, i.e. 2^7 x 4 KB pages) or
larger. It's hard to reproduce with a test. We have received several
reports for the v6.11 kernel. As we don't have a reliable reproducer yet,
we cannot test whether other kernels are also affected.

Current mitigations are:
1 Pre-allocate buffers in drivers and never free them, even if they are
  only used during initialization at boot and resume (a sketch of this
  pattern follows this list). But this wastes memory and is unacceptable
  even if it is just 2-4 MB.
2 Drop caches at suspend. But this adds latency during suspend and
  slowness on resume. There is no way to drop only a couple of GB of page
  cache, which would not take as long at suspend time.
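
For illustration, here is a minimal sketch of what mitigation 1 looks like
in a driver. The foo_* names are hypothetical; this is not the actual
ath11k/mhi code, just the general pattern:

#include <linux/dma-mapping.h>

struct foo_dev {
        struct device *dev;
        void *fw_buf;           /* kept alive across suspend/resume */
        dma_addr_t fw_dma;
        size_t fw_size;
};

static int foo_probe_alloc(struct foo_dev *fd, size_t fw_size)
{
        /* One ~512 KB coherent allocation done at probe time, while
         * memory is still plentiful; never freed until remove(). */
        fd->fw_buf = dma_alloc_coherent(fd->dev, fw_size, &fd->fw_dma,
                                        GFP_KERNEL);
        if (!fd->fw_buf)
                return -ENOMEM;
        fd->fw_size = fw_size;
        return 0;
}

static int foo_resume(struct foo_dev *fd)
{
        /* Reuse the buffer allocated at probe: nothing is allocated in
         * the resume path, so it cannot fail with -ENOMEM here. */
        /* ... program fd->fw_dma into the device and reload firmware ... */
        return 0;
}

static void foo_remove(struct foo_dev *fd)
{
        dma_free_coherent(fd->dev, fd->fw_size, fd->fw_buf, fd->fw_dma);
}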

Greg dislikes option 1 and rejects it, which is understandable [1]:
> It should be reclaiming this, as it's just cache, not really used
> memory.

Would it be reasonable to add a mechanism to limit page cache growth?
I think there should be some watermark or similar by which we can tell
the page cache not to grow beyond it. Or, at suspend, drop only a part
of the page cache rather than all of it. What other options are
available?

[1] https://lore.kernel.org/all/2025071722-panther-legwarmer-d2be@gregkh 

Thanks,
Muhammad Usama Anjum


* Re: Excessive page cache occupies DMA32 memory
  2025-07-21 15:03 Excessive page cache occupies DMA32 memory Muhammad Usama Anjum
@ 2025-07-21 17:13 ` Matthew Wilcox
  2025-07-22  5:32   ` Greg KH
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2025-07-21 17:13 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: linux-kernel, gregkh, Andrew Morton, kernel, linux-mm,
	linux-fsdevel

On Mon, Jul 21, 2025 at 08:03:12PM +0500, Muhammad Usama Anjum wrote:
> Hello,
> 
> When 10-12GB our of total 16GB RAM is being used as page cache
> (active_file + inactive_file) at suspend time, the drivers fail to allocate
> dma memory at resume as dma memory is either occupied by the page cache or
> fragmented. Example:
> 
> kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0

Just to be clear, this is not a page cache problem.  The driver is asking
us to do a 512kB allocation without doing I/O!  This is a ridiculous
request that should be expected to fail.

The solution, whatever it may be, is not related to the page cache.
I reject your diagnosis.  Almost all of the page cache is clean and
could be dropped (as far as I can tell from the output below).

Now, I'm not too familiar with how the page allocator chooses to fail
this request.  Maybe it should be trying harder to drop bits of the page
cache.  Maybe it should be doing some compaction.  I am not inclined to
go digging on your behalf, because frankly I'm offended by the suggestion
that the page cache is at fault.

Perhaps somebody else will help you, or you can dig into this yourself.

> CPU: 1 UID: 0 PID: 7693 Comm: kworker/u33:5 Not tainted 6.11.11-valve17-1-neptune-611-g027868a0ac03 #1 3843143b92e9da0fa2d3d5f21f51beaed15c7d59
> Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
> Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x4e/0x70
>  warn_alloc+0x164/0x190
>  ? srso_return_thunk+0x5/0x5f
>  ? __alloc_pages_direct_compact+0xaf/0x360
>  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
>  __alloc_pages_noprof+0x321/0x350
>  __dma_direct_alloc_pages.isra.0+0x14a/0x290
>  dma_direct_alloc+0x70/0x270
>  mhi_fw_load_handler+0x126/0x340 [mhi a96cb91daba500cc77f86bad60c1f332dc3babdf]
>  mhi_pm_st_worker+0x5e8/0xac0 [mhi a96cb91daba500cc77f86bad60c1f332dc3babdf]
>  ? srso_return_thunk+0x5/0x5f
>  process_one_work+0x17e/0x330
>  worker_thread+0x2ce/0x3f0
>  ? __pfx_worker_thread+0x10/0x10
>  kthread+0xd2/0x100
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork+0x34/0x50
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork_asm+0x1a/0x30
>  </TASK>
> Mem-Info:
> active_anon:513809 inactive_anon:152 isolated_anon:0
> active_file:359315 inactive_file:2487001 isolated_file:0
> unevictable:637 dirty:19 writeback:0
> slab_reclaimable:160391 slab_unreclaimable:39729
> mapped:175836 shmem:51039 pagetables:4415
> sec_pagetables:0 bounce:0
> kernel_misc_reclaimable:0
> free:125666 free_pcp:0 free_cma:0
> Node 0 active_anon:2055236kB inactive_anon:608kB active_file:1437260kB inactive_file:9948004kB unevictable:2548kB isolated(anon):0kB isolated(file):0kB mapped:703344kB dirty:76kB writeback:0kB shmem:204156kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:495616kB writeback_tmp:0kB kernel_stack:9440kB pagetables:17660kB sec_pagetables:0kB all_unreclaimable? no
> Node 0 DMA free:68kB boost:0kB min:68kB low:84kB high:100kB reserved_highatomic:0KB active_anon:8kB inactive_anon:0kB active_file:0kB inactive_file:13232kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> lowmem_reserve[]: 0 1808 14772 0 0
> Node 0 DMA32 free:9796kB boost:0kB min:8264kB low:10328kB high:12392kB reserved_highatomic:0KB active_anon:14148kB inactive_anon:88kB active_file:128kB inactive_file:1757192kB unevictable:0kB writepending:0kB present:1935736kB managed:1867440kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> lowmem_reserve[]: 0 0 12964 0 0
> Node 0 DMA: 5*4kB (U) 0*8kB 1*16kB (U) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 68kB
> Node 0 DMA32: 103*4kB (UME) 52*8kB (UME) 43*16kB (UME) 58*32kB (UME) 35*64kB (UME) 23*128kB (UME) 5*256kB (ME) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 9836kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> 2897795 total pagecache pages
> 0 pages in swap cache
> Free swap  = 8630724kB
> Total swap = 8630776kB
> 3892604 pages RAM
> 0 pages HighMem/MovableOnly
> 101363 pages reserved
> 0 pages cma reserved
> 0 pages hwpoisoned
> 
> As you can see above, the ~11 GB of page cache has consumed DMA32 pages,
> leaving only 9.8MB free but heavily fragmented with no contiguous blocks
> ≥512KB. Its hard to reproduce by a test. We have received several reports
> for v6.11 kernel. As we don't have reliable reproducer yet, we cannot test
> if other kernels are also affected.
> 
> Current mitigations are:
> 1 Pre-allocate buffer in drivers and don't free them even if they are only
>   used during during initialization at boot and resume. But it wastes memory
>   and unacceptable even if its just 2-4MB.
> 2 Drop caches at suspend. But it causes latency during suspension and
>   slowness on resume. There is no way to drop only couple of GB of page
>   cache as that wouldn't take long at suspend time.
> 
> Greg dislikes 1 and rejects it which is understandable. [1]:
> > It should be reclaiming this, as it's just cache, not really used
> > memory.
> 
> Would it be reasonable to add a mechanism to limit page cache growth?
> I think, there should be some watermark or similar by which we can
> indicate to page cache to don't go above it. Or at suspend, drop only
> a part of of the page cache and not the entire page cache. What other
> options are available? 
> 
> [1] https://lore.kernel.org/all/2025071722-panther-legwarmer-d2be@gregkh 
> 
> Thanks,
> Muhammad Usama Anjum


* Re: Excessive page cache occupies DMA32 memory
  2025-07-21 17:13 ` Matthew Wilcox
@ 2025-07-22  5:32   ` Greg KH
  2025-07-22  6:05     ` Muhammad Usama Anjum
  0 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2025-07-22  5:32 UTC (permalink / raw)
  To: Matthew Wilcox, Muhammad Usama Anjum, linux-kernel, Andrew Morton,
	kernel, linux-mm, linux-fsdevel

On Mon, Jul 21, 2025 at 06:13:10PM +0100, Matthew Wilcox wrote:
> On Mon, Jul 21, 2025 at 08:03:12PM +0500, Muhammad Usama Anjum wrote:
> > Hello,
> > 
> > When 10-12GB our of total 16GB RAM is being used as page cache
> > (active_file + inactive_file) at suspend time, the drivers fail to allocate
> > dma memory at resume as dma memory is either occupied by the page cache or
> > fragmented. Example:
> > 
> > kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
> 
> Just to be clear, this is not a page cache problem.  The driver is asking
> us to do a 512kB allocation without doing I/O!  This is a ridiculous
> request that should be expected to fail.
> 
> The solution, whatever it may be, is not related to the page cache.
> I reject your diagnosis.  Almost all of the page cache is clean and
> could be dropped (as far as I can tell from the output below).
> 
> Now, I'm not too familiar with how the page allocator chooses to fail
> this request.  Maybe it should be trying harder to drop bits of the page
> cache.  Maybe it should be doing some compaction.  I am not inclined to
> go digging on your behalf, because frankly I'm offended by the suggestion
> that the page cache is at fault.
> 
> Perhaps somebody else will help you, or you can dig into this yourself.

I'm with Matthew, this really looks like a driver bug somehow.  If there
is page cache memory that is "clean", the driver should be able to
access it just fine if really required.

What exact driver(s) is having this problem?  What is the exact error,
and on what lines of code?

thanks,

greg k-h


* Re: Excessive page cache occupies DMA32 memory
  2025-07-22  5:32   ` Greg KH
@ 2025-07-22  6:05     ` Muhammad Usama Anjum
  2025-07-22  7:24       ` Greg KH
  0 siblings, 1 reply; 8+ messages in thread
From: Muhammad Usama Anjum @ 2025-07-22  6:05 UTC (permalink / raw)
  To: Greg KH, Matthew Wilcox, Baochen Qiang, Jeff Hugo,
	Manivannan Sadhasivam, Jeff Johnson, Marek Szyprowski
  Cc: linux-fsdevel, linux-mm, kernel, Andrew Morton, linux-kernel,
	iommu, Robin Murphy

Adding ath/mhi and dma API developers to the discussion.

On 7/22/25 10:32 AM, Greg KH wrote:
> On Mon, Jul 21, 2025 at 06:13:10PM +0100, Matthew Wilcox wrote:
>> On Mon, Jul 21, 2025 at 08:03:12PM +0500, Muhammad Usama Anjum wrote:
>>> Hello,
>>>
>>> When 10-12GB our of total 16GB RAM is being used as page cache
>>> (active_file + inactive_file) at suspend time, the drivers fail to allocate
>>> dma memory at resume as dma memory is either occupied by the page cache or
>>> fragmented. Example:
>>>
>>> kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
>>
>> Just to be clear, this is not a page cache problem.  The driver is asking
>> us to do a 512kB allocation without doing I/O!  This is a ridiculous
>> request that should be expected to fail.
>>
>> The solution, whatever it may be, is not related to the page cache.
>> I reject your diagnosis.  Almost all of the page cache is clean and
>> could be dropped (as far as I can tell from the output below).
>>
>> Now, I'm not too familiar with how the page allocator chooses to fail
>> this request.  Maybe it should be trying harder to drop bits of the page
>> cache.  Maybe it should be doing some compaction. 
That's very thoughtful. I'll look into why the page allocator isn't
dropping cache or doing compaction.

>> I am not inclined to
>> go digging on your behalf, because frankly I'm offended by the suggestion
>> that the page cache is at fault.
I apologize—that wasn't my intention.

>>
>> Perhaps somebody else will help you, or you can dig into this yourself.
> 
> I'm with Matthew, this really looks like a driver bug somehow.  If there
> is page cache memory that is "clean", the driver should be able to
> access it just fine if really required.
> 
> What exact driver(s) is having this problem?  What is the exact error,
> and on what lines of code?
The issue occurs in both the ath11k and mhi drivers during resume, when
dma_alloc_coherent(GFP_KERNEL) fails and returns -ENOMEM. This failure has
been observed at multiple points in these drivers.

For example, in the mhi driver, the failure is triggered when MHI's
st_worker gets scheduled in at resume.

mhi_pm_st_worker()
-> mhi_fw_load_handler()
   -> mhi_load_image_bhi()
      -> mhi_alloc_bhi_buffer()
         -> dma_alloc_coherent(GFP_KERNEL) returns -ENOMEM
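
In other words, the failing step boils down to roughly the following
(simplified illustration, not the literal mhi code; "dev" stands in for
the controller's struct device):

        /* One ~512 KB coherent buffer for the firmware image, requested
         * with GFP_KERNEL from the resume worker. */
        void *buf;
        dma_addr_t dma_addr;

        buf = dma_alloc_coherent(dev, SZ_512K, &dma_addr, GFP_KERNEL);
        if (!buf)
                return -ENOMEM;         /* the -ENOMEM reported above */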


Thank you,
- Usama



* Re: Excessive page cache occupies DMA32 memory
  2025-07-22  6:05     ` Muhammad Usama Anjum
@ 2025-07-22  7:24       ` Greg KH
  2025-07-22 10:03         ` Robin Murphy
  0 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2025-07-22  7:24 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: Matthew Wilcox, Baochen Qiang, Jeff Hugo, Manivannan Sadhasivam,
	Jeff Johnson, Marek Szyprowski, linux-fsdevel, linux-mm, kernel,
	Andrew Morton, linux-kernel, iommu, Robin Murphy

On Tue, Jul 22, 2025 at 11:05:11AM +0500, Muhammad Usama Anjum wrote:
> Adding ath/mhi and dma API developers to the discussion.
> 
> On 7/22/25 10:32 AM, Greg KH wrote:
> > On Mon, Jul 21, 2025 at 06:13:10PM +0100, Matthew Wilcox wrote:
> >> On Mon, Jul 21, 2025 at 08:03:12PM +0500, Muhammad Usama Anjum wrote:
> >>> Hello,
> >>>
> >>> When 10-12GB our of total 16GB RAM is being used as page cache
> >>> (active_file + inactive_file) at suspend time, the drivers fail to allocate
> >>> dma memory at resume as dma memory is either occupied by the page cache or
> >>> fragmented. Example:
> >>>
> >>> kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
> >>
> >> Just to be clear, this is not a page cache problem.  The driver is asking
> >> us to do a 512kB allocation without doing I/O!  This is a ridiculous
> >> request that should be expected to fail.
> >>
> >> The solution, whatever it may be, is not related to the page cache.
> >> I reject your diagnosis.  Almost all of the page cache is clean and
> >> could be dropped (as far as I can tell from the output below).
> >>
> >> Now, I'm not too familiar with how the page allocator chooses to fail
> >> this request.  Maybe it should be trying harder to drop bits of the page
> >> cache.  Maybe it should be doing some compaction. 
> That's very thoughtful. I'll look at the page allocator why isn't it dropping
> cache or doing compaction.
> 
> >> I am not inclined to
> >> go digging on your behalf, because frankly I'm offended by the suggestion
> >> that the page cache is at fault.
> I apologize—that wasn't my intention.
> 
> >>
> >> Perhaps somebody else will help you, or you can dig into this yourself.
> > 
> > I'm with Matthew, this really looks like a driver bug somehow.  If there
> > is page cache memory that is "clean", the driver should be able to
> > access it just fine if really required.
> > 
> > What exact driver(s) is having this problem?  What is the exact error,
> > and on what lines of code?
> The issue occurs on both ath11k and mhi drivers during resume, when
> dma_alloc_coherent(GFP_KERNEL) fails and returns -ENOMEM. This failure has
> been observed at multiple points in these drivers.
> 
> For example, in the mhi driver, the failure is triggered when the
> MHI's st_worker gets scheduled-in at resume.
> 
> mhi_pm_st_worker()
> -> mhi_fw_load_handler()
>    -> mhi_load_image_bhi()
>       -> mhi_alloc_bhi_buffer()
>          -> dma_alloc_coherent(GFP_KERNEL) returns -ENOMEM

And what is the exact size you are asking for here?
What is the dma ops set to for your system?  Are you sure that is
working properly for your platform?  What platform is this exactly?

The driver isn't asking for DMA32 here, so that shouldn't be the issue,
so why do you feel it is?  Have you tried using the tracing stuff for
dma allocations to see exactly what is going on for this failure?

I think you need to do a bit more debugging :)

thanks,

greg k-h


* Re: Excessive page cache occupies DMA32 memory
  2025-07-22  7:24       ` Greg KH
@ 2025-07-22 10:03         ` Robin Murphy
  2025-07-23  6:50           ` Baochen Qiang
  2025-08-21 13:39           ` Muhammad Usama Anjum
  0 siblings, 2 replies; 8+ messages in thread
From: Robin Murphy @ 2025-07-22 10:03 UTC (permalink / raw)
  To: Greg KH, Muhammad Usama Anjum
  Cc: Matthew Wilcox, Baochen Qiang, Jeff Hugo, Manivannan Sadhasivam,
	Jeff Johnson, Marek Szyprowski, linux-fsdevel, linux-mm, kernel,
	Andrew Morton, linux-kernel, iommu

On 2025-07-22 8:24 am, Greg KH wrote:
> On Tue, Jul 22, 2025 at 11:05:11AM +0500, Muhammad Usama Anjum wrote:
>> Adding ath/mhi and dma API developers to the discussion.
>>
>> On 7/22/25 10:32 AM, Greg KH wrote:
>>> On Mon, Jul 21, 2025 at 06:13:10PM +0100, Matthew Wilcox wrote:
>>>> On Mon, Jul 21, 2025 at 08:03:12PM +0500, Muhammad Usama Anjum wrote:
>>>>> Hello,
>>>>>
>>>>> When 10-12GB our of total 16GB RAM is being used as page cache
>>>>> (active_file + inactive_file) at suspend time, the drivers fail to allocate
>>>>> dma memory at resume as dma memory is either occupied by the page cache or
>>>>> fragmented. Example:
>>>>>
>>>>> kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
>>>>
>>>> Just to be clear, this is not a page cache problem.  The driver is asking
>>>> us to do a 512kB allocation without doing I/O!  This is a ridiculous
>>>> request that should be expected to fail.
>>>>
>>>> The solution, whatever it may be, is not related to the page cache.
>>>> I reject your diagnosis.  Almost all of the page cache is clean and
>>>> could be dropped (as far as I can tell from the output below).
>>>>
>>>> Now, I'm not too familiar with how the page allocator chooses to fail
>>>> this request.  Maybe it should be trying harder to drop bits of the page
>>>> cache.  Maybe it should be doing some compaction.
>> That's very thoughtful. I'll look at the page allocator why isn't it dropping
>> cache or doing compaction.
>>
>>>> I am not inclined to
>>>> go digging on your behalf, because frankly I'm offended by the suggestion
>>>> that the page cache is at fault.
>> I apologize—that wasn't my intention.
>>
>>>>
>>>> Perhaps somebody else will help you, or you can dig into this yourself.
>>>
>>> I'm with Matthew, this really looks like a driver bug somehow.  If there
>>> is page cache memory that is "clean", the driver should be able to
>>> access it just fine if really required.
>>>
>>> What exact driver(s) is having this problem?  What is the exact error,
>>> and on what lines of code?
>> The issue occurs on both ath11k and mhi drivers during resume, when
>> dma_alloc_coherent(GFP_KERNEL) fails and returns -ENOMEM. This failure has
>> been observed at multiple points in these drivers.
>>
>> For example, in the mhi driver, the failure is triggered when the
>> MHI's st_worker gets scheduled-in at resume.
>>
>> mhi_pm_st_worker()
>> -> mhi_fw_load_handler()
>>     -> mhi_load_image_bhi()
>>        -> mhi_alloc_bhi_buffer()
>>           -> dma_alloc_coherent(GFP_KERNEL) returns -ENOMEM
> 
> And what is the exact size you are asking for here?
> What is the dma ops set to for your system?  Are you sure that is
> working properly for your platform?  What platform is this exactly?
> 
> The driver isn't asking for DMA32 here, so that shouldn't be the issue,
> so why do you feel it is?  Have you tried using the tracing stuff for
> dma allocations to see exactly what is going on for this failure?

I'm guessing the device has a 32-bit DMA mask, and the allocation ends
up in __dma_direct_alloc_pages(), which adds GFP_DMA32 in order to try
to satisfy the mask via regular page allocation. How GFP_KERNEL turns
into GFP_NOIO, though, given that the DMA layer certainly isn't
(knowingly) messing with __GFP_IO or __GFP_FS, is more of a mystery... I
suppose "during resume" is the red flag there - is this worker perhaps
trying to run too early in some restricted context, before the rest of
the system has fully woken up?
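
For reference, the zone-selection helper in dma-direct looks roughly like
this (abridged sketch of kernel/dma/direct.c; it may differ slightly from
the exact source):

static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 *phys_limit)
{
        u64 dma_limit = min_not_zero(dev->coherent_dma_mask,
                                     dev->bus_dma_limit);

        /* A device that can only address 32 bits of physical memory gets
         * steered into ZONE_DMA32 by adding GFP_DMA32 to the request. */
        *phys_limit = dma_to_phys(dev, dma_limit);
        if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
                return GFP_DMA;
        if (*phys_limit <= DMA_BIT_MASK(32))
                return GFP_DMA32;
        return 0;
}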

Thanks,
Robin.

> 
> I think you need to do a bit more debugging :)
> 
> thanks,
> 
> greg k-h



* Re: Excessive page cache occupies DMA32 memory
  2025-07-22 10:03         ` Robin Murphy
@ 2025-07-23  6:50           ` Baochen Qiang
  2025-08-21 13:39           ` Muhammad Usama Anjum
  1 sibling, 0 replies; 8+ messages in thread
From: Baochen Qiang @ 2025-07-23  6:50 UTC (permalink / raw)
  To: Robin Murphy, Greg KH, Muhammad Usama Anjum
  Cc: Matthew Wilcox, Jeff Hugo, Manivannan Sadhasivam, Jeff Johnson,
	Marek Szyprowski, linux-fsdevel, linux-mm, kernel, Andrew Morton,
	linux-kernel, iommu



On 7/22/2025 6:03 PM, Robin Murphy wrote:
> On 2025-07-22 8:24 am, Greg KH wrote:
>> On Tue, Jul 22, 2025 at 11:05:11AM +0500, Muhammad Usama Anjum wrote:
>>> Adding ath/mhi and dma API developers to the discussion.
>>>
>>> On 7/22/25 10:32 AM, Greg KH wrote:
>>>> On Mon, Jul 21, 2025 at 06:13:10PM +0100, Matthew Wilcox wrote:
>>>>> On Mon, Jul 21, 2025 at 08:03:12PM +0500, Muhammad Usama Anjum wrote:
>>>>>> Hello,
>>>>>>
>>>>>> When 10-12GB our of total 16GB RAM is being used as page cache
>>>>>> (active_file + inactive_file) at suspend time, the drivers fail to allocate
>>>>>> dma memory at resume as dma memory is either occupied by the page cache or
>>>>>> fragmented. Example:
>>>>>>
>>>>>> kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32),
>>>>>> nodemask=(null),cpuset=/,mems_allowed=0
>>>>>
>>>>> Just to be clear, this is not a page cache problem.  The driver is asking
>>>>> us to do a 512kB allocation without doing I/O!  This is a ridiculous
>>>>> request that should be expected to fail.
>>>>>
>>>>> The solution, whatever it may be, is not related to the page cache.
>>>>> I reject your diagnosis.  Almost all of the page cache is clean and
>>>>> could be dropped (as far as I can tell from the output below).
>>>>>
>>>>> Now, I'm not too familiar with how the page allocator chooses to fail
>>>>> this request.  Maybe it should be trying harder to drop bits of the page
>>>>> cache.  Maybe it should be doing some compaction.
>>> That's very thoughtful. I'll look at the page allocator why isn't it dropping
>>> cache or doing compaction.
>>>
>>>>> I am not inclined to
>>>>> go digging on your behalf, because frankly I'm offended by the suggestion
>>>>> that the page cache is at fault.
>>> I apologize—that wasn't my intention.
>>>
>>>>>
>>>>> Perhaps somebody else will help you, or you can dig into this yourself.
>>>>
>>>> I'm with Matthew, this really looks like a driver bug somehow.  If there
>>>> is page cache memory that is "clean", the driver should be able to
>>>> access it just fine if really required.
>>>>
>>>> What exact driver(s) is having this problem?  What is the exact error,
>>>> and on what lines of code?
>>> The issue occurs on both ath11k and mhi drivers during resume, when
>>> dma_alloc_coherent(GFP_KERNEL) fails and returns -ENOMEM. This failure has
>>> been observed at multiple points in these drivers.
>>>
>>> For example, in the mhi driver, the failure is triggered when the
>>> MHI's st_worker gets scheduled-in at resume.
>>>
>>> mhi_pm_st_worker()
>>> -> mhi_fw_load_handler()
>>>     -> mhi_load_image_bhi()
>>>        -> mhi_alloc_bhi_buffer()
>>>           -> dma_alloc_coherent(GFP_KERNEL) returns -ENOMEM
>>
>> And what is the exact size you are asking for here?
>> What is the dma ops set to for your system?  Are you sure that is
>> working properly for your platform?  What platform is this exactly?
>>
>> The driver isn't asking for DMA32 here, so that shouldn't be the issue,
>> so why do you feel it is?  Have you tried using the tracing stuff for
>> dma allocations to see exactly what is going on for this failure?
> 
> I'm guessing the device has a 32-bit DMA mask, and the allocation ends up in

Yeah, the device is capable of 32-bit coherent DMA only.

> __dma_direct_alloc_pages() such that that adds GFP_DMA32 in order to try to satisfy the
> mask via regular page allocation. How GFP_KERNEL turns into GFP_NOIO, though, given that
> the DMA layer certainly isn't (knowingly) messing with __GFP_IO or __GFP_FS, is more of a
> mystery... I suppose "during resume" is the red flag there - is this worker perhaps trying
> to run too early in some restricted context before the rest of the system has fully woken up?

The worker is running at the __resume_early stage.

> 
> Thanks,
> Robin.
> 
>>
>> I think you need to do a bit more debugging :)
>>
>> thanks,
>>
>> greg k-h
> 



* Re: Excessive page cache occupies DMA32 memory
  2025-07-22 10:03         ` Robin Murphy
  2025-07-23  6:50           ` Baochen Qiang
@ 2025-08-21 13:39           ` Muhammad Usama Anjum
  1 sibling, 0 replies; 8+ messages in thread
From: Muhammad Usama Anjum @ 2025-08-21 13:39 UTC (permalink / raw)
  To: Robin Murphy, Greg KH
  Cc: usama.anjum, Matthew Wilcox, Baochen Qiang, Jeff Hugo,
	Manivannan Sadhasivam, Jeff Johnson, Marek Szyprowski,
	linux-fsdevel, linux-mm, kernel, Andrew Morton, linux-kernel,
	iommu, David Hildenbrand

Sorry, it took some time to investigate.

On 7/22/25 3:03 PM, Robin Murphy wrote:
> On 2025-07-22 8:24 am, Greg KH wrote:
>> On Tue, Jul 22, 2025 at 11:05:11AM +0500, Muhammad Usama Anjum wrote:
>>> Adding ath/mhi and dma API developers to the discussion.
>>>
>>> On 7/22/25 10:32 AM, Greg KH wrote:
>>>> On Mon, Jul 21, 2025 at 06:13:10PM +0100, Matthew Wilcox wrote:
>>>>> On Mon, Jul 21, 2025 at 08:03:12PM +0500, Muhammad Usama Anjum wrote:
>>>>>> Hello,
>>>>>>
>>>>>> When 10-12GB our of total 16GB RAM is being used as page cache
>>>>>> (active_file + inactive_file) at suspend time, the drivers fail to allocate
>>>>>> dma memory at resume as dma memory is either occupied by the page cache or
>>>>>> fragmented. Example:
>>>>>>
>>>>>> kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
>>>>>
>>>>> Just to be clear, this is not a page cache problem.  The driver is asking
>>>>> us to do a 512kB allocation without doing I/O!  This is a ridiculous
>>>>> request that should be expected to fail.
>>>>>
>>>>> The solution, whatever it may be, is not related to the page cache.
>>>>> I reject your diagnosis.  Almost all of the page cache is clean and
>>>>> could be dropped (as far as I can tell from the output below).
>>>>>
>>>>> Now, I'm not too familiar with how the page allocator chooses to fail
>>>>> this request.  Maybe it should be trying harder to drop bits of the page
>>>>> cache.  Maybe it should be doing some compaction.
>>> That's very thoughtful. I'll look at the page allocator why isn't it dropping
>>> cache or doing compaction.
>>>
>>>>> I am not inclined to
>>>>> go digging on your behalf, because frankly I'm offended by the suggestion
>>>>> that the page cache is at fault.
>>> I apologize—that wasn't my intention.
>>>
>>>>>
>>>>> Perhaps somebody else will help you, or you can dig into this yourself.
>>>>
>>>> I'm with Matthew, this really looks like a driver bug somehow.  If there
>>>> is page cache memory that is "clean", the driver should be able to
>>>> access it just fine if really required.
>>>>
>>>> What exact driver(s) is having this problem?  What is the exact error,
>>>> and on what lines of code?
>>> The issue occurs on both ath11k and mhi drivers during resume, when
>>> dma_alloc_coherent(GFP_KERNEL) fails and returns -ENOMEM. This failure has
>>> been observed at multiple points in these drivers.
>>>
>>> For example, in the mhi driver, the failure is triggered when the
>>> MHI's st_worker gets scheduled-in at resume.
>>>
>>> mhi_pm_st_worker()
>>> -> mhi_fw_load_handler()
>>>     -> mhi_load_image_bhi()
>>>        -> mhi_alloc_bhi_buffer()
>>>           -> dma_alloc_coherent(GFP_KERNEL) returns -ENOMEM
>>
>> And what is the exact size you are asking for here?
512 KB

>> What is the dma ops set to for your system?  Are you sure that is
>> working properly for your platform?  What platform is this exactly?
It's an x86_64 device.

>>
>> The driver isn't asking for DMA32 here, so that shouldn't be the issue,
>> so why do you feel it is?  Have you tried using the tracing stuff for
>> dma allocations to see exactly what is going on for this failure?
> 
> I'm guessing the device has a 32-bit DMA mask, and the allocation ends up in __dma_direct_alloc_pages() such that that adds GFP_DMA32 in order to try to satisfy the mask via regular page allocation. How GFP_KERNEL turns into GFP_NOIO, though, given that the DMA layer certainly isn't (knowingly) messing with __GFP_IO or __GFP_FS, is more of a mystery... I suppose "during resume" is the red flag there - is this worker perhaps trying to run too early in some restricted context before the rest of the system has fully woken up?

So GFP_KERNEL gets reduced to just __GFP_RECLAIM, as __GFP_IO and
__GFP_FS are masked out by the PM subsystem at suspend time and are only
re-enabled after the system has fully woken up.

GFP flags:
0xcc0    GFP_KERNEL = __GFP_RECLAIM | __GFP_IO | __GFP_FS
0xcc4    __GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_DMA32
0xc04    __GFP_RECLAIM | __GFP_DMA32
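
This matches the suspend-time restriction of gfp_allowed_mask in the PM
core and page allocator; a simplified sketch of the mechanism (not
verbatim kernel source):

static gfp_t saved_gfp_mask;

/* Called around suspend: globally mask out __GFP_IO and __GFP_FS. */
void pm_restrict_gfp_mask(void)
{
        saved_gfp_mask = gfp_allowed_mask;
        gfp_allowed_mask &= ~(__GFP_IO | __GFP_FS);
}

/* Called once resume has completed: restore the full mask. */
void pm_restore_gfp_mask(void)
{
        gfp_allowed_mask = saved_gfp_mask;
}

/* In the page allocator entry point, every request is filtered:
 *   gfp &= gfp_allowed_mask;
 * which is how 0xcc4 (GFP_KERNEL | __GFP_DMA32) becomes 0xc04 above. */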

Partial debugging log:

[ 1914.214543] mhi_fw_load_handler:
[ 1914.220346] [Debug] dma_alloc_coherent cc0
[ 1914.220352] [Debug] dma_alloc_attrs cc0
[ 1914.220359] [Debug] __dma_direct_alloc_pages cc0
[ 1914.220360] [Debug] __dma_direct_alloc_pages cc4
[ 1914.220365] [Debug] __alloc_pages_noprof cc4
[ 1914.220367] [Debug] __alloc_pages_noprof allowed c04
[ 1914.220371] [Debug] prepare_alloc_pages allowed alloc_gfp = c04 alloc_flags = 1
[ 1914.220374] [Debug] prepare_alloc_pages allowed alloc_gfp = c04 alloc_flags = 1
[ 1914.220379] [Debug] __alloc_pages_slowpath [restart] gfp_mask c04
[ 1914.220381] [Debug] __alloc_pages_slowpath alloc_flags 840
[ 1914.220384] [Debug] __alloc_pages_slowpath: skipping direct compaction
[ 1914.220386] [Debug] __alloc_pages_slowpath [retry]
[ 1914.220387] [Debug] __alloc_pages_slowpath wake_all_kswapds
[ 1914.220836] [Debug] __alloc_pages_slowpath: [nopage] no page found
[ 1914.220839] [Debug] __alloc_pages_slowpath: GFP_NOFAIL not set

Just as an experiment, even if I keep __GFP_IO and __GFP_FS enabled,
kswapd's waitqueue shows that it is already active.

Another hack I've tested is adding __GFP_NOFAIL to GFP_KERNEL; the
allocation succeeded this time, but the kernel seemed to try very hard
before finally finding memory somewhere.
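
Roughly, that experiment was just the following change (illustration
only, not a proposed fix):

        /* Experimental hack only: force the allocator to keep retrying
         * instead of failing the coherent allocation. */
        buf = dma_alloc_coherent(dev, size, &dma_addr,
                                 GFP_KERNEL | __GFP_NOFAIL);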

It's hard to identify the actual issue.

Although it's hard to reproduce (I have a rather convoluted reproducer),
I've tested v6.15.11 and I'm not able to reproduce the same issue there.
So something has changed that no longer triggers this issue. I plan to
do a bisection now.

Please feel free to share if you think there is a better way to debug
or bisect this.

-- 
---
Thanks,
Usama


end of thread

Thread overview: 8+ messages
2025-07-21 15:03 Excessive page cache occupies DMA32 memory Muhammad Usama Anjum
2025-07-21 17:13 ` Matthew Wilcox
2025-07-22  5:32   ` Greg KH
2025-07-22  6:05     ` Muhammad Usama Anjum
2025-07-22  7:24       ` Greg KH
2025-07-22 10:03         ` Robin Murphy
2025-07-23  6:50           ` Baochen Qiang
2025-08-21 13:39           ` Muhammad Usama Anjum
