* iomap_writepages WARN_ON_ONCE(PF_MEMALLOC)
@ 2026-07-01 20:51 Andreas Gruenbacher
2026-07-02 10:59 ` Christoph Hellwig
2026-07-04 4:40 ` Shakeel Butt
0 siblings, 2 replies; 4+ messages in thread
From: Andreas Gruenbacher @ 2026-07-01 20:51 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Darrick J. Wong, gfs2, linux-xfs,
linux-fsdevel, LKML
Hi Christoph,
I'm running into a problem with the following check in
iomap_writepages() on gfs2 in RHEL-8:
	/*
	 * Writeback from reclaim context should never happen except in the case
	 * of a VM regression so warn about it and refuse to write the data.
	 */
	if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC | PF_KSWAPD)) ==
			PF_MEMALLOC))
		return -EIO;
In my case, reclaim is triggered by the following command:
echo anything > /sys/fs/cgroup/memory/.../memory.force_empty
And we end up on this code path:
mem_cgroup_force_empty_write -> mem_cgroup_force_empty ->
try_to_free_mem_cgroup_pages -> do_try_to_free_pages ->
shrink_zones -> shrink_node -> shrink_node_memcgs -> shrink_slab ->
do_shrink_slab -> super_cache_scan -> prune_icache_sb -> dispose_list ->
evict -> gfs2_evict_inode -> evict_linked_inode -> gfs2_log_flush ->
gfs2_ordered_write -> filemap_fdatawrite -> __filemap_fdatawrite ->
__filemap_fdatawrite_range -> do_writepages -> gfs2_writepages ->
iomap_writepages
The PF_MEMALLOC flag is set in try_to_free_mem_cgroup_pages() before
calling do_try_to_free_pages(). We end up in gfs2_evict_inode(), and
the filesystem is required to evict the inode. At that point, gfs2 has
no way of recovering from an -EIO error from iomap_writepages().
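For illustration, the flag test quoted above can be modelled in userspace as follows. This is a minimal sketch, not kernel code: the helper name writeback_rejected() is hypothetical, and the PF_* bit values are placeholders chosen for the sketch (the kernel defines the real ones in include/linux/sched.h). It only shows which flag combinations the mask-and-compare rejects.

```c
#include <stdbool.h>

/*
 * Placeholder bit values for this sketch only; the kernel's real PF_*
 * definitions live in include/linux/sched.h.
 */
#define PF_MEMALLOC 0x00000800u
#define PF_KSWAPD   0x00020000u

/*
 * Hypothetical userspace model of the test in iomap_writepages().
 * Returns true when PF_MEMALLOC is set and PF_KSWAPD is clear, which is
 * the case that triggers the warning and the -EIO return.
 */
static bool writeback_rejected(unsigned int task_flags)
{
	return (task_flags & (PF_MEMALLOC | PF_KSWAPD)) == PF_MEMALLOC;
}
```

Under this model, a task with only PF_MEMALLOC set (as on the force_empty path described above) is rejected, while kswapd (PF_MEMALLOC | PF_KSWAPD) and ordinary tasks with neither flag are allowed through.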
That WARN_ON_ONCE() was originally added to xfs in commit 070ecdca54dd
("xfs: skip writeback from reclaim context") and later modified to
allow kswapd writeback in commit d4f7a5cbd544 ("xfs: allow writeback
from kswapd").
Before commit c2dc7e5589a1 ("iomap: move the PF_MEMALLOC check to
iomap_writepages"), the comment above the WARN_ON_ONCE() indicated
that the reason for rejecting the write was to prevent stack
overflows. But this code path triggers reclaim right at the top of the
stack, so that's certainly not an issue here.
Do you have any suggestions?
The above is on RHEL-8, but a similar code path exists in cgroups v2,
triggered via:
echo reclaim_amount > /sys/fs/cgroup/.../memory.reclaim
That code path starts with:
memory_reclaim -> user_proactive_reclaim -> try_to_free_mem_cgroup_pages ->
Thanks,
Andreas
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: iomap_writepages WARN_ON_ONCE(PF_MEMALLOC)
2026-07-01 20:51 iomap_writepages WARN_ON_ONCE(PF_MEMALLOC) Andreas Gruenbacher
@ 2026-07-02 10:59 ` Christoph Hellwig
2026-07-04 0:56 ` Matthew Wilcox
2026-07-04 4:40 ` Shakeel Butt
1 sibling, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2026-07-02 10:59 UTC (permalink / raw)
To: Andreas Gruenbacher
Cc: Christoph Hellwig, Christian Brauner, Darrick J. Wong, gfs2,
linux-xfs, linux-fsdevel, LKML
On Wed, Jul 01, 2026 at 10:51:06PM +0200, Andreas Gruenbacher wrote:
> Do you have any suggestions?
>
> The above is on RHEL-8, but a similar code path exists in cgroups v2,
> triggered via:
>
> echo reclaim_amount > /sys/fs/cgroup/.../memory.reclaim
>
> That code path starts with:
>
> memory_reclaim -> user_proactive_reclaim -> try_to_free_mem_cgroup_pages ->
PF_MEMALLOC is a sign of direct reclaim. cgroup code is doing really
weird things when it is set and it is doing writeback.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: iomap_writepages WARN_ON_ONCE(PF_MEMALLOC)
2026-07-02 10:59 ` Christoph Hellwig
@ 2026-07-04 0:56 ` Matthew Wilcox
0 siblings, 0 replies; 4+ messages in thread
From: Matthew Wilcox @ 2026-07-04 0:56 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andreas Gruenbacher, Christian Brauner, Darrick J. Wong, gfs2,
linux-xfs, linux-fsdevel, LKML, Tejun Heo, Johannes Weiner,
Michal Koutný, cgroups, Michal Hocko, Roman Gushchin,
Shakeel Butt, Muchun Song, linux-mm
On Thu, Jul 02, 2026 at 03:59:24AM -0700, Christoph Hellwig wrote:
> On Wed, Jul 01, 2026 at 10:51:06PM +0200, Andreas Gruenbacher wrote:
> > Do you have any suggestions?
> >
> > The above is on RHEL-8, but a similar code path exists in cgroups v2,
> > triggered via:
> >
> > echo reclaim_amount > /sys/fs/cgroup/.../memory.reclaim
> >
> > That code path starts with:
> >
> > memory_reclaim -> user_proactive_reclaim -> try_to_free_mem_cgroup_pages ->
>
> PF_MEMALLOC is a sign of direct reclaim. cgroup code is doing really
> weird things when it is set and it is doing writeback.
If we're going to blame the cgroup people for doing weird things, let's
cc them so they stand a chance of seeing this ... original at:
https://lore.kernel.org/linux-fsdevel/CAHc6FU4tz8-HmEf2_XKT0NT8N=rv5OMcY79PxTACkXAVLOAUpg@mail.gmail.com/
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: iomap_writepages WARN_ON_ONCE(PF_MEMALLOC)
2026-07-01 20:51 iomap_writepages WARN_ON_ONCE(PF_MEMALLOC) Andreas Gruenbacher
2026-07-02 10:59 ` Christoph Hellwig
@ 2026-07-04 4:40 ` Shakeel Butt
1 sibling, 0 replies; 4+ messages in thread
From: Shakeel Butt @ 2026-07-04 4:40 UTC (permalink / raw)
To: Andreas Gruenbacher
Cc: Christoph Hellwig, Christian Brauner, Darrick J. Wong, gfs2,
linux-xfs, linux-fsdevel, LKML
On Wed, Jul 01, 2026 at 10:51:06PM +0200, Andreas Gruenbacher wrote:
> Hi Christoph,
>
> I'm running into a problem with the following check in
> iomap_writepages() on gfs2 in RHEL-8:
>
> 	/*
> 	 * Writeback from reclaim context should never happen except in the case
> 	 * of a VM regression so warn about it and refuse to write the data.
> 	 */
> 	if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC | PF_KSWAPD)) ==
> 			PF_MEMALLOC))
> 		return -EIO;
>
> In my case, reclaim is triggered by the following command:
>
> echo anything > /sys/fs/cgroup/memory/.../memory.force_empty
>
> And we end up on this code path:
>
> mem_cgroup_force_empty_write -> mem_cgroup_force_empty ->
> try_to_free_mem_cgroup_pages -> do_try_to_free_pages ->
> shrink_zones -> shrink_node -> shrink_node_memcgs -> shrink_slab ->
> do_shrink_slab -> super_cache_scan -> prune_icache_sb -> dispose_list ->
> evict -> gfs2_evict_inode -> evict_linked_inode -> gfs2_log_flush ->
> gfs2_ordered_write -> filemap_fdatawrite -> __filemap_fdatawrite ->
> __filemap_fdatawrite_range -> do_writepages -> gfs2_writepages ->
> iomap_writepages
I don't see anything specific to memcg here, as the same code path can also be
taken in global reclaim. The main question is why gfs2 is evicting a dirty
inode from reclaim. prune_icache_sb() uses inode_lru_isolate() to select the
inodes to evict, and inode_lru_isolate() has an
(inode_state_read(inode) & ~I_REFERENCED) check to filter out dirty inodes.
It seems to me that gfs2 and the VFS do not agree on whether an inode is dirty:
a simple search in gfs2 shows that it is doing (inode->i_flags & I_DIRTY)
instead of (inode_state_read(inode) & I_DIRTY). Since core VFS code uses the
inode_state_read() wrapper to check inode state, I am assuming that is the right
way to do it.
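To make the suspected mismatch concrete, here is a small userspace sketch. Everything in it is an assumption for illustration: struct model_inode and model_inode_state_read() are hypothetical stand-ins for the kernel's struct inode and inode_state_read(), and the I_DIRTY* bit values are placeholders. It only shows that testing the dirty bits against the wrong field can report a dirty inode as clean.

```c
#include <stdbool.h>

/* Placeholder bit values, chosen only for this sketch. */
#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_DATASYNC (1u << 1)
#define I_DIRTY_PAGES    (1u << 2)
#define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)

/* Hypothetical stand-in for struct inode, reduced to the two fields. */
struct model_inode {
	unsigned int i_flags;	/* attribute flags, unrelated to dirty state */
	unsigned int i_state;	/* state bits, including the dirty bits */
};

/* Hypothetical stand-in for the inode_state_read() wrapper. */
static unsigned int model_inode_state_read(const struct model_inode *inode)
{
	return inode->i_state;
}

/* Dirty test against the state field, as the core VFS code is said to do. */
static bool dirty_via_state(const struct model_inode *inode)
{
	return (model_inode_state_read(inode) & I_DIRTY) != 0;
}

/* Dirty test against i_flags, the pattern described above as suspect. */
static bool dirty_via_flags(const struct model_inode *inode)
{
	return (inode->i_flags & I_DIRTY) != 0;
}
```

For an inode with i_state = I_DIRTY_PAGES and i_flags = 0, dirty_via_state() reports dirty while dirty_via_flags() reports clean, which is the kind of disagreement that could let a dirty inode reach eviction.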
^ permalink raw reply [flat|nested] 4+ messages in thread