From: Michal Hocko <mhocko@suse.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yosry Ahmed <yosryahmed@google.com>,
	Rik van Riel <riel@surriel.com>,
	Balbir Singh <balbirs@nvidia.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@meta.com,
	Nhat Pham <nphamcs@gmail.com>
Subject: Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap
Date: Tue, 14 Jan 2025 17:46:37 +0100
Message-ID: <Z4aU7dn_TKeeTmP_@tiehlicka>
In-Reply-To: <20250114160955.GA1115056@cmpxchg.org>

On Tue 14-01-25 11:09:55, Johannes Weiner wrote:
> Hi,
> 
> On Mon, Dec 16, 2024 at 04:39:12PM +0100, Michal Hocko wrote:
> > On Thu 12-12-24 13:30:12, Johannes Weiner wrote:
[...]
> > > If we return -ENOMEM to an OOM victim in a fault, the fault handler
> > > will re-trigger OOM, which will find the existing OOM victim and do
> > > nothing, then restart the fault.
> > 
> > IIRC the task will handle the pending SIGKILL if the #PF fails. If the
> > charge happens from the exit path then we rely on ENOMEM returned from
> > gup as a signal to back off. Do we have any caller that keeps retrying
> > on ENOMEM?
> 
> We managed to extract a stack trace of the livelocked task:
> 
> obj_cgroup_may_swap
> zswap_store
> swap_writepage
> shrink_folio_list
> shrink_lruvec
> shrink_node
> do_try_to_free_pages
> try_to_free_mem_cgroup_pages

OK, so this is the reclaim path and it fails for the reasons you mention
below. It will retry several times until it hits mem_cgroup_oom, which
will bail in mem_cgroup_out_of_memory because of task_is_dying (returns
true). The charge + reclaim is then retried (as the oom killer hasn't
done anything), this time with passed_oom = true, and eventually gets to
the nomem path and returns -ENOMEM. This should propagate -ENOMEM down
the path

> charge_memcg
> mem_cgroup_swapin_charge_folio
> __read_swap_cache_async
> swapin_readahead
> do_swap_page
> handle_mm_fault
> do_user_addr_fault
> exc_page_fault
> asm_exc_page_fault
> __get_user

All the way here and return the failure to futex_cleanup which doesn't
retry __get_user on the failure AFAICS (exit_robust_list). But I might
be missing something, it's been quite some time since I've looked into
futex code.
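
To make the charge-path reasoning above concrete, here is a minimal
userspace sketch of that control flow. It is a toy model, not kernel code:
every model_* name, the retry budget, and the assumption that reclaim never
frees anything are illustrative choices made for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed retry budget for the model; the kernel's own constant may differ. */
#define MODEL_RECLAIM_RETRIES 16

struct model_task {
	bool dying;		/* stands in for task_is_dying() */
};

struct model_stats {
	int reclaim_attempts;
	int oom_invocations;
};

/* Reclaim never frees anything: both limits are assumed to be hit. */
static bool model_reclaim(struct model_stats *st)
{
	st->reclaim_attempts++;
	return false;
}

/*
 * Stand-in for the OOM step. For a dying task it does nothing but still
 * reports true. For any other task this toy model assumes that no victim
 * can be found and reports false.
 */
static bool model_oom(const struct model_task *t, struct model_stats *st)
{
	st->oom_invocations++;
	return t->dying;
}

/* Hypothetical model of the charge loop; returns 0 or -ENOMEM. */
int model_try_charge(const struct model_task *t, struct model_stats *st)
{
	int nr_retries = MODEL_RECLAIM_RETRIES;
	bool passed_oom = false;

	assert(t && st);
retry:
	if (model_reclaim(st))
		return 0;
	if (nr_retries--)
		goto retry;

	/* A dying task that already went through the OOM step gives up. */
	if (passed_oom && t->dying)
		goto nomem;

	if (model_oom(t, st)) {
		passed_oom = true;
		nr_retries = MODEL_RECLAIM_RETRIES;
		goto retry;
	}
nomem:
	return -ENOMEM;
}
```

In this model a dying task goes through two rounds of reclaim with a single
OOM invocation in between and then gets -ENOMEM, so the loop itself is
bounded. Whether a caller further up retries on that error is a separate
question that the sketch does not cover.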

> futex_cleanup
> futex_exit_release
> do_exit
> do_group_exit
> get_signal
> arch_do_signal_or_restart
> exit_to_user_mode_prepare
> syscall_exit_to_user_mode
> do_syscall
> entry_SYSCALL_64
> syscall
> 
> Both memory.max and memory.zswap.max are hit. I don't see how this
> could ever make forward progress - the futex fault will retry until it
> succeeds.

I must be missing something but I do not see the retry, could you point
me where this is happening please?

-- 
Michal Hocko
SUSE Labs
