From: Shakeel Butt <shakeelb@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Linux MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Cgroups <cgroups@vger.kernel.org>,
Mina Almasry <almasrymina@google.com>,
David Rientjes <rientjes@google.com>,
Greg Thelen <gthelen@google.com>,
Sandipan Das <sandipan@linux.ibm.com>,
Shuah Khan <shuah@kernel.org>,
Adrian Moreno <amorenoz@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
stable@vger.kernel.org
Subject: Re: [PATCH] hugetlb_cgroup: fix offline of hugetlb cgroup with reservations
Date: Thu, 3 Dec 2020 14:11:49 -0800
Message-ID: <CALvZod44ZLA8U=ormvuKZhJ1vCJf8qOHMRSouih4E-oaLihV=Q@mail.gmail.com>
In-Reply-To: <20201203220242.158165-1-mike.kravetz@oracle.com>
On Thu, Dec 3, 2020 at 2:04 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> Adrian Moreno was running a Kubernetes 1.19 + containerd/docker workload
> using hugetlbfs. In this environment the issue is reproduced by:
> 1 - Start a simple pod that uses the recently added HugePages medium
>     feature (pod yaml attached)
> 2 - Start a DPDK app. It doesn't need to run successfully (as in transfer
>     packets) nor interact with real hardware. It seems just initializing
>     the EAL layer (which handles hugepage reservation and locking) is
>     enough to trigger the issue
> 3 - Delete the Pod (or let it "Complete").
>
> This would result in a kworker thread going into a tight loop (top output):
> 1425 root 20 0 0 0 0 R 99.7 0.0 5:22.45
> kworker/28:7+cgroup_destroy
>
> 'perf top -g' reports:
> - 63.28% 0.01% [kernel] [k] worker_thread
>    - 49.97% worker_thread
>       - 52.64% process_one_work
>          - 62.08% css_killed_work_fn
>             - hugetlb_cgroup_css_offline
>                  41.52% _raw_spin_lock
>                - 2.82% _cond_resched
>                     rcu_all_qs
>                  2.66% PageHuge
>       - 0.57% schedule
>          - 0.57% __schedule
>
> We are spinning in the do-while loop in hugetlb_cgroup_css_offline.
> Worse yet, we are holding the master cgroup lock (cgroup_mutex) while
> infinitely spinning. Little else can be done on the system as the
> cgroup_mutex can not be acquired.
>
> Do note that the issue can be reproduced by simply offlining a hugetlb
> cgroup containing pages with reservation counts.
>
> The loop in hugetlb_cgroup_css_offline is moving page counts from the
> cgroup being offlined to the parent cgroup. This is done for each hstate,
> and is repeated until hugetlb_cgroup_have_usage returns false. The routine
> moving counts (hugetlb_cgroup_move_parent) is only moving 'usage' counts.
> The routine hugetlb_cgroup_have_usage is checking for both 'usage' and
> 'reservation' counts. What to do with reservation counts when
> reparenting was discussed here:
>
> https://lore.kernel.org/linux-kselftest/CAHS8izMFAYTgxym-Hzb_JmkTK1N_S9tGN71uS6MFV+R7swYu5A@mail.gmail.com/
>
> The decision was made to leave a zombie cgroup for cgroups with
> reservation counts. Unfortunately, the code checking reservation counts
> was incorrectly added to hugetlb_cgroup_have_usage.
>
> To fix the issue, simply remove the check for reservation counts. While
> fixing this issue, a related bug in hugetlb_cgroup_css_offline was noticed.
> The hstate index is not reinitialized each time through the do-while loop.
> Fix this as well.
>
> Fixes: 1adc4d419aa2 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations")
> Cc: <stable@vger.kernel.org>
> Reported-by: Adrian Moreno <amorenoz@redhat.com>
> Tested-by: Adrian Moreno <amorenoz@redhat.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
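
For readers following the thread, the failure mode described in the quoted
commit message can be modeled in user space roughly as follows. This is only
a sketch under stated assumptions, not the kernel code: the struct, the
model_* function names, NR_HSTATES and the max_passes guard are all
hypothetical stand-ins for hugetlb_cgroup_css_offline,
hugetlb_cgroup_move_parent and hugetlb_cgroup_have_usage. It shows why a loop
that only moves 'usage' counts never terminates when its exit check also
looks at 'reservation' counts.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_HSTATES 2 /* hypothetical number of huge page sizes */

/* Hypothetical user-space stand-in for a hugetlb cgroup's counters. */
struct model_cgroup {
    long usage[NR_HSTATES];       /* pages charged as in use */
    long reservation[NR_HSTATES]; /* pages charged as reserved */
};

/* Exit check as described for the fix: only 'usage' counts matter. */
static bool model_have_usage(const struct model_cgroup *cg)
{
    for (int idx = 0; idx < NR_HSTATES; idx++)
        if (cg->usage[idx] > 0)
            return true;
    return false;
}

/* Exit check as described for the bug: 'reservation' counts are checked too. */
static bool model_have_usage_or_rsvd(const struct model_cgroup *cg)
{
    for (int idx = 0; idx < NR_HSTATES; idx++)
        if (cg->usage[idx] > 0 || cg->reservation[idx] > 0)
            return true;
    return false;
}

/* Moves only 'usage' counts to the parent; reservations stay behind. */
static void model_move_parent(int idx, struct model_cgroup *child,
                              struct model_cgroup *parent)
{
    parent->usage[idx] += child->usage[idx];
    child->usage[idx] = 0;
}

/*
 * Models the do-while loop. Returns the number of passes taken, or -1 if
 * max_passes is reached, which stands in for the kworker spinning forever.
 */
static int model_offline(struct model_cgroup *child,
                         struct model_cgroup *parent,
                         bool (*check)(const struct model_cgroup *),
                         int max_passes)
{
    int passes = 0;

    assert(max_passes > 0);
    do {
        if (passes == max_passes)
            return -1;
        /* The hstate index restarts at 0 on every pass. */
        for (int idx = 0; idx < NR_HSTATES; idx++)
            model_move_parent(idx, child, parent);
        passes++;
    } while (check(child));

    return passes;
}
```

With a child holding both usage and reservation counts, calling
model_offline() with model_have_usage finishes after one pass and leaves the
reservation counts on the child (the "zombie" case from the quoted text),
while calling it with model_have_usage_or_rsvd hits the pass limit.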