From: Peter Xu <peterx@redhat.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>,
Oscar Salvador <OSalvador@suse.com>,
cve@kernel.org, linux-kernel@vger.kernel.org,
linux-cve-announce@vger.kernel.org,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: CVE-2024-36000: mm/hugetlb: fix missing hugetlb_lock for resv uncharge
Date: Thu, 23 May 2024 11:42:30 -0400 [thread overview]
Message-ID: <Zk9j5j-VSAOWrmg7@x1n> (raw)
In-Reply-To: <Zk8bgS8IboS-7jQw@localhost.localdomain>
On Thu, May 23, 2024 at 12:33:37PM +0200, Oscar Salvador wrote:
> On the theoretical part:
>
> And we could have
>
> CPU0 CPU1
> dequeue_huge_page_vma
> dequeue_huge_page_node
> move_page_to_active_list
> release_lock
> hugetlb_cgroup_pre_destroy
> for_each_page_in_active_list
> <-- pages empty cgroups are skipped -->
> hugetlb_cgroup_move_parent
> move_page_to_parent
> hugetlb_cgroup_commit_charge <-- too late
> page[2].lru.next = (void *)h_cg;
Would this happen with/without the patch? IIUC the patch didn't change
this path yet on hugetlb_cgroup_commit_charge(), and AFAIU the release_lock
is always covering the commit charge, with/without my patch:
spin_lock_irq(&hugetlb_lock);
folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
...
hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
if (deferred_reserve) {
hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
h_cg, folio);
}
spin_unlock_irq(&hugetlb_lock);
What the previous patch changed, IMHO, is when the rare race happened first
on reservation, and I think Mike used to describe that with a rich comment,
which can be against a concurrent hugetlb_reserve_pages():
if (unlikely(map_chg > map_commit)) {
/*
* The page was added to the reservation map between
* vma_needs_reservation and vma_commit_reservation.
* This indicates a race with hugetlb_reserve_pages.
* Adjust for the subpool count incremented above AND
* in hugetlb_reserve_pages for the same page. Also,
* the reservation count added in hugetlb_reserve_pages
* no longer applies.
*/
long rsv_adjust;
rsv_adjust = hugepage_subpool_put_pages(spool, 1);
hugetlb_acct_memory(h, -rsv_adjust);
if (deferred_reserve) {
spin_lock_irq(&hugetlb_lock);
hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
pages_per_huge_page(h), folio);
spin_unlock_irq(&hugetlb_lock);
}
}
This should be after the point of hugetlb_cgroup_commit_charge(), and when
without the lock the problem is we can have concurrent accessor / updater
to the memcg.
However here after a 2nd look I don't even see at least the css offliner
would update the _hugetlb_cgroup_rsvd at all here.. so I'm not sure whether
a race could happen. I meant, hugetlb_cgroup_move_parent() doesn't even
touch _hugetlb_cgroup_rsvd which is the object that can race. It only
does:
set_hugetlb_cgroup(folio, parent);
While in this case it's only about _hugetlb_cgroup.
It's pretty confusing to me here, doesn't it mean that when someone offline
a child_cg here we'll still leave the folio's _hugetlb_cgroup_rsvd pointing
to it, even if _hugetlb_cgroup starting to point to parent?... I was
expecting hugetlb_cgroup_move_parent() also move the rsvd cg here too.
The other thing is, when hugetlb_cgroup_move_parent() does the cg movement,
does it need to css_put() ref on the child cg and css_tryget() on the
parent, just like what we did in __hugetlb_cgroup_charge_cgroup(), at least
for _hugetlb_cgroup?
I really don't know enough on these areas to tell, perhaps I missed
something. But maybe any of you may have some idea.. In general, I think
besides LOCKDEP the lock is definitely needed to at least make sure things
like:
__set_hugetlb_cgroup(folio, NULL, rsvd);
page_counter_uncharge(),
To be an atomic op, otherwise two threads can see the old memcg
concurrently and maybe they'll uncharge counter twice. But again currently
I don't know how that can be triggered if hugetlb_cgroup_move_parent() is
not even touching resv cg..
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2024-05-23 15:42 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-20 9:48 CVE-2024-36000: mm/hugetlb: fix missing hugetlb_lock for resv uncharge Greg Kroah-Hartman
2024-05-20 15:14 ` Michal Hocko
2024-05-21 19:38 ` Peter Xu
2024-05-23 7:30 ` Michal Hocko
2024-05-23 10:33 ` Oscar Salvador
2024-05-23 13:08 ` Michal Hocko
2024-05-23 15:42 ` Peter Xu [this message]
2024-06-14 11:48 ` Oscar Salvador
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zk9j5j-VSAOWrmg7@x1n \
--to=peterx@redhat.com \
--cc=OSalvador@suse.com \
--cc=cve@kernel.org \
--cc=gregkh@linuxfoundation.org \
--cc=linux-cve-announce@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mhocko@suse.com \
--cc=osalvador@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.