Re: CVE-2024-36000: mm/hugetlb: fix missing hugetlb_lock for resv uncharge

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Xu <peterx@redhat.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>,
	Oscar Salvador <OSalvador@suse.com>,
	cve@kernel.org, linux-kernel@vger.kernel.org,
	linux-cve-announce@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: CVE-2024-36000: mm/hugetlb: fix missing hugetlb_lock for resv uncharge
Date: Thu, 23 May 2024 11:42:30 -0400	[thread overview]
Message-ID: <Zk9j5j-VSAOWrmg7@x1n> (raw)
In-Reply-To: <Zk8bgS8IboS-7jQw@localhost.localdomain>

On Thu, May 23, 2024 at 12:33:37PM +0200, Oscar Salvador wrote:
> On the theoretical part:
> 
> And we could have
> 
>      CPU0                                   CPU1
>  dequeue_huge_page_vma
>   dequeue_huge_page_node
>    move_page_to_active_list
>  release_lock
>                                            hugetlb_cgroup_pre_destroy
>                                             for_each_page_in_active_list
>                                             <-- pages empty cgroups are skipped -->
>                                              hugetlb_cgroup_move_parent
>                                              move_page_to_parent
>  hugetlb_cgroup_commit_charge <-- too late
>   page[2].lru.next = (void *)h_cg;

Would this happen with/without the patch?  IIUC the patch didn't change
this path yet on hugetlb_cgroup_commit_charge(), and AFAIU the release_lock
is always covering the commit charge, with/without my patch:

	spin_lock_irq(&hugetlb_lock);
	folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
        ...
	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
	if (deferred_reserve) {
		hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
						  h_cg, folio);
	}
	spin_unlock_irq(&hugetlb_lock);

What the previous patch changed, IMHO, is when the rare race happened first
on reservation, and I think Mike used to describe that with a rich comment,
which can be against a concurrent hugetlb_reserve_pages():

	if (unlikely(map_chg > map_commit)) {
		/*
		 * The page was added to the reservation map between
		 * vma_needs_reservation and vma_commit_reservation.
		 * This indicates a race with hugetlb_reserve_pages.
		 * Adjust for the subpool count incremented above AND
		 * in hugetlb_reserve_pages for the same page.  Also,
		 * the reservation count added in hugetlb_reserve_pages
		 * no longer applies.
		 */
		long rsv_adjust;

		rsv_adjust = hugepage_subpool_put_pages(spool, 1);
		hugetlb_acct_memory(h, -rsv_adjust);
		if (deferred_reserve) {
			spin_lock_irq(&hugetlb_lock);
			hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
					pages_per_huge_page(h), folio);
			spin_unlock_irq(&hugetlb_lock);
		}
	}

This should be after the point of hugetlb_cgroup_commit_charge(), and when
without the lock the problem is we can have concurrent accessor / updater
to the memcg.

However here after a 2nd look I don't even see at least the css offliner
would update the _hugetlb_cgroup_rsvd at all here.. so I'm not sure whether
a race could happen.  I meant, hugetlb_cgroup_move_parent() doesn't even
touch _hugetlb_cgroup_rsvd which is the object that can race.  It only
does:

	set_hugetlb_cgroup(folio, parent);

While in this case it's only about _hugetlb_cgroup.

It's pretty confusing to me here, doesn't it mean that when someone offline
a child_cg here we'll still leave the folio's _hugetlb_cgroup_rsvd pointing
to it, even if _hugetlb_cgroup starting to point to parent?...  I was
expecting hugetlb_cgroup_move_parent() also move the rsvd cg here too.

The other thing is, when hugetlb_cgroup_move_parent() does the cg movement,
does it need to css_put() ref on the child cg and css_tryget() on the
parent, just like what we did in __hugetlb_cgroup_charge_cgroup(), at least
for _hugetlb_cgroup?

I really don't know enough on these areas to tell, perhaps I missed
something.  But maybe any of you may have some idea..  In general, I think
besides LOCKDEP the lock is definitely needed to at least make sure things
like:

	__set_hugetlb_cgroup(folio, NULL, rsvd);
	page_counter_uncharge(),

To be an atomic op, otherwise two threads can see the old memcg
concurrently and maybe they'll uncharge counter twice.  But again currently
I don't know how that can be triggered if hugetlb_cgroup_move_parent() is
not even touching resv cg..

Thanks,

-- 
Peter Xu

next prev parent reply	other threads:[~2024-05-23 15:42 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-20  9:48 CVE-2024-36000: mm/hugetlb: fix missing hugetlb_lock for resv uncharge Greg Kroah-Hartman
2024-05-20 15:14 ` Michal Hocko
2024-05-21 19:38   ` Peter Xu
2024-05-23  7:30     ` Michal Hocko
2024-05-23 10:33       ` Oscar Salvador
2024-05-23 13:08         ` Michal Hocko
2024-05-23 15:42         ` Peter Xu [this message]
2024-06-14 11:48           ` Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zk9j5j-VSAOWrmg7@x1n \
    --to=peterx@redhat.com \
    --cc=OSalvador@suse.com \
    --cc=cve@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-cve-announce@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.