From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28E01C52D7F for ; Thu, 15 Aug 2024 21:53:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AF8B18D000D; Thu, 15 Aug 2024 17:53:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AA7A18D000B; Thu, 15 Aug 2024 17:53:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 920F28D000D; Thu, 15 Aug 2024 17:53:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 75E718D000B for ; Thu, 15 Aug 2024 17:53:28 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id D709FA17B1 for ; Thu, 15 Aug 2024 21:53:27 +0000 (UTC) X-FDA: 82455831654.07.B429708 Received: from mail-oi1-f171.google.com (mail-oi1-f171.google.com [209.85.167.171]) by imf27.hostedemail.com (Postfix) with ESMTP id 0584940013 for ; Thu, 15 Aug 2024 21:53:25 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="jc7rFm/z"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf27.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.167.171 as permitted sender) smtp.mailfrom=21cnbao@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1723758724; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=8zJpy/IyCPqQzjs+ERcVL5gW6xIYZ4aJwCr3Umsn95I=; b=2ZsPebjAJkE46dK57BWa4uwrXzyyy9vrMbdne5SNe49Y4vXlT6rign5INT7KAsc8SXzd1H Us1Vy/YizJXgzpyTEsm9LkPEFMKt4mClv6s5boaHjkW4yaEJhOZ45McjIhXKTflOl7xHlK JG+GzexZ7VGaiwMltWOqh2i9tEo+CaE= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1723758724; a=rsa-sha256; cv=none; b=xYveo4FCtZMdcXkcOhT0ZozvoXa7CeFVl7dAmoT7y9Th+R6B+bKWYoPpCrZM1XByTmJ/OX TDh5C2aOJ/zY3v0oYyfVGNIxXklRsMB6JBbhi2dFF6lkV979MswDtR17j95slko0E2hv8e 54mldZCAAx6/6aFJFquf5KWAgouc02s= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="jc7rFm/z"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf27.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.167.171 as permitted sender) smtp.mailfrom=21cnbao@gmail.com Received: by mail-oi1-f171.google.com with SMTP id 5614622812f47-3db51133978so873162b6e.3 for ; Thu, 15 Aug 2024 14:53:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1723758805; x=1724363605; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=8zJpy/IyCPqQzjs+ERcVL5gW6xIYZ4aJwCr3Umsn95I=; b=jc7rFm/zzwILwDRmz1pd8vfr8jXbcGjTHC9m5Rwt519mXB77a2uEsVxUCS4TeXLldA aylehLHFz6MXXFoM9oG4g8fF7n7rWSrCRD+VKzBenp66bSTpsEcj/f536wzyDcSL9AXK Gpe7M8c74eYiAqi5mxekstocEkp38mBmDAqsG0yHntY47g4vSbg9ShH+t6lGr3b+C6sa 8BLHo7nPiELnDOpunWPyWNwyX4UGo0nSiJ6bmcBNbJNwFknt6rlkVpxw2A7InvuB6rvR qxoxetX/D1H4QMnp8LzuwCrBjYih7B5Fvr1pP0LlYNPw+CXF+NaQkYQt3YowthCCPLPz kEPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723758805; x=1724363605; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8zJpy/IyCPqQzjs+ERcVL5gW6xIYZ4aJwCr3Umsn95I=; b=Q6hwk52ZQxeNLURKR+bt6uwgIpxPvvdl8w+w5TMM66Uk/bmjJNIFeo7oeu09muTW/U BV/UrJE5mp4H74YV9h9XqW/dkLlBAnS5TGY1DrDXiUpqRjeNjAB8CmV4aUFS1e7Cq7rZ 31DSC6owi+ZRCRSwy+jABIvzZKG96d8sttIQg3jY2KOO0A8T4SQku1ijgrKTogZZMpT6 4Ncq/XIMu+MIqFIalsU5f1jiFtAtrdqGf1a2qw+n41gUDT6qLqhx1wzXfSwghHb5hApl 6kwIhIAGdovkzMieBJivslpQJXOvrSAphbSOeUden8ab9t4dXfR0gbz7HJR8fbzr5FH4 8/Ww== X-Forwarded-Encrypted: i=1; AJvYcCXxeJF27JOV1Z2Ya2KAI5UdJ1zE9JeD9bkfLABHtDVTGGR8vrkRtbwSt0NSctnTmKyow9MaO9QzgNkwpKiMEKiUzsg= X-Gm-Message-State: AOJu0Yyl8w0JL+KXIkgndlPdLEynoCdwAo8/yFBc2PjUb2uumEPpxn+s sN1LhUjzSA58DnsTRx2UM6YsE+t3kDv5JFCj5Yz3s86gCy0Gn63L X-Google-Smtp-Source: AGHT+IGOW9B3upBTZIdKzWpP0/qRNUf6283GowEY8DUKzjuipp8t5d3FT3uRQsiJ3fEXUOoxYFPvlw== X-Received: by 2002:a05:6808:120d:b0:3d6:2b42:82ff with SMTP id 5614622812f47-3dd3ae2a516mr809151b6e.37.1723758804767; Thu, 15 Aug 2024 14:53:24 -0700 (PDT) Received: from localhost.localdomain ([2407:7000:8942:5500:aaa1:59ff:fe57:eb97]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-7c6b63694a1sm1593939a12.92.2024.08.15.14.53.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Aug 2024 14:53:24 -0700 (PDT) From: Barry Song <21cnbao@gmail.com> To: chrisl@kernel.org Cc: akpm@linux-foundation.org, david@redhat.com, hughd@google.com, justinjiang@vivo.com, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, ryan.roberts@arm.com, v-songbaohua@oppo.com, ying.huang@intel.com, yosryahmed@google.com Subject: Re: [PATCH v3 2/2] mm: attempt to batch free swap entries for zap_pte_range() Date: Fri, 16 Aug 2024 09:53:08 +1200 Message-Id: <20240815215308.55233-1-21cnbao@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 0584940013 X-Stat-Signature: s3c3ps9y63rcrt4iecoobwtsieof8sbw X-Rspam-User: X-HE-Tag: 1723758805-437597 X-HE-Meta: U2FsdGVkX1/bjnMrc8x41RjpfSYjvuedPwxI1gpAk3Cl78NNmXwsVKf2wkC2jWBZqSFvP/SmaM5qrciNglVxduFM3hmOfW+4bzSijYrqXoij/gyRaRfFLG/YAnTMrDs00vSf8A9CoPHIw9t6OuxoofJB8MjJCTigUjXmXHI474lfOR4YM1wAM49/3jnjicd0OkgAAVDhbx6y5KovFd+OyB5D8rxtHt4gwXfIS6PcZkYV3XeCRrjpbIX3meRQ98c1aN9sSwR5zWoJpnm5PItkOt4P2lmWzLgEqa0M4mTfG3BCxis/gykz0d3eWVwROzilt1i2aabvGDOzsSsm0vW8AsjZ3LfD1EzUG+OaMSnSlg22WVUbTFGRj9GO29lmdDxaAUkctC9joRdc8E1fpii8bZr+FQ/o1UDEFjGsdlhkjZT754Op2vnZD8b5aRPN/ZBqOW/+t9rlwQnjd3HwgR7wWVdCqitmIjxPsvjZVW0T2qIF/JU5J/lhJGcm4cBwL5TPDmN/H+9mwWpS3EWv3TTyRRLZ5F6hJ68k+ldEJ5/0dcdc+SJ9p33fqkCpUu25fbFQ8vz1TvAdP5Nw/BSx3iAi8JNrNq0dn1hz1Qn01l1JdipttCTHDbpHhdbUk5+DJe+MR07fiXoRnI71UNN1cvfZ1hbecvwW7q4MivM3x5l9cvOwg9YCZ1apmJ0Gsp62abucsUR/sMJRXJ8rBQquZk6Md4tg66GiOX9jczR463OXxDtnyTwBnpGR6OiUumRjwTFKG8/o7Lpy8/pXuW3A2/kgMhGb34EXayH+W2++KrLojrrpD4yfQJJFeAsH+MmeRUBEJ+9uY0qiPVH1CVgO+s3odCg1Iz2aQMKRzQUX+8GaO28Wer9IsQQE3vKnHkdQGEd5GGIPf9s/s794CvMd+SnCpTYAuwWa3Dg81jlp6EaJ7XnTiw4b/U8HEfjZCAecqfQ09QQzx8fHV0zV16Gu4k1 OA6JxnB9 PBSmW+E20wOwpZafTby8dzGRKqL8JTLtC2zx2i60UJ6rW0L3KNpFtRds6FFUO0QW2VGSExHZKEbskKB/wAarlzTuVjkbSwNS2nU+QVfDB1Yj0qOTUbFhKI++TPlIkatZZsLTYKPBv60ziK2fqIhj+LdJn5w7udZ67uQT7n0MwBC/zaGT1X3o9rory7BOyadw8gHKW2aYo+m/kNQeSrB7atULDCHM5WwAu6YKpc0xVGnIQJtAFJdc/itov4gIaY3xsMA7pcfVR6xLKP63/bn1aPi2XpMbGy9vn80Coa56h7jWOkOweuYttwZe6+P3C+j5kfwwcVs2XKash6/712oATiV7bcNm2mKvAXZdTpih7xR21OFwTsO1pkptQw/CXSEGvM+r31WA/L0CCaJz0V3vb7SDaN+6M7rFzpkRQeA6FN/p8+aE+NyJUdjkYLRrHKUQey1NLUrkvfgZCylV4hdo6+x0WgKEHvGh/AYimY1j75PpVpZJ/TmGksXuO5Pvudqd/fQISRhSmnaKzQVvS6sFmfnYG71PG2GzWrwWfZR6IHI++LROAnzfBjRlDpmKQ5oHzq50tueOCHc/uPclguykxk1R6i0zTe4l4laGf X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Aug 16, 2024 at 6:29 AM Chris Li wrote: > > Hi Barry, > > We got a crash report from syzbot that has been bisect into this change. > > +static bool __swap_entries_free(struct swap_info_struct *si, > > +               swp_entry_t entry, int nr) > > +{ > > +       unsigned long offset = swp_offset(entry); > > +       unsigned int type = swp_type(entry); > > +       struct swap_cluster_info *ci; > > +       bool has_cache = false; > > +       unsigned char count; > > +       int i; > > + > > +       if (nr <= 1 || swap_count(data_race(si->swap_map[offset])) != 1) > > +               goto fallback; > > +       /* cross into another cluster */ > > +       if (nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER) > > +               goto fallback; > > + > > +       ci = lock_cluster_or_swap_info(si, offset); > > +       if (!swap_is_last_map(si, offset, nr, &has_cache)) { > > +               unlock_cluster_or_swap_info(si, ci); > > +               goto fallback; > > +       } > > +       for (i = 0; i < nr; i++) > > +               WRITE_ONCE(si->swap_map[offset + i], SWAP_HAS_CACHE); > > +       unlock_cluster_or_swap_info(si, ci); > > + > > +       if (!has_cache) { > > +               spin_lock(&si->lock); > > +               swap_entry_range_free(si, entry, nr); > > Here it calls swap_entry_range_free() to free a range of the swap > entry. However the swap_entry_range_free() has the assumption that all > entries belong to the same folio and charge to the same memcg. > It eventually pass down to swap_cgroup_record(), which BUG on this line: > > VM_BUG_ON(sc->id != old); > > The root cause is that the swap entries are not from the same memcg. > Thankos Yosry for finding the root cause. > > > +               spin_unlock(&si->lock); > > +       } > > +       return has_cache; > > + > > +fallback: > > +       for (i = 0; i < nr; i++) { > > +               if (data_race(si->swap_map[offset + i])) { > > +                       count = __swap_entry_free(si, swp_entry(type, offset + i)); > > +                       if (count == SWAP_HAS_CACHE) > > +                               has_cache = true; > > +               } else { > > +                       WARN_ON_ONCE(1); > > +               } > > +       } > > +       return has_cache; > > +} > > + > >  /* > >   * Drop the last HAS_CACHE flag of swap entries, caller have to > >   * ensure all entries belong to the same cgroup. > > @@ -1792,11 +1856,9 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr) > >  { > >         const unsigned long start_offset = swp_offset(entry); > >         const unsigned long end_offset = start_offset + nr; > > -       unsigned int type = swp_type(entry); > >         struct swap_info_struct *si; > >         bool any_only_cache = false; > >         unsigned long offset; > > -       unsigned char count; > > > >         if (non_swap_entry(entry)) > >                 return; > > @@ -1811,15 +1873,7 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr) > >         /* > >          * First free all entries in the range. > >          */ > > -       for (offset = start_offset; offset < end_offset; offset++) { > > -               if (data_race(si->swap_map[offset])) { > > -                       count = __swap_entry_free(si, swp_entry(type, offset)); > > -                       if (count == SWAP_HAS_CACHE) > > -                               any_only_cache = true; > > -               } else { > > -                       WARN_ON_ONCE(1); > > -               } > > -       } > > +       any_only_cache = __swap_entries_free(si, entry, nr); > > Here we are just doing a page table walk, there is no guarantee the > 'nr' number of swap entries came from the same folio and previously > charged to the same memcg. The swap_pte_batch() only checks they are > the same swap type, does not check they charge to the same memcg. > Sorry for the trouble, thanks for the report, Yosry & Chris. Does the below fix the problem? otherwise, we might remove the assumption all swaps must belong to one swap_cgroup in batch free? >From c68e0d780ba808da4bb682b753e3fa77c4f96e13 Mon Sep 17 00:00:00 2001 From: Barry Song Date: Fri, 16 Aug 2024 09:36:23 +1200 Subject: [PATCH] mm: check all swaps belong to same swap_cgroup in swap_pte_batch() Right now, it is possible two folios are contiguous in swap slots but they don't belong to one memcg. In this case, even we return a large nr, we can't really batch free all slots. Reported-by: Yosry Ahmed Reported-by: Chris Li Signed-off-by: Barry Song --- mm/internal.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index adbf8c88c9df..d1f1e221212d 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -15,6 +15,7 @@ #include #include #include +#include #include /* Internal core VMA manipulation functions. */ @@ -275,18 +276,22 @@ static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte) { pte_t expected_pte = pte_next_swp_offset(pte); const pte_t *end_ptep = start_ptep + max_nr; + swp_entry_t entry = pte_to_swp_entry(pte); pte_t *ptep = start_ptep + 1; + unsigned short cgroup_id; VM_WARN_ON(max_nr < 1); VM_WARN_ON(!is_swap_pte(pte)); - VM_WARN_ON(non_swap_entry(pte_to_swp_entry(pte))); + VM_WARN_ON(non_swap_entry(entry)); + cgroup_id = lookup_swap_cgroup_id(entry); while (ptep < end_ptep) { pte = ptep_get(ptep); if (!pte_same(pte, expected_pte)) break; - + if (lookup_swap_cgroup_id(pte_to_swp_entry(pte)) != cgroup_id) + break; expected_pte = pte_next_swp_offset(expected_pte); ptep++; } -- 2.34.1 > Chris > > > > >         /* > >          * Short-circuit the below loop if none of the entries had their > > -- > > 2.34.1 > > Thanks Barry