linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Bert Karwatzki <spasswolf@web.de>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Andrew Morton	 <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
		spassowlf@web.de
Subject: Re: [PATCH v8 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region()
Date: Sat, 05 Oct 2024 02:56:01 +0200	[thread overview]
Message-ID: <088a3541b85b783ef68337bd4bb790d62f200dfa.camel@web.de> (raw)
In-Reply-To: <3b83746d-b73e-4b89-9b74-5aba5a827f45@lucifer.local>

Am Freitag, dem 04.10.2024 um 23:41 +0100 schrieb Lorenzo Stoakes:
> On Fri, Oct 04, 2024 at 11:35:44AM +0200, Bert Karwatzki wrote:
> > Here's the log procduced by this kernel:
> >
> > c9e7f76815d3 (HEAD -> maple_tree_debug_4) hack: set of info stuff v5
> > 7e3bb072761a mm: correct error handling in mmap_region()
> > 77df9e4bb222 (tag: next-20241001, origin/master, origin/HEAD, master) Add linux-next specific files for 20241001
> >
> > Again it took two attempts to trigger the bug.
> >
> > Bert Karwatzki
> >
>
> Sending an updated, cleaned up version of the patch with a lot of
> explanation. This is functionally identical to the v3 fix I already sent so
> you can try that or this to confirm it resolves your issue.
>
> If you are able to do so, I can submit this to upstream for a hotfix. If
> not, well then back to the drawing board and I'd be very shocked :)
>
> I have been able to reproduce the issue locally in our userland testing
> suite entirely consistently, and this patch resolves the issue and also
> continues to pass all maple tree unit tests.
>
> Again thank you so much for all your help - I hope you are able to find a
> spare moment to quickly give this one a try and confirm whether it does
> indeed address the problem you've reported.
>
> Thanks, Lorenzo
>
> ----8<----
> From 126d65bd9839cd3ec941007872b357e27fd56066 Mon Sep 17 00:00:00 2001
> From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Date: Fri, 4 Oct 2024 15:18:58 +0100
> Subject: [PATCH] maple_tree: correct tree corruption on spanning store
>
> Writing a data range into a maple tree may involve overwriting a number of
> existing entries that span across more than one node. Doing so invokes a
> 'spanning' store.
>
> Performing a spanning store across two leaf nodes in a maple tree in which
> entries are overwritten is achieved by first initialising a 'big' node,
> which will store the coalesced entries between the two nodes comprising
> entries prior to the newly stored entry, the newly stored entry, and
> subsequent entries.
>
> This 'big node' is then merged back into the tree and the tree is
> rebalanced, replacing the entries across the spanned nodes with those
> contained in the big node.
>
> The operation is performed in mas_wr_spanning_store() which starts by
> establishing two maple tree state objects ('mas' objects) on the left of
> the range and on the right (l_mas and r_mas respectively).
>
> l_mas traverses to the beginning of the range to be stored in order to copy
> the data BEFORE the requested store into the big node.
>
> We then insert our new entry immediately afterwards (both the left copy and
> the storing of the new entry are combined and performed by
> mas_store_b_node()).
>
> r_mas traverses to the populated slot immediately after, in order to copy
> the data AFTER the requested store into the big node.
>
> This copy of the right-hand node is performed by mas_mab_cp() as long as
> r_mas indicates that there's data to copy, i.e. r_mas.offset <= r_mas.end.
>
> We traverse r_mas to this position in mas_wr_node_walk() using a simple
> loop:
>
> 	while (offset < count && mas->index > wr_mas->pivots[offset])
> 		offset++;
>
> Note here that count is determined to be the (inclusive) index of the last
> node containing data in the node as determined by ma_data_end().
>
> This means that even in searching for mas->index, which will have been set
> to one plus the end of the target range in order to traverse to the next
> slot in mas_wr_spanning_store(), we will terminate the iteration at the end
> of the node range even if this condition is not met due to the offset <
> count condition.
>
> The fact this right hand node contains the end of the range being stored is
> why we are traversing it, and this loop is why we appear to discover a
> viable range within the right node to copy to the big one.
>
> However, if the node that r_mas traverses contains a pivot EQUAL to the end
> of the range being stored, and this is the LAST pivot contained within the
> node, something unexpected happens:
>
> 1. The l_mas traversal copy and insertion of the new entry in the big node
>    is performed via mas_store_b_node() correctly.
>
> 2. The traversal performed by mas_wr_node_walk() means our r_mas.offset is
>    set to the offset of the entry equal to the end of the range we store.
>
> 3. We therefore copy this DUPLICATE of the final pivot into the big node,
>    and insert this DUPLICATE entry, alongside its invalid slot entry
>    immediately after the newly inserted entry.
>
> 4. The big node containing this duplicated is inserted into the tree which
>    is rebalanced, and therefore the maple tree becomes corrupted.
>
> Note that if the right hand node had one or more entries with pivots of
> greater value than the end of the stored range, this would not happen. If
> it contained entries with pivots of lesser value it would not be the right
> node in this spanning store.
>
> This appears to have been at risk of happening throughout the maple tree's
> history, however it seemed significantly less likely to occur until
> recently.
>
> The balancing of the tree seems to have made it unlikely that you would
> happen to perform a store that both spans two nodes AND would overwrite
> precisely the entry with the largest pivot in the right-hand node which
> contains no further larger pivots.
>
> The work performed in commit f8d112a4e657 ("mm/mmap: avoid zeroing vma tree
> in mmap_region()") seems to have made the probability of this event much
> more likely.
>
> Previous to this change, MAP_FIXED mappings which were overwritten would
> first be cleared before any subsequent store or importantly - merge of
> surrounding entries - would be performed.
>
> After this change, this is no longer the case, and this means that, in the
> worst case, a number of entries might be overwritten in combination with a
> merge (and subsequent overwriting expansion) between both the prior entry
> AND a subsequent entry.
>
> The motivation for this change arose from Bert Karwatzki's report of
> encountering mm instability after the release of kernel v6.12-rc1 which,
> after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
> options, was identified as maple tree corruption.
>
> After Bert very generously provided his time and ability to reproduce this
> event consistently, I was able to finally identify that the issue discussed
> in this commit message was occurring for him.
>
> The solution implemented in this patch is:
>
> 1. Adjust mas_wr_walk_index() to return a boolean value indicating whether
>    the containing node is actually populated with entries possessing pivots
>    equal to or greater than mas->index.
>
> 2. When traversing the right node in mas_wr_spanning_store(), use this
>    value to determine whether to try to copy from the right node - if it is
>    not populated, then do not do so.
>
> This passes all maple tree unit tests and resolves the reported bug.
> ---
>  lib/maple_tree.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 37abf0fe380b..e6f0da908ba7 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -2194,6 +2194,8 @@ static inline void mas_node_or_none(struct ma_state *mas,
>
>  /*
>   * mas_wr_node_walk() - Find the correct offset for the index in the @mas.
> + *                      If @mas->index cannot be found within the containing
> + *                      node, we traverse to the last entry in the node.
>   * @wr_mas: The maple write state
>   *
>   * Uses mas_slot_locked() and does not need to worry about dead nodes.
> @@ -3527,6 +3529,12 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
>  	return true;
>  }
>
> +/*
> + * Traverse the maple tree until the offset of mas->index is reached.
> + *
> + * Return: Is this node actually populated with entries possessing pivots equal
> + *         to or greater than mas->index?
> + */
>  static bool mas_wr_walk_index(struct ma_wr_state *wr_mas)
>  {
>  	struct ma_state *mas = wr_mas->mas;
> @@ -3535,8 +3543,11 @@ static bool mas_wr_walk_index(struct ma_wr_state *wr_mas)
>  		mas_wr_walk_descend(wr_mas);
>  		wr_mas->content = mas_slot_locked(mas, wr_mas->slots,
>  						  mas->offset);
> -		if (ma_is_leaf(wr_mas->type))
> -			return true;
> +		if (ma_is_leaf(wr_mas->type)) {
> +			unsigned long pivot = wr_mas->pivots[mas->offset];
> +
> +			return pivot == 0 || mas->index <= pivot;
> +		}
>  		mas_wr_walk_traverse(wr_mas);
>
>  	}
> @@ -3696,6 +3707,7 @@ static noinline void mas_wr_spanning_store(struct ma_wr_state *wr_mas)
>  	struct maple_big_node b_node;
>  	struct ma_state *mas;
>  	unsigned char height;
> +	bool r_populated;
>
>  	/* Left and Right side of spanning store */
>  	MA_STATE(l_mas, NULL, 0, 0);
> @@ -3737,7 +3749,7 @@ static noinline void mas_wr_spanning_store(struct ma_wr_state *wr_mas)
>  		r_mas.last++;
>
>  	r_mas.index = r_mas.last;
> -	mas_wr_walk_index(&r_wr_mas);
> +	r_populated = mas_wr_walk_index(&r_wr_mas);
>  	r_mas.last = r_mas.index = mas->last;
>
>  	/* Set up left side. */
> @@ -3761,7 +3773,7 @@ static noinline void mas_wr_spanning_store(struct ma_wr_state *wr_mas)
>  	/* Copy l_mas and store the value in b_node. */
>  	mas_store_b_node(&l_wr_mas, &b_node, l_mas.end);
>  	/* Copy r_mas into b_node. */
> -	if (r_mas.offset <= r_mas.end)
> +	if (r_populated && r_mas.offset <= r_mas.end)
>  		mas_mab_cp(&r_mas, r_mas.offset, r_mas.end,
>  			   &b_node, b_node.b_end + 1);
>  	else
> --
> 2.46.2

I just tested this and it passed ten tests (i.e. upgrading the proton version i
steam) in a row.

Bert Karwatzki


  reply	other threads:[~2024-10-05  0:56 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-04  9:35 [PATCH v8 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region() Bert Karwatzki
2024-10-04  9:58 ` Lorenzo Stoakes
2024-10-04 14:23 ` Lorenzo Stoakes
2024-10-04 14:26   ` Lorenzo Stoakes
2024-10-04 14:32     ` Lorenzo Stoakes
2024-10-04 14:58       ` Lorenzo Stoakes
2024-10-04 22:41 ` Lorenzo Stoakes
2024-10-05  0:56   ` Bert Karwatzki [this message]
2024-10-05  6:21     ` Lorenzo Stoakes
2024-10-05  8:57       ` Bert Karwatzki
2024-10-05 11:11         ` Lorenzo Stoakes
  -- strict thread matches above, loose matches on Subject: below --
2024-10-13 22:35 Bert Karwatzki
2024-10-14  9:46 ` Lorenzo Stoakes
2024-10-16 10:28   ` Bert Karwatzki
2024-10-16 11:16     ` Lorenzo Stoakes
2024-10-16 14:13     ` Liam R. Howlett
2024-10-04  8:51 Bert Karwatzki
2024-10-04  8:59 ` Lorenzo Stoakes
2024-10-03 17:07 Bert Karwatzki
2024-10-03 17:24 ` Lorenzo Stoakes
2024-10-03 19:32 ` Lorenzo Stoakes
2024-10-04  8:36 ` Lorenzo Stoakes
2024-10-03 13:09 Bert Karwatzki
2024-10-03 13:34 ` Lorenzo Stoakes
2024-10-03 10:51 Bert Karwatzki
2024-10-03 11:17 ` Lorenzo Stoakes
2024-10-03 10:41 Bert Karwatzki
2024-10-03 10:46 ` Lorenzo Stoakes
2024-10-03  8:59 Bert Karwatzki
2024-10-03  9:04 ` Lorenzo Stoakes
2024-10-03  9:27 ` Lorenzo Stoakes
2024-10-02 22:58 Bert Karwatzki
2024-10-03  7:43 ` Lorenzo Stoakes
2024-10-02 22:57 Bert Karwatzki
2024-10-03  8:06 ` Lorenzo Stoakes
2024-10-02 21:58 Bert Karwatzki
2024-10-02 21:48 Bert Karwatzki
2024-10-02 21:41 Bert Karwatzki
     [not found] <20241002105131.4545-1-spasswolf@web.de>
2024-10-02 11:19 ` Lorenzo Stoakes
2024-10-01  2:34 Bert Karwatzki
2024-10-01  8:02 ` Lorenzo Stoakes
2024-10-01  8:38   ` Bert Karwatzki
2024-10-01  8:49     ` Lorenzo Stoakes
2024-10-01  8:55       ` Bert Karwatzki
2024-10-01  8:59         ` Lorenzo Stoakes
2024-10-01  9:10           ` Bert Karwatzki
2024-10-01  9:20             ` Lorenzo Stoakes
2024-10-01  9:49               ` Lorenzo Stoakes
2024-10-01  9:57                 ` Bert Karwatzki
2024-10-01 10:02                   ` Lorenzo Stoakes
2024-10-01 10:22                     ` Bert Karwatzki
2024-10-01 10:33                       ` Lorenzo Stoakes
2024-10-01 10:42                         ` Bert Karwatzki
2024-10-01 11:23                           ` Lorenzo Stoakes
2024-10-01 11:56 ` Lorenzo Stoakes
2024-10-01 16:43   ` Bert Karwatzki
2024-10-01 18:01     ` Lorenzo Stoakes
2024-10-02  8:39       ` Lorenzo Stoakes
2024-10-02  8:48         ` Lorenzo Stoakes
2024-10-02 12:13 ` Lorenzo Stoakes
2024-10-02 13:23   ` Lorenzo Stoakes
2024-10-02 16:13     ` Bert Karwatzki
2024-10-02 17:19       ` Lorenzo Stoakes
2024-10-02 18:28         ` Lorenzo Stoakes
2024-10-02 18:54           ` Lorenzo Stoakes
2024-10-02 20:06           ` Bert Karwatzki
2024-10-02 20:22             ` Lorenzo Stoakes
2024-10-02 20:39               ` Bert Karwatzki
2024-10-02 20:44                 ` Lorenzo Stoakes
2024-10-02 21:13                   ` Lorenzo Stoakes
2024-08-30  4:00 [PATCH v8 00/21] Avoid MAP_FIXED gap exposure Liam R. Howlett
2024-08-30  4:00 ` [PATCH v8 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region() Liam R. Howlett

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=088a3541b85b783ef68337bd4bb790d62f200dfa.camel@web.de \
    --to=spasswolf@web.de \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=spassowlf@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).