* [QUESTION] about the maple tree and current status of mmap_lock scalability @ 2022-12-28 12:48 Hyeonggon Yoo 2022-12-28 17:10 ` Suren Baghdasaryan 2022-12-28 20:50 ` Matthew Wilcox 0 siblings, 2 replies; 14+ messages in thread From: Hyeonggon Yoo @ 2022-12-28 12:48 UTC (permalink / raw) To: linux-mm, liam.howlett, willy, surenb, ldufour, michel, vbabka, linux-kernel Hello mm folks, I have a few questions about the current status of mmap_lock scalability. ============================================================= What is currently causing the kernel to use mmap_lock to protect the maple tree? ============================================================= I understand that the long-term goal is to remove the need for mmap_lock in readers while traversing the maple tree, using techniques such as RCU or SPF. What is the biggest obstacle preventing this from being achieved at this time? ================================================== How does the maple tree provide RCU-safe manipulation of VMAs? ================================================== Is it similar to the approach suggested in the RCUVM paper (replacing the original root node with a new root node that shares most of its nodes and deferring the freeing of stale nodes using RCU)? I'm having difficulty understanding the design of the maple tree in this regard. [RCUVM paper] https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf Thank you for your time. --- Hyeonggon ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability 2022-12-28 12:48 [QUESTION] about the maple tree and current status of mmap_lock scalability Hyeonggon Yoo @ 2022-12-28 17:10 ` Suren Baghdasaryan 2022-12-29 11:33 ` Hyeonggon Yoo 2022-12-28 20:50 ` Matthew Wilcox 1 sibling, 1 reply; 14+ messages in thread From: Suren Baghdasaryan @ 2022-12-28 17:10 UTC (permalink / raw) To: Hyeonggon Yoo Cc: linux-mm, liam.howlett, willy, ldufour, michel, vbabka, linux-kernel Hi Hyeonggon, On Wed, Dec 28, 2022 at 4:49 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote: > > Hello mm folks, > > I have a few questions about the current status of mmap_lock scalability. > > ============================================================= > What is currently causing the kernel to use mmap_lock to protect the maple tree? > ============================================================= > > I understand that the long-term goal is to remove the need for mmap_lock in readers > while traversing the maple tree, using techniques such as RCU or SPF. > What is the biggest obstacle preventing this from being achieved at this time? Maple tree has an RCU mode which does not need mmap_lock for traversal. Liam and I were testing it recently and Liam fixed a number of issues to enable it. It seems stable now and the fixes are incorporated into the "per-vma locks" patchset which I prepared in this branch: https://github.com/surenbaghdasaryan/linux/tree/per_vma_lock. I haven't posted this patchset upstream yet but it's pretty much ready to go. I'm planning to post it in early January. Thanks, Suren. > > ================================================== > How does the maple tree provide RCU-safe manipulation of VMAs? > ================================================== > > Is it similar to the approach suggested in the RCUVM paper (replacing the original > root node with a new root node that shares most of its nodes and deferring > the freeing of stale nodes using RCU)? 
> > I'm having difficulty understanding the design of the maple tree in this regard. > > [RCUVM paper] https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf > > Thank you for your time. > > --- > Hyeonggon ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability 2022-12-28 17:10 ` Suren Baghdasaryan @ 2022-12-29 11:33 ` Hyeonggon Yoo 0 siblings, 0 replies; 14+ messages in thread From: Hyeonggon Yoo @ 2022-12-29 11:33 UTC (permalink / raw) To: Suren Baghdasaryan Cc: linux-mm, liam.howlett, willy, ldufour, michel, vbabka, linux-kernel On Wed, Dec 28, 2022 at 09:10:20AM -0800, Suren Baghdasaryan wrote: > Hi Hyeonggon, > > On Wed, Dec 28, 2022 at 4:49 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote: > > > > Hello mm folks, > > > > I have a few questions about the current status of mmap_lock scalability. > > > > ============================================================= > > What is currently causing the kernel to use mmap_lock to protect the maple tree? > > ============================================================= > > > > I understand that the long-term goal is to remove the need for mmap_lock in readers > > while traversing the maple tree, using techniques such as RCU or SPF. > > What is the biggest obstacle preventing this from being achieved at this time? > > Maple tree has an RCU mode which does not need mmap_lock for > traversal. Liam and I were testing it recently and Liam fixed a number > of issues to enable it. It seems stable now and the fixes are > incorporated into the "per-vma locks" patchset which I prepared in > this branch: https://github.com/surenbaghdasaryan/linux/tree/per_vma_lock. Thank you for the link. I didn't realize how far the discussion had progressed. Let me check if I understand correctly: To allow non-overlapping page faults while writers are performing VMA operations, per-VMA locking moves from the mmap_lock to the VMA lock on the reader side during page fault. While maple tree traversal is done without locking, readers must take VMA lock in read mode within RCU read section (or retry taking mmap_lock if failed) to process page fault. 
This ensures that readers are not racing with writers for access to the same VMA. Am I getting it right? > I haven't posted this patchset upstream yet but it's pretty much ready > to go. I'm planning to post it in early January. Looking forward to that, thank you for working on this. -- Thanks, Hyeonggon ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability 2022-12-28 12:48 [QUESTION] about the maple tree and current status of mmap_lock scalability Hyeonggon Yoo 2022-12-28 17:10 ` Suren Baghdasaryan @ 2022-12-28 20:50 ` Matthew Wilcox 2022-12-29 14:22 ` Hyeonggon Yoo 1 sibling, 1 reply; 14+ messages in thread From: Matthew Wilcox @ 2022-12-28 20:50 UTC (permalink / raw) To: Hyeonggon Yoo Cc: linux-mm, liam.howlett, surenb, ldufour, michel, vbabka, linux-kernel On Wed, Dec 28, 2022 at 09:48:51PM +0900, Hyeonggon Yoo wrote: > Hello mm folks, > > I have a few questions about the current status of mmap_lock scalability. > > ============================================================= > What is currently causing the kernel to use mmap_lock to protect the maple tree? > ============================================================= > > I understand that the long-term goal is to remove the need for mmap_lock in readers > while traversing the maple tree, using techniques such as RCU or SPF. > What is the biggest obstacle preventing this from being achieved at this time? The long term goal is even larger than this. Ideally, the VMA tree would be protected by a spinlock rather than a mutex. That turned out to be too large a change for the moment (and isn't all that important compared to enabling RCU readers) > ================================================== > How does the maple tree provide RCU-safe manipulation of VMAs? > ================================================== > > Is it similar to the approach suggested in the RCUVM paper (replacing the original > root node with a new root node that shares most of its nodes and deferring > the freeing of stale nodes using RCU)? > > I'm having difficulty understanding the design of the maple tree in this regard. > > [RCUVM paper] https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf While I've read the RCUVM paper, I wouldn't say it was particularly an inspiration. 
The Maple Tree is independent of the VM; it's a general purpose B-tree. As with any B-tree, when modifying a node, we don't touch nodes that we don't need to touch. As with any RCU data structure, we defer freeing things while RCU readers might still have a reference to them. We don't necessarily go all the way to the root node when modifying a leaf node. For example, if we have this structure: Root: Node A, 4000, Node B Node A: p1, 50, p2, 100, p3, 150, p4, 200, NULL, 250, p6, 1000, p7 Node B: p8, 4050, p9, 4100, p10, 4150, p11, 4200, NULL, 4250, p13 and we replace p4 with a NULL over the whole range from 150-199, we construct a new Node A2 that contains: Node A2: p1, 50, p2, 100, p3, 150, NULL, 250, p6, 1000, p7 and we simply write A2 over the entry in Root. Then we mark Node A as dead and RCU-free Node A. There's no need to replace Root as stores to a pointer are atomic. If we need to rebalance between Node A and Node B, we will need to create a new Root (as well as both A and B), mark all of them as dead and RCU-free them. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability 2022-12-28 20:50 ` Matthew Wilcox @ 2022-12-29 14:22 ` Hyeonggon Yoo 2022-12-29 16:51 ` Matthew Wilcox 0 siblings, 1 reply; 14+ messages in thread From: Hyeonggon Yoo @ 2022-12-29 14:22 UTC (permalink / raw) To: Matthew Wilcox Cc: linux-mm, liam.howlett, surenb, ldufour, michel, vbabka, linux-kernel On Wed, Dec 28, 2022 at 08:50:36PM +0000, Matthew Wilcox wrote: > On Wed, Dec 28, 2022 at 09:48:51PM +0900, Hyeonggon Yoo wrote: > > Hello mm folks, > > > > I have a few questions about the current status of mmap_lock scalability. > > > > ============================================================= > > What is currently causing the kernel to use mmap_lock to protect the maple tree? > > ============================================================= > > > > I understand that the long-term goal is to remove the need for mmap_lock in readers > > while traversing the maple tree, using techniques such as RCU or SPF. > > What is the biggest obstacle preventing this from being achieved at this time? > > The long term goal is even larger than this. Ideally, the VMA tree > would be protected by a spinlock rather than a mutex. You mean replacing mmap_lock rwsem with a spinlock? How is that possible if readers can take it for page fault? > That turned out > to be too large a change for the moment (and isn't all that important > compared to enabling RCU readers) Yeah, better to take one step at a time. > > > ================================================== > > How does the maple tree provide RCU-safe manipulation of VMAs? > > ================================================== > > > > Is it similar to the approach suggested in the RCUVM paper (replacing the original > > root node with a new root node that shares most of its nodes and deferring > > the freeing of stale nodes using RCU)? > > > > I'm having difficulty understanding the design of the maple tree in this regard. 
> > > > [RCUVM paper] https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf > > While I've read the RCUVM paper, I wouldn't say it was particularly an > inspiration. The Maple Tree is independent of the VM; it's a general > purpose B-tree. My intention was to ask how to synchronize with other VMA operations after the tree traversal with RCU. (Because it's unreasonable to handle page fault in RCU read-side critical section) Per-VMA lock seem to solve it by taking the VMA lock in read mode within RCU read-side critical section. > As with any B-tree, when modifying a node, we don't > touch nodes that we don't need to touch. As with any RCU data structure, > we defer freeing things while RCU readers might still have a reference > to them. > > We don't necessarily go all the way to the root node when modifying a > leaf node. For example, if we have this structure: > > Root: Node A, 4000, Node B > Node A: p1, 50, p2, 100, p3, 150, p4, 200, NULL, 250, p6, 1000, p7 > Node B: p8, 4050, p9, 4100, p10, 4150, p11, 4200, NULL, 4250, p13 > > and we replace p4 with a NULL over the whole range from 150-199, > we construct a new Node A2 that contains: > > Node A2: p1, 50, p2, 100, p3, 150, NULL, 250, p6, 1000, p7 > > and we simply write A2 over the entry in Root. Then we mark Node A as > dead and RCU-free Node A. There's no need to replace Root as stores > to a pointer are atomic. Thank you for explaining things in an easy and intuitive way. Okay, I get it's not a big problem to update the value(s) in a B-tree in RCU-safe way. > If we need to rebalance between Node A and > Node B, we will need to create a new Root (as well as both A and B), > mark all of them as dead and RCU-free them. -- Thanks, Hyeonggon ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability 2022-12-29 14:22 ` Hyeonggon Yoo @ 2022-12-29 16:51 ` Matthew Wilcox 2022-12-29 17:10 ` Lorenzo Stoakes 2023-01-02 12:04 ` Hyeonggon Yoo 0 siblings, 2 replies; 14+ messages in thread From: Matthew Wilcox @ 2022-12-29 16:51 UTC (permalink / raw) To: Hyeonggon Yoo Cc: linux-mm, liam.howlett, surenb, ldufour, michel, vbabka, linux-kernel On Thu, Dec 29, 2022 at 11:22:28PM +0900, Hyeonggon Yoo wrote: > On Wed, Dec 28, 2022 at 08:50:36PM +0000, Matthew Wilcox wrote: > > The long term goal is even larger than this. Ideally, the VMA tree > > would be protected by a spinlock rather than a mutex. > > You mean replacing mmap_lock rwsem with a spinlock? > How is that possible if readers can take it for page fault? The mmap_lock is taken for many, many things. So the plan was to have a spinlock in the maple tree (indeed, there's still one there; it's just in a union with the lockdep_map_p). VMA readers would walk the tree protected only by RCU; VMA writers would take the spinlock while modifying the tree. The work Suren, Liam & I are engaged in still uses the mmap semaphore for writers, but we do walk the tree under RCU protection. > > While I've read the RCUVM paper, I wouldn't say it was particularly an > > inspiration. The Maple Tree is independent of the VM; it's a general > > purpose B-tree. > > My intention was to ask how to synchronize with other VMA operations > after the tree traversal with RCU. (Because it's unreasonable to handle > page fault in RCU read-side critical section) > > Per-VMA lock seem to solve it by taking the VMA lock in read mode within > RCU read-side critical section. Right, but it's a little more complex than that. The real "lock" on the VMA is actually a sequence count. 
https://lwn.net/Articles/906852/ does a good job of explaining it, but the VMA lock is really there as a convenient way for the writer to wait for readers to be sufficiently "finished" with handling the page fault that any conflicting changes will be correctly retired. https://www.infradead.org/~willy/linux/store-free-page-faults.html outlines how I intend to proceed from Suren's current scheme (where RCU is only used to protect the tree walk) to using RCU for the entire page fault. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability 2022-12-29 16:51 ` Matthew Wilcox @ 2022-12-29 17:10 ` Lorenzo Stoakes 2022-12-29 17:21 ` Suren Baghdasaryan 2022-12-29 17:31 ` Matthew Wilcox 2023-01-02 12:04 ` Hyeonggon Yoo 1 sibling, 2 replies; 14+ messages in thread From: Lorenzo Stoakes @ 2022-12-29 17:10 UTC (permalink / raw) To: Matthew Wilcox Cc: Hyeonggon Yoo, linux-mm, liam.howlett, surenb, ldufour, michel, vbabka, linux-kernel On Thu, Dec 29, 2022 at 04:51:37PM +0000, Matthew Wilcox wrote: > The mmap_lock is taken for many, many things. [snip] I am currently describing the use of this lock (for 6.0) in the book and it is striking just how broadly it's used. I'm diagramming it out for 'core' users, i.e. non-driver and non-some other things, but even constraining that leaves a HUGE number of users. I've also documented the 'unexpected' uses of the page_table_lock, which seems to have been significantly improved over time but still a few cases remain! Am happy to give you (+ anybody else on MAINTAINERS list) an early copy of the relevant bit (once I've finished the diagrams anyway) if that'd be helpful! Now if you guys could stop obsoleting my work that'd be great ;) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability 2022-12-29 17:10 ` Lorenzo Stoakes @ 2022-12-29 17:21 ` Suren Baghdasaryan 2022-12-29 17:31 ` Matthew Wilcox 1 sibling, 0 replies; 14+ messages in thread From: Suren Baghdasaryan @ 2022-12-29 17:21 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Matthew Wilcox, Hyeonggon Yoo, linux-mm, liam.howlett, ldufour, michel, vbabka, linux-kernel On Thu, Dec 29, 2022 at 9:10 AM Lorenzo Stoakes <lstoakes@gmail.com> wrote: > > On Thu, Dec 29, 2022 at 04:51:37PM +0000, Matthew Wilcox wrote: > > The mmap_lock is taken for many, many things. [snip] > > I am currently describing the use of this lock (for 6.0) in the book and it is > striking just how broadly it's used. I'm diagramming it out for 'core' users, > i.e. non-driver and non-some other things, but even constraining that leaves a > HUGE number of users. I've also documented the 'unexpected' uses of the > page_table_lock, which seems to have been significantly improved over time but > still a few cases remain! > > Am happy to give you (+ anybody else on MAINTAINERS list) an early copy of the > relevant bit (once I've finished the diagrams anyway) if that'd be helpful! Yes please, that would be interesting. > > Now if you guys could stop obsoleting my work that'd be great ;) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability 2022-12-29 17:10 ` Lorenzo Stoakes 2022-12-29 17:21 ` Suren Baghdasaryan @ 2022-12-29 17:31 ` Matthew Wilcox 1 sibling, 0 replies; 14+ messages in thread From: Matthew Wilcox @ 2022-12-29 17:31 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Hyeonggon Yoo, linux-mm, liam.howlett, surenb, ldufour, michel, vbabka, linux-kernel On Thu, Dec 29, 2022 at 05:10:28PM +0000, Lorenzo Stoakes wrote: > On Thu, Dec 29, 2022 at 04:51:37PM +0000, Matthew Wilcox wrote: > > The mmap_lock is taken for many, many things. [snip] > > I am currently describing the use of this lock (for 6.0) in the book and it is > striking just how broadly it's used. I'm diagramming it out for 'core' users, > i.e. non-driver and non-some other things, but even constraining that leaves a > HUGE number of users. I fear this would be overwhelming. I don't think anybody would disagree that the mmap_lock needs to be split up like the BKL was, but we didn't do that by diagramming it out. Instead, we introduced new smaller locks that protected much better-defined things until eventually we were able to kill the BKL entirely. That's what I'm trying to do here -- there is one well-defined thing that the maple tree lock will protect, and that's the structure of the maple tree. It doesn't protect the data pointed to by the pointers stored in the tree, just the maple tree itself. > I've also documented the 'unexpected' uses of the > page_table_lock, which seems to have been significantly improved over time but > still a few cases remain! Now, I think this is useful. There's probably few enough abuses of the PTL that my brain can wrap itself around which ones are legitimate and then deal with the inappropriate ones. > Am happy to give you (+ anybody else on MAINTAINERS list) an early copy of the > relevant bit (once I've finished the diagrams anyway) if that'd be helpful! I'm definitely interested in the PTL. Thank you for the offer! 
> Now if you guys could stop obsoleting my work that'd be great ;) Never! How else will you get interest in the Second Edition Covering Linux 7.0? ;-) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability 2022-12-29 16:51 ` Matthew Wilcox 2022-12-29 17:10 ` Lorenzo Stoakes @ 2023-01-02 12:04 ` Hyeonggon Yoo 2023-01-02 14:37 ` Matthew Wilcox 1 sibling, 1 reply; 14+ messages in thread From: Hyeonggon Yoo @ 2023-01-02 12:04 UTC (permalink / raw) To: Matthew Wilcox Cc: linux-mm, liam.howlett, surenb, ldufour, michel, vbabka, linux-kernel On Thu, Dec 29, 2022 at 04:51:37PM +0000, Matthew Wilcox wrote: > On Thu, Dec 29, 2022 at 11:22:28PM +0900, Hyeonggon Yoo wrote: > > On Wed, Dec 28, 2022 at 08:50:36PM +0000, Matthew Wilcox wrote: > > > The long term goal is even larger than this. Ideally, the VMA tree > > > would be protected by a spinlock rather than a mutex. > > > > You mean replacing mmap_lock rwsem with a spinlock? > > How is that possible if readers can take it for page fault? > > The mmap_lock is taken for many, many things. So the plan was to > have a spinlock in the maple tree (indeed, there's still one there; > it's just in a union with the lockdep_map_p). VMA readers would walk > the tree protected only by RCU; VMA writers would take the spinlock > while modifying the tree. The work Suren, Liam & I are engaged in > still uses the mmap semaphore for writers, but we do walk the tree > under RCU protection. > Thanks, I get it. so it's for less overhead for maple tree modification. > > > While I've read the RCUVM paper, I wouldn't say it was particularly an > > > inspiration. The Maple Tree is independent of the VM; it's a general > > > purpose B-tree. 
> > > > My intention was to ask how to synchronize with other VMA operations > > after the tree traversal with RCU. (Because it's unreasonable to handle > > page fault in RCU read-side critical section) > > > > Per-VMA lock seem to solve it by taking the VMA lock in read mode within > > RCU read-side critical section. > > Right, but it's a little more complex than that. The real "lock" on > the VMA is actually a sequence count. https://lwn.net/Articles/906852/ > does a good job of explaining it, but the VMA lock is really there as > a convenient way for the writer to wait for readers to be sufficiently > "finished" with handling the page fault that any conflicting changes > will be correctly retired. Oh, thanks, nice article! > https://www.infradead.org/~willy/linux/store-free-page-faults.html > outlines how I intend to proceed from Suren's current scheme (where > RCU is only used to protect the tree walk) to using RCU for the > entire page fault. Thank you for sharing this your outlines. Okay, so the planned scheme is: 1. Try to process entire page fault under RCU protection - if failed, goto 2. if succeeded, goto 4. 2. Fall back to Suren's scheme (try to take VMA rwsem) - if failed, goto 3. if succeeded, goto 4. 3. Fall back to mmap_lock - goto 4. 4. Finish page fault. To implement 1, __p*d_alloc() need to take gfp flags not to sleep in RCU read-side critical section. What about introducing PF_MEMALLOC_NOWAIT process flag forcing GFP_NOWAIT | __GFP_NOWARN similar to PF_MEMALLOC_NO{FS,IO}, looking like this? Will be less churn. 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 853d08f7562b..77b88f30523b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1725,7 +1725,7 @@ extern struct pid *cad_pid;
 #define PF_USED_MATH		0x00002000	/* If unset the fpu must be initialized before use */
 #define PF__HOLE__00004000	0x00004000
 #define PF_NOFREEZE		0x00008000	/* This thread should not be frozen */
-#define PF__HOLE__00010000	0x00010000
+#define PF_MEMALLOC_NOWAIT	0x00010000	/* All allocation requests will force GFP_NOWAIT | __GFP_NOWARN */
 #define PF_KSWAPD		0x00020000	/* I am kswapd */
 #define PF_MEMALLOC_NOFS	0x00040000	/* All allocation requests will inherit GFP_NOFS */
 #define PF_MEMALLOC_NOIO	0x00080000	/* All allocation requests will inherit GFP_NOIO */
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 2a243616f222..4a1196646951 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -204,7 +204,8 @@ static inline gfp_t current_gfp_context(gfp_t flags)
 {
 	unsigned int pflags = READ_ONCE(current->flags);
 
-	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) {
+	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS
+				| PF_MEMALLOC_PIN | PF_MEMALLOC_NOWAIT))) {
 		/*
 		 * NOIO implies both NOIO and NOFS and it is a weaker context
 		 * so always make sure it makes precedence
@@ -216,6 +217,8 @@ static inline gfp_t current_gfp_context(gfp_t flags)
 		if (pflags & PF_MEMALLOC_PIN)
 			flags &= ~__GFP_MOVABLE;
 
+		if (pflags & PF_MEMALLOC_NOWAIT)
+			flags = GFP_NOWAIT | __GFP_NOWARN;
 	}
 	return flags;
 }
@@ -305,6 +308,18 @@ static inline void memalloc_noio_restore(unsigned int flags)
 {
 	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
 }
 
+static inline unsigned int memalloc_nowait_save(void)
+{
+	unsigned int flags = current->flags & PF_MEMALLOC_NOWAIT;
+	current->flags |= PF_MEMALLOC_NOWAIT;
+	return flags;
+}
+
+static inline void memalloc_nowait_restore(unsigned int flags)
+{
+	current->flags = (current->flags & ~PF_MEMALLOC_NOWAIT) | flags;
+}

--
Thanks,
Hyeonggon

^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability 2023-01-02 12:04 ` Hyeonggon Yoo @ 2023-01-02 14:37 ` Matthew Wilcox 2023-02-20 14:26 ` Hyeonggon Yoo 0 siblings, 1 reply; 14+ messages in thread From: Matthew Wilcox @ 2023-01-02 14:37 UTC (permalink / raw) To: Hyeonggon Yoo Cc: linux-mm, liam.howlett, surenb, ldufour, michel, vbabka, linux-kernel On Mon, Jan 02, 2023 at 09:04:12PM +0900, Hyeonggon Yoo wrote: > > https://www.infradead.org/~willy/linux/store-free-page-faults.html > > outlines how I intend to proceed from Suren's current scheme (where > > RCU is only used to protect the tree walk) to using RCU for the > > entire page fault. > > Thank you for sharing this your outlines. > Okay, so the planned scheme is: > > 1. Try to process entire page fault under RCU protection > - if failed, goto 2. if succeeded, goto 4. > > 2. Fall back to Suren's scheme (try to take VMA rwsem) > - if failed, goto 3. if succeeded, goto 4. Right. The question is whether to restart the page fault under Suren's scheme, or just grab the VMA rwsem and continue. Experimentation needed. It's also worth noting that Michel has an alternative proposal, which is to drop out of RCU protection before trying to allocate memory, then re-enter RCU mode and check the sequence count hasn't changed on the entire MM. His proposal has the advantage of not trying to allocate memory while holding the RCU read lock, but the disadvantage of having to retry the page fault if anyone has called mmap() or munmap(). Which alternative is better is going to depend on the workload; do we see more calls to mmap()/munmap(), or do we need to enter page reclaim more often? I think they're largely equivalent performance-wise in the fast path. Another metric to consider is code complexity; he thinks his method is easier to understand and I think mine is easier. To be expected, I suppose ;-) > 3. Fall back to mmap_lock > - goto 4. > > 4. Finish page fault. 
> > To implement 1, __p*d_alloc() need to take gfp flags > not to sleep in RCU read-side critical section. > > What about introducing PF_MEMALLOC_NOWAIT process flag forcing > GFP_NOWAIT | __GFP_NOWARN > > similar to PF_MEMALLOC_NO{FS,IO}, looking like this? > > Will be less churn. Certainly less churn, but also far more risky. All of a sudden, codepaths which used to always succeed will now start failing, and either there aren't checks for memory allocation failures or those paths have never been tested before. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability 2023-01-02 14:37 ` Matthew Wilcox @ 2023-02-20 14:26 ` Hyeonggon Yoo 2023-02-20 14:43 ` Matthew Wilcox 0 siblings, 1 reply; 14+ messages in thread From: Hyeonggon Yoo @ 2023-02-20 14:26 UTC (permalink / raw) To: Matthew Wilcox Cc: linux-mm, liam.howlett, surenb, ldufour, michel, vbabka, linux-kernel On Mon, Jan 02, 2023 at 02:37:02PM +0000, Matthew Wilcox wrote: > On Mon, Jan 02, 2023 at 09:04:12PM +0900, Hyeonggon Yoo wrote: > > > https://www.infradead.org/~willy/linux/store-free-page-faults.html > > > outlines how I intend to proceed from Suren's current scheme (where > > > RCU is only used to protect the tree walk) to using RCU for the > > > entire page fault. > > > > Thank you for sharing this your outlines. > > Okay, so the planned scheme is: > > > > 1. Try to process entire page fault under RCU protection > > - if failed, goto 2. if succeeded, goto 4. > > > > 2. Fall back to Suren's scheme (try to take VMA rwsem) > > - if failed, goto 3. if succeeded, goto 4. > > Right. The question is whether to restart the page fault under Suren's > scheme, or just grab the VMA rwsem and continue. Experimentation > needed. > > It's also worth noting that Michel has an alternative proposal, which > is to drop out of RCU protection before trying to allocate memory, then > re-enter RCU mode and check the sequence count hasn't changed on the > entire MM. His proposal has the advantage of not trying to allocate > memory while holding the RCU read lock, but the disadvantage of having > to retry the page fault if anyone has called mmap() or munmap(). Which > alternative is better is going to depend on the workload; do we see more > calls to mmap()/munmap(), or do we need to enter page reclaim more often? > I think they're largely equivalent performance-wise in the fast path. > Another metric to consider is code complexity; he thinks his method > is easier to understand and I think mine is easier. 
To be expected, > I suppose ;-) I'm planning to suggest a cooperative project to my colleagues that would involve making __p?d_alloc() take gfp flags. Wondering if there was any progress or conclusion made on which approach is better for full RCU page faults, or was there another solution proposed? Asking this because I don't want to waste my time if the approach has been abandoned. Regards, Hyeonggon > > 3. Fall back to mmap_lock > > - goto 4. > > > > 4. Finish page fault. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability 2023-02-20 14:26 ` Hyeonggon Yoo @ 2023-02-20 14:43 ` Matthew Wilcox 2023-02-22 11:38 ` Hyeonggon Yoo 0 siblings, 1 reply; 14+ messages in thread From: Matthew Wilcox @ 2023-02-20 14:43 UTC (permalink / raw) To: Hyeonggon Yoo Cc: linux-mm, liam.howlett, surenb, ldufour, michel, vbabka, linux-kernel On Mon, Feb 20, 2023 at 02:26:49PM +0000, Hyeonggon Yoo wrote: > On Mon, Jan 02, 2023 at 02:37:02PM +0000, Matthew Wilcox wrote: > > On Mon, Jan 02, 2023 at 09:04:12PM +0900, Hyeonggon Yoo wrote: > > > > https://www.infradead.org/~willy/linux/store-free-page-faults.html > > > > outlines how I intend to proceed from Suren's current scheme (where > > > > RCU is only used to protect the tree walk) to using RCU for the > > > > entire page fault. > > > > > > Thank you for sharing this your outlines. > > > Okay, so the planned scheme is: > > > > > > 1. Try to process entire page fault under RCU protection > > > - if failed, goto 2. if succeeded, goto 4. > > > > > > 2. Fall back to Suren's scheme (try to take VMA rwsem) > > > - if failed, goto 3. if succeeded, goto 4. > > > > Right. The question is whether to restart the page fault under Suren's > > scheme, or just grab the VMA rwsem and continue. Experimentation > > needed. > > > > It's also worth noting that Michel has an alternative proposal, which > > is to drop out of RCU protection before trying to allocate memory, then > > re-enter RCU mode and check the sequence count hasn't changed on the > > entire MM. His proposal has the advantage of not trying to allocate > > memory while holding the RCU read lock, but the disadvantage of having > > to retry the page fault if anyone has called mmap() or munmap(). Which > > alternative is better is going to depend on the workload; do we see more > > calls to mmap()/munmap(), or do we need to enter page reclaim more often? > > I think they're largely equivalent performance-wise in the fast path. 
> > Another metric to consider is code complexity; he thinks his method
> > is easier to understand and I think mine is easier. To be expected,
> > I suppose ;-)
>
> I'm planning to suggest a cooperative project to my colleagues
> that would involve making __p?d_alloc() take gfp flags.
>
> Wondering if there was any progress or conclusion made on which
> approach is better for full RCU page faults, or was there another
> solution proposed?
>
> Asking this because I don't want to waste my time if the approach
> has been abandoned.

Thanks for checking, but nobody's made any progress on this, that I know
of.

(The __p?d_alloc() approach may also be useful to support vmalloc()
with flags that aren't GFP_KERNEL compatible)
* Re: [QUESTION] about the maple tree and current status of mmap_lock scalability
  2023-02-20 14:43 ` Matthew Wilcox
@ 2023-02-22 11:38 ` Hyeonggon Yoo
  0 siblings, 0 replies; 14+ messages in thread

From: Hyeonggon Yoo @ 2023-02-22 11:38 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-mm, liam.howlett, surenb, ldufour, michel, vbabka, linux-kernel

On Mon, Feb 20, 2023 at 02:43:23PM +0000, Matthew Wilcox wrote:
> On Mon, Feb 20, 2023 at 02:26:49PM +0000, Hyeonggon Yoo wrote:
> > On Mon, Jan 02, 2023 at 02:37:02PM +0000, Matthew Wilcox wrote:
> > > On Mon, Jan 02, 2023 at 09:04:12PM +0900, Hyeonggon Yoo wrote:
> > > > > https://www.infradead.org/~willy/linux/store-free-page-faults.html
> > > > > outlines how I intend to proceed from Suren's current scheme (where
> > > > > RCU is only used to protect the tree walk) to using RCU for the
> > > > > entire page fault.
> > > >
> > > > Thank you for sharing your outlines.
> > > > Okay, so the planned scheme is:
> > > >
> > > > 1. Try to process entire page fault under RCU protection
> > > >    - if failed, goto 2. if succeeded, goto 4.
> > > >
> > > > 2. Fall back to Suren's scheme (try to take VMA rwsem)
> > > >    - if failed, goto 3. if succeeded, goto 4.
> > >
> > > Right. The question is whether to restart the page fault under Suren's
> > > scheme, or just grab the VMA rwsem and continue. Experimentation
> > > needed.
> > >
> > > It's also worth noting that Michel has an alternative proposal, which
> > > is to drop out of RCU protection before trying to allocate memory, then
> > > re-enter RCU mode and check the sequence count hasn't changed on the
> > > entire MM. His proposal has the advantage of not trying to allocate
> > > memory while holding the RCU read lock, but the disadvantage of having
> > > to retry the page fault if anyone has called mmap() or munmap().
> > > Which alternative is better is going to depend on the workload; do we
> > > see more calls to mmap()/munmap(), or do we need to enter page reclaim
> > > more often?
> >
> > I think they're largely equivalent performance-wise in the fast path.
>
> > > Another metric to consider is code complexity; he thinks his method
> > > is easier to understand and I think mine is easier. To be expected,
> > > I suppose ;-)
> >
> > I'm planning to suggest a cooperative project to my colleagues
> > that would involve making __p?d_alloc() take gfp flags.
> >
> > Wondering if there was any progress or conclusion made on which
> > approach is better for full RCU page faults, or was there another
> > solution proposed?
> >
> > Asking this because I don't want to waste my time if the approach
> > has been abandoned.
>
> Thanks for checking, but nobody's made any progress on this, that I know
> of.

Thanks for the confirmation. Then I think it's still worth trying.

> (The __p?d_alloc() approach may also be useful to support vmalloc()
> with flags that aren't GFP_KERNEL compatible)

Are there any likely users of that? It sounds like someone trying to
call __vmalloc() in interrupt context or inside an RCU read-side
critical section?
end of thread, other threads:[~2023-02-22 11:38 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-28 12:48 [QUESTION] about the maple tree and current status of mmap_lock scalability Hyeonggon Yoo
2022-12-28 17:10 ` Suren Baghdasaryan
2022-12-29 11:33 ` Hyeonggon Yoo
2022-12-28 20:50 ` Matthew Wilcox
2022-12-29 14:22 ` Hyeonggon Yoo
2022-12-29 16:51 ` Matthew Wilcox
2022-12-29 17:10 ` Lorenzo Stoakes
2022-12-29 17:21 ` Suren Baghdasaryan
2022-12-29 17:31 ` Matthew Wilcox
2023-01-02 12:04 ` Hyeonggon Yoo
2023-01-02 14:37 ` Matthew Wilcox
2023-02-20 14:26 ` Hyeonggon Yoo
2023-02-20 14:43 ` Matthew Wilcox
2023-02-22 11:38 ` Hyeonggon Yoo