From: Baoquan He <bhe@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: "Russell King (Oracle)" <linux@armlinux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Christoph Hellwig <hch@lst.de>,
Uladzislau Rezki <urezki@gmail.com>,
Lorenzo Stoakes <lstoakes@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
John Ogness <jogness@linutronix.de>,
linux-arm-kernel@lists.infradead.org,
Mark Rutland <mark.rutland@arm.com>,
Marc Zyngier <maz@kernel.org>,
x86@kernel.org, Nadav Amit <nadav.amit@gmail.com>
Subject: Re: [RFC PATCH 3/3] mm/vmalloc.c: change _vm_unmap_aliases() to do purge firstly
Date: Tue, 23 May 2023 17:35:30 +0800 [thread overview]
Message-ID: <ZGyI4mXc+F2gVveR@MiWiFi-R3L-srv> (raw)
In-Reply-To: <87h6s4w20b.ffs@tglx>
On 05/22/23 at 10:21pm, Thomas Gleixner wrote:
> On Mon, May 22 2023 at 22:34, Baoquan He wrote:
> > On 05/22/23 at 02:02pm, Thomas Gleixner wrote:
> >> > @@ -1736,6 +1737,14 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
> >> > list_replace_init(&purge_vmap_area_list, &local_purge_list);
> >> > spin_unlock(&purge_vmap_area_lock);
> >> >
> >> > + vb = container_of(va, struct vmap_block, va);
> >>
> >> This cannot work vmap_area is not embedded in vmap_block. vmap_block::va
> >> is a pointer. vmap_area does not link back to vmap_block, so there is no
> >> way to find it based on a vmap_area.
> >
> > Oh, the code is buggy. va->flags can tell if it's vmap_block, then we
> > can deduce the vb pointer.
>
> No. It _CANNOT_ work whether you check the flags or not.
>
> struct foo {
> .....
> struct bar bar;
> };
>
> container_of(ptr_to_bar, struct foo, bar) returns the pointer to the
> struct foo which has struct bar embedded.
>
> But
>
> struct foo {
> .....
> struct bar *bar;
> };
>
> cannot do that because ptr_to_bar points to some object which is
> completely disconnected from struct foo.
>
> Care to look at the implementation of container_of()?
>
> Here is what it boils down to:
>
> void *member_pointer = bar;
>
> p = (struct foo *)(member_pointer - offsetof(struct foo, bar);
>
> So it uses the pointer to bar and subtracts the offset of bar in struct
> foo. This obviously can only work when struct bar is embedded in struct
> foo.
>
> Lets assume that *bar is the first member of foo, i.e. offset of *bar in
> struct foo is 0
>
> p = (struct foo *)(member_pointer - 0);
>
> So you end up with
>
> p == member_pointer == bar
>
> But you won't get there because the static_assert() in container_of()
> will catch that and the compiler will tell you in colourful ways.
Thanks a lot, learn it now. I never noticed container_of() is not
suitable for pointer member of struct case.
>
> Once the vmap area is handed over for cleaning up the vmap block is gone
> and even if you let it stay around then the vmap area does not have any
> information where to find the block.
>
> You'd need to have a pointer to the vmap block in vmap area or embed
> vmap area into vmap block.
Got it now. Embedding vmap_area into vmap_block seems not feasible
because va need be reused when inserting into free_vmap_area_root/list.
Adding a pointer to vmap_block looks do-able.
Since vm_map_ram area doesn't have vm_struct associated with it, we can
reuse the space of '->vm' to add vb pointer like below. Since in the
existing code there are places where we use 'if(!va->vm)' to check if
it's a normal va, we need be careful to find all of them out and replace
with new and tighter checking. Will give a draft code change after all
is done and testing is passed.
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..e2ba6d59d679 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -15,6 +15,7 @@
struct vm_area_struct; /* vma defining user mapping in mm_types.h */
struct notifier_block; /* in notifier.h */
struct iov_iter; /* in uio.h */
+struct vmap_block; /* in mm/vmalloc.c */
/* bits in flags of vmalloc's vm_struct below */
#define VM_IOREMAP 0x00000001 /* ioremap() and friends */
@@ -76,6 +77,7 @@ struct vmap_area {
union {
unsigned long subtree_max_size; /* in "free" tree */
struct vm_struct *vm; /* in "busy" tree */
+ struct vmap_block *vb; /* in "busy and purge" tree */
};
unsigned long flags; /* mark type of vm_map_ram area */
};
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c0f80982eb06..d97343271e27 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2061,6 +2061,10 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
return ERR_PTR(err);
}
+ spin_lock(&vmap_area_lock);
+ va->vb = vb;
+ spin_unlock(&vmap_area_lock);
+
vbq = raw_cpu_ptr(&vmap_block_queue);
spin_lock(&vbq->lock);
list_add_tail_rcu(&vb->free_list, &vbq->free);
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Baoquan He <bhe@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: "Russell King (Oracle)" <linux@armlinux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Christoph Hellwig <hch@lst.de>,
Uladzislau Rezki <urezki@gmail.com>,
Lorenzo Stoakes <lstoakes@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
John Ogness <jogness@linutronix.de>,
linux-arm-kernel@lists.infradead.org,
Mark Rutland <mark.rutland@arm.com>,
Marc Zyngier <maz@kernel.org>,
x86@kernel.org, Nadav Amit <nadav.amit@gmail.com>
Subject: Re: [RFC PATCH 3/3] mm/vmalloc.c: change _vm_unmap_aliases() to do purge firstly
Date: Tue, 23 May 2023 17:35:30 +0800 [thread overview]
Message-ID: <ZGyI4mXc+F2gVveR@MiWiFi-R3L-srv> (raw)
In-Reply-To: <87h6s4w20b.ffs@tglx>
On 05/22/23 at 10:21pm, Thomas Gleixner wrote:
> On Mon, May 22 2023 at 22:34, Baoquan He wrote:
> > On 05/22/23 at 02:02pm, Thomas Gleixner wrote:
> >> > @@ -1736,6 +1737,14 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
> >> > list_replace_init(&purge_vmap_area_list, &local_purge_list);
> >> > spin_unlock(&purge_vmap_area_lock);
> >> >
> >> > + vb = container_of(va, struct vmap_block, va);
> >>
> >> This cannot work vmap_area is not embedded in vmap_block. vmap_block::va
> >> is a pointer. vmap_area does not link back to vmap_block, so there is no
> >> way to find it based on a vmap_area.
> >
> > Oh, the code is buggy. va->flags can tell if it's vmap_block, then we
> > can deduce the vb pointer.
>
> No. It _CANNOT_ work whether you check the flags or not.
>
> struct foo {
> .....
> struct bar bar;
> };
>
> container_of(ptr_to_bar, struct foo, bar) returns the pointer to the
> struct foo which has struct bar embedded.
>
> But
>
> struct foo {
> .....
> struct bar *bar;
> };
>
> cannot do that because ptr_to_bar points to some object which is
> completely disconnected from struct foo.
>
> Care to look at the implementation of container_of()?
>
> Here is what it boils down to:
>
> void *member_pointer = bar;
>
> p = (struct foo *)(member_pointer - offsetof(struct foo, bar);
>
> So it uses the pointer to bar and subtracts the offset of bar in struct
> foo. This obviously can only work when struct bar is embedded in struct
> foo.
>
> Lets assume that *bar is the first member of foo, i.e. offset of *bar in
> struct foo is 0
>
> p = (struct foo *)(member_pointer - 0);
>
> So you end up with
>
> p == member_pointer == bar
>
> But you won't get there because the static_assert() in container_of()
> will catch that and the compiler will tell you in colourful ways.
Thanks a lot, learn it now. I never noticed container_of() is not
suitable for pointer member of struct case.
>
> Once the vmap area is handed over for cleaning up the vmap block is gone
> and even if you let it stay around then the vmap area does not have any
> information where to find the block.
>
> You'd need to have a pointer to the vmap block in vmap area or embed
> vmap area into vmap block.
Got it now. Embedding vmap_area into vmap_block seems not feasible
because va need be reused when inserting into free_vmap_area_root/list.
Adding a pointer to vmap_block looks do-able.
Since vm_map_ram area doesn't have vm_struct associated with it, we can
reuse the space of '->vm' to add vb pointer like below. Since in the
existing code there are places where we use 'if(!va->vm)' to check if
it's a normal va, we need be careful to find all of them out and replace
with new and tighter checking. Will give a draft code change after all
is done and testing is passed.
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..e2ba6d59d679 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -15,6 +15,7 @@
struct vm_area_struct; /* vma defining user mapping in mm_types.h */
struct notifier_block; /* in notifier.h */
struct iov_iter; /* in uio.h */
+struct vmap_block; /* in mm/vmalloc.c */
/* bits in flags of vmalloc's vm_struct below */
#define VM_IOREMAP 0x00000001 /* ioremap() and friends */
@@ -76,6 +77,7 @@ struct vmap_area {
union {
unsigned long subtree_max_size; /* in "free" tree */
struct vm_struct *vm; /* in "busy" tree */
+ struct vmap_block *vb; /* in "busy and purge" tree */
};
unsigned long flags; /* mark type of vm_map_ram area */
};
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c0f80982eb06..d97343271e27 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2061,6 +2061,10 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
return ERR_PTR(err);
}
+ spin_lock(&vmap_area_lock);
+ va->vb = vb;
+ spin_unlock(&vmap_area_lock);
+
vbq = raw_cpu_ptr(&vmap_block_queue);
spin_lock(&vbq->lock);
list_add_tail_rcu(&vb->free_list, &vbq->free);
next prev parent reply other threads:[~2023-05-23 9:36 UTC|newest]
Thread overview: 150+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-15 16:43 Excessive TLB flush ranges Thomas Gleixner
2023-05-15 16:43 ` Thomas Gleixner
2023-05-15 16:59 ` Russell King (Oracle)
2023-05-15 16:59 ` Russell King (Oracle)
2023-05-15 19:46 ` Thomas Gleixner
2023-05-15 19:46 ` Thomas Gleixner
2023-05-15 21:11 ` Thomas Gleixner
2023-05-15 21:11 ` Thomas Gleixner
2023-05-15 21:31 ` Russell King (Oracle)
2023-05-15 21:31 ` Russell King (Oracle)
2023-05-16 6:37 ` Thomas Gleixner
2023-05-16 6:37 ` Thomas Gleixner
2023-05-16 6:46 ` Thomas Gleixner
2023-05-16 6:46 ` Thomas Gleixner
2023-05-16 8:18 ` Thomas Gleixner
2023-05-16 8:18 ` Thomas Gleixner
2023-05-16 8:20 ` Thomas Gleixner
2023-05-16 8:20 ` Thomas Gleixner
2023-05-16 8:27 ` Russell King (Oracle)
2023-05-16 8:27 ` Russell King (Oracle)
2023-05-16 9:03 ` Thomas Gleixner
2023-05-16 9:03 ` Thomas Gleixner
2023-05-16 10:05 ` Baoquan He
2023-05-16 10:05 ` Baoquan He
2023-05-16 14:21 ` Thomas Gleixner
2023-05-16 14:21 ` Thomas Gleixner
2023-05-16 19:03 ` Thomas Gleixner
2023-05-16 19:03 ` Thomas Gleixner
2023-05-17 9:38 ` Thomas Gleixner
2023-05-17 9:38 ` Thomas Gleixner
2023-05-17 10:52 ` Baoquan He
2023-05-17 10:52 ` Baoquan He
2023-05-19 11:22 ` Thomas Gleixner
2023-05-19 11:22 ` Thomas Gleixner
2023-05-19 11:49 ` Baoquan He
2023-05-19 11:49 ` Baoquan He
2023-05-19 14:13 ` Thomas Gleixner
2023-05-19 14:13 ` Thomas Gleixner
2023-05-19 12:01 ` [RFC PATCH 1/3] mm/vmalloc.c: try to flush vmap_area one by one Baoquan He
2023-05-19 12:01 ` Baoquan He
2023-05-19 14:16 ` Thomas Gleixner
2023-05-19 14:16 ` Thomas Gleixner
2023-05-19 12:02 ` [RFC PATCH 2/3] mm/vmalloc.c: Only flush VM_FLUSH_RESET_PERMS area immediately Baoquan He
2023-05-19 12:02 ` Baoquan He
2023-05-19 12:03 ` [RFC PATCH 3/3] mm/vmalloc.c: change _vm_unmap_aliases() to do purge firstly Baoquan He
2023-05-19 12:03 ` Baoquan He
2023-05-19 14:17 ` Thomas Gleixner
2023-05-19 14:17 ` Thomas Gleixner
2023-05-19 18:38 ` Thomas Gleixner
2023-05-19 18:38 ` Thomas Gleixner
2023-05-19 23:46 ` Baoquan He
2023-05-19 23:46 ` Baoquan He
2023-05-21 23:10 ` Thomas Gleixner
2023-05-21 23:10 ` Thomas Gleixner
2023-05-22 11:21 ` Baoquan He
2023-05-22 11:21 ` Baoquan He
2023-05-22 12:02 ` Thomas Gleixner
2023-05-22 12:02 ` Thomas Gleixner
2023-05-22 14:34 ` Baoquan He
2023-05-22 14:34 ` Baoquan He
2023-05-22 20:21 ` Thomas Gleixner
2023-05-22 20:21 ` Thomas Gleixner
2023-05-22 20:44 ` Thomas Gleixner
2023-05-22 20:44 ` Thomas Gleixner
2023-05-23 9:35 ` Baoquan He [this message]
2023-05-23 9:35 ` Baoquan He
2023-05-19 13:49 ` Excessive TLB flush ranges Thomas Gleixner
2023-05-19 13:49 ` Thomas Gleixner
2023-05-16 8:21 ` Russell King (Oracle)
2023-05-16 8:21 ` Russell King (Oracle)
2023-05-16 8:19 ` Russell King (Oracle)
2023-05-16 8:19 ` Russell King (Oracle)
2023-05-16 8:44 ` Thomas Gleixner
2023-05-16 8:44 ` Thomas Gleixner
2023-05-16 8:48 ` Russell King (Oracle)
2023-05-16 8:48 ` Russell King (Oracle)
2023-05-16 12:09 ` Thomas Gleixner
2023-05-16 12:09 ` Thomas Gleixner
2023-05-16 13:42 ` Uladzislau Rezki
2023-05-16 13:42 ` Uladzislau Rezki
2023-05-16 14:38 ` Thomas Gleixner
2023-05-16 14:38 ` Thomas Gleixner
2023-05-16 15:01 ` Uladzislau Rezki
2023-05-16 15:01 ` Uladzislau Rezki
2023-05-16 17:04 ` Thomas Gleixner
2023-05-16 17:04 ` Thomas Gleixner
2023-05-17 11:26 ` Uladzislau Rezki
2023-05-17 11:26 ` Uladzislau Rezki
2023-05-17 11:58 ` Thomas Gleixner
2023-05-17 11:58 ` Thomas Gleixner
2023-05-17 12:15 ` Uladzislau Rezki
2023-05-17 12:15 ` Uladzislau Rezki
2023-05-17 16:32 ` Thomas Gleixner
2023-05-17 16:32 ` Thomas Gleixner
2023-05-19 10:01 ` Uladzislau Rezki
2023-05-19 10:01 ` Uladzislau Rezki
2023-05-19 14:56 ` Thomas Gleixner
2023-05-19 14:56 ` Thomas Gleixner
2023-05-19 15:14 ` Uladzislau Rezki
2023-05-19 15:14 ` Uladzislau Rezki
2023-05-19 16:32 ` Thomas Gleixner
2023-05-19 16:32 ` Thomas Gleixner
2023-05-19 17:02 ` Uladzislau Rezki
2023-05-19 17:02 ` Uladzislau Rezki
2023-05-16 17:56 ` Nadav Amit
2023-05-16 17:56 ` Nadav Amit
2023-05-16 19:32 ` Thomas Gleixner
2023-05-16 19:32 ` Thomas Gleixner
2023-05-17 0:23 ` Thomas Gleixner
2023-05-17 0:23 ` Thomas Gleixner
2023-05-17 1:23 ` Nadav Amit
2023-05-17 1:23 ` Nadav Amit
2023-05-17 10:31 ` Thomas Gleixner
2023-05-17 10:31 ` Thomas Gleixner
2023-05-17 11:47 ` Thomas Gleixner
2023-05-17 11:47 ` Thomas Gleixner
2023-05-17 22:41 ` Nadav Amit
2023-05-17 22:41 ` Nadav Amit
2023-05-17 14:43 ` Mark Rutland
2023-05-17 14:43 ` Mark Rutland
2023-05-17 16:41 ` Thomas Gleixner
2023-05-17 16:41 ` Thomas Gleixner
2023-05-17 22:57 ` Nadav Amit
2023-05-17 22:57 ` Nadav Amit
2023-05-19 11:49 ` Thomas Gleixner
2023-05-19 11:49 ` Thomas Gleixner
2023-05-17 12:12 ` Russell King (Oracle)
2023-05-17 12:12 ` Russell King (Oracle)
2023-05-17 23:14 ` Nadav Amit
2023-05-17 23:14 ` Nadav Amit
2023-05-15 18:17 ` Uladzislau Rezki
2023-05-15 18:17 ` Uladzislau Rezki
2023-05-16 2:26 ` Baoquan He
2023-05-16 2:26 ` Baoquan He
2023-05-16 6:40 ` Thomas Gleixner
2023-05-16 6:40 ` Thomas Gleixner
2023-05-16 8:07 ` Baoquan He
2023-05-16 8:07 ` Baoquan He
2023-05-16 8:10 ` Baoquan He
2023-05-16 8:10 ` Baoquan He
2023-05-16 8:45 ` Russell King (Oracle)
2023-05-16 8:45 ` Russell King (Oracle)
2023-05-16 9:13 ` Thomas Gleixner
2023-05-16 9:13 ` Thomas Gleixner
2023-05-16 8:54 ` Thomas Gleixner
2023-05-16 8:54 ` Thomas Gleixner
2023-05-16 9:48 ` Baoquan He
2023-05-16 9:48 ` Baoquan He
2023-05-15 20:02 ` Nadav Amit
2023-05-15 20:02 ` Nadav Amit
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZGyI4mXc+F2gVveR@MiWiFi-R3L-srv \
--to=bhe@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=hch@lst.de \
--cc=jogness@linutronix.de \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-mm@kvack.org \
--cc=linux@armlinux.org.uk \
--cc=lstoakes@gmail.com \
--cc=mark.rutland@arm.com \
--cc=maz@kernel.org \
--cc=nadav.amit@gmail.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=urezki@gmail.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.