* [LSF/MM TOPIC] Matthew's minor MM topics
@ 2018-01-16 14:13 Matthew Wilcox
2018-01-23 12:26 ` [Lsf-pc] " Michal Hocko
0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2018-01-16 14:13 UTC (permalink / raw)
To: lsf-pc; +Cc: linux-mm
(trying again with the right MM mailing list address. Sorry.)
I have a number of things I'd like to discuss that are purely MM related.
I don't know if any of them rise to the level of an entire session,
but maybe lightning talks, or maybe we can dispose of them on the list
before the summit.
1. GFP_DMA / GFP_HIGHMEM / GFP_DMA32
The documentation is clear that only one of these three bits is allowed
to be set. Indeed, we have code that checks that only one of these
three bits is set. So why do we have three bits? Surely this encoding
works better:
00b (normal)
01b GFP_DMA
10b GFP_DMA32
11b GFP_HIGHMEM
(or some other clever encoding that maps well to the zone_type index)
2. kvzalloc_ab_c()
We could bikeshed on this name all summit long, but the idea is to provide
an equivalent of kvmalloc_array() which works for array-plus-header.
These allocations are legion throughout the kernel. Here's the first
one I found with a grep:
drivers/vhost/vhost.c: newmem = kvzalloc(size + mem.nregions * sizeof(*m->regions), GFP_KERNEL);
... and, yep, that one's a security hole.
The implementation is not hard, viz:
+static inline void *kvzalloc_ab_c(size_t n, size_t size, size_t c, gfp_t flags)
+{
+ if (size != 0 && n > (SIZE_MAX - c) / size)
+ return NULL;
+
+ return kvmalloc(n * size + c, flags);
+}
but the name will tie us in knots and getting people to actually use
it will be worse. (I actually stole the name from another project,
but I can't find it now).
We also need to go through and convert dozens of callers that are
doing kvzalloc(a * b) into kvzalloc_array(a, b). Maybe we can ask for
some coccinelle / smatch / checkpatch help here.
3. Maybe we could rename kvfree() to just free()? Please? There's
nothing special about it. One fewer thing for somebody to learn when
coming fresh to kernel programming.
4. vmf_insert_(page|pfn|mixed|...)
vm_insert_foo are invariably called from fault handlers, usually as
the last thing we do before returning a VM_FAULT code. As such, why do
they return an errno that has to be translated? We would be better off
returning VM_FAULT codes from these functions.
Related, I'd like to introduce a new vm_fault_t typedef for unsigned
int that indicates that the function returns VM_FAULT flags rather than
an errno. We've had so many mistakes in this area.
----- End forwarded message -----
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Lsf-pc] [LSF/MM TOPIC] Matthew's minor MM topics
2018-01-16 14:13 [LSF/MM TOPIC] Matthew's minor MM topics Matthew Wilcox
@ 2018-01-23 12:26 ` Michal Hocko
2018-01-23 20:48 ` Matthew Wilcox
2018-01-29 12:37 ` Matthew Wilcox
0 siblings, 2 replies; 6+ messages in thread
From: Michal Hocko @ 2018-01-23 12:26 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: lsf-pc, linux-mm
On Tue 16-01-18 06:13:54, Matthew Wilcox wrote:
> (trying again with the right MM mailing list address. Sorry.)
>
> I have a number of things I'd like to discuss that are purely MM related.
> I don't know if any of them rise to the level of an entire session,
> but maybe lightning talks, or maybe we can dispose of them on the list
> before the summit.
>
> 1. GFP_DMA / GFP_HIGHMEM / GFP_DMA32
>
> The documentation is clear that only one of these three bits is allowed
> to be set. Indeed, we have code that checks that only one of these
> three bits is set. So why do we have three bits? Surely this encoding
> works better:
>
> 00b (normal)
> 01b GFP_DMA
> 10b GFP_DMA32
> 11b GFP_HIGHMEM
> (or some other clever encoding that maps well to the zone_type index)
Didn't you forget about movable zone? Anyway, if you can simplify this
thing I would be more than happy. GFP_ZONE_TABLE makes my head spin
anytime I dare to look.
> 2. kvzalloc_ab_c()
>
> We could bikeshed on this name all summit long, but the idea is to provide
> an equivalent of kvmalloc_array() which works for array-plus-header.
> These allocations are legion throughout the kernel. Here's the first
> one I found with a grep:
>
> drivers/vhost/vhost.c: newmem = kvzalloc(size + mem.nregions * sizeof(*m->regions), GFP_KERNEL);
>
> ... and, yep, that one's a security hole.
>
> The implementation is not hard, viz:
>
> +static inline void *kvzalloc_ab_c(size_t n, size_t size, size_t c, gfp_t flags)
> +{
> + if (size != 0 && n > (SIZE_MAX - c) / size)
> + return NULL;
> +
> + return kvmalloc(n * size + c, flags);
> +}
>
> but the name will tie us in knots and getting people to actually use
> it will be worse. (I actually stole the name from another project,
> but I can't find it now).
>
> We also need to go through and convert dozens of callers that are
> doing kvzalloc(a * b) into kvzalloc_array(a, b). Maybe we can ask for
> some coccinelle / smatch / checkpatch help here.
I do not see anything controversial here. Is there anything to be
discussed here? If there is a common pattern then a helper shouldn't be
a big deal, no?
> 3. Maybe we could rename kvfree() to just free()? Please? There's
> nothing special about it. One fewer thing for somebody to learn when
> coming fresh to kernel programming.
I guess one has to learn about kvmalloc already and kvfree is nicely
symmetric to it.
> 4. vmf_insert_(page|pfn|mixed|...)
>
> vm_insert_foo are invariably called from fault handlers, usually as
> the last thing we do before returning a VM_FAULT code. As such, why do
> they return an errno that has to be translated? We would be better off
> returning VM_FAULT codes from these functions.
Which tree are you looking at? git grep vmf_insert_ doesn't show much.
vmf_insert_pfn_p[mu]d and those already return VM_FAULT error code from
a quick look.
> Related, I'd like to introduce a new vm_fault_t typedef for unsigned
> int that indicates that the function returns VM_FAULT flags rather than
> an errno. We've had so many mistakes in this area.
This sounds like a reasonable thing to do.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Lsf-pc] [LSF/MM TOPIC] Matthew's minor MM topics
2018-01-23 12:26 ` [Lsf-pc] " Michal Hocko
@ 2018-01-23 20:48 ` Matthew Wilcox
2018-01-24 8:56 ` Michal Hocko
2018-01-29 12:37 ` Matthew Wilcox
1 sibling, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2018-01-23 20:48 UTC (permalink / raw)
To: Michal Hocko; +Cc: lsf-pc, linux-mm
On Tue, Jan 23, 2018 at 01:26:46PM +0100, Michal Hocko wrote:
> On Tue 16-01-18 06:13:54, Matthew Wilcox wrote:
> > 1. GFP_DMA / GFP_HIGHMEM / GFP_DMA32
> >
> > The documentation is clear that only one of these three bits is allowed
> > to be set. Indeed, we have code that checks that only one of these
> > three bits is set. So why do we have three bits? Surely this encoding
> > works better:
> >
> > 00b (normal)
> > 01b GFP_DMA
> > 10b GFP_DMA32
> > 11b GFP_HIGHMEM
> > (or some other clever encoding that maps well to the zone_type index)
>
> Didn't you forget about movable zone? Anyway, if you can simplify this
> thing I would be more than happy. GFP_ZONE_TABLE makes my head spin
> anytime I dare to look.
I didn't *forget* about it, exactly. I just didn't include it because
(as I understand it), it's legitimate to ask for GFP_DMA | GFP_MOVABLE.
To quote:
* The zone fallback order is MOVABLE=>HIGHMEM=>NORMAL=>DMA32=>DMA.
* But GFP_MOVABLE is not only a zone specifier but also an allocation
* policy. Therefore __GFP_MOVABLE plus another zone selector is valid.
* Only 1 bit of the lowest 3 bits (DMA,DMA32,HIGHMEM) can be set to "1".
I don't understand this, personally. I assumed it made sense to someone,
but if we can collapse GFP_MOVABLE into this and just use the bottom three
bits as the zone number, then that would be an even better cleanup.
> > 2. kvzalloc_ab_c()
> >
> > We also need to go through and convert dozens of callers that are
> > doing kvzalloc(a * b) into kvzalloc_array(a, b). Maybe we can ask for
> > some coccinelle / smatch / checkpatch help here.
>
> I do not see anything controversial here. Is there anything to be
> discussed here? If there is a common pattern then a helper shouldn't be
> a big deal, no?
Good! Let's try and dispose of this quickly.
> > 4. vmf_insert_(page|pfn|mixed|...)
> >
> > vm_insert_foo are invariably called from fault handlers, usually as
> > the last thing we do before returning a VM_FAULT code. As such, why do
> > they return an errno that has to be translated? We would be better off
> > returning VM_FAULT codes from these functions.
>
> Which tree are you looking at? git grep vmf_insert_ doesn't show much.
> vmf_insert_pfn_p[mu]d and those already return VM_FAULT error code from
> a quick look.
I didn't explain this well. Today, vm_insert_page() returns -EFAULT,
-EINVAL, -ENOMEM or -EBUSY. I'd like to see it replaced with a new
function called vmf_insert_page() which returns a vm_fault_t rather
than have every caller of vm_insert_page() convert that errno into
a vm_fault_t.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Lsf-pc] [LSF/MM TOPIC] Matthew's minor MM topics
2018-01-23 20:48 ` Matthew Wilcox
@ 2018-01-24 8:56 ` Michal Hocko
0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2018-01-24 8:56 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: lsf-pc, linux-mm
On Tue 23-01-18 12:48:27, Matthew Wilcox wrote:
> On Tue, Jan 23, 2018 at 01:26:46PM +0100, Michal Hocko wrote:
> > On Tue 16-01-18 06:13:54, Matthew Wilcox wrote:
> > > 1. GFP_DMA / GFP_HIGHMEM / GFP_DMA32
> > >
> > > The documentation is clear that only one of these three bits is allowed
> > > to be set. Indeed, we have code that checks that only one of these
> > > three bits is set. So why do we have three bits? Surely this encoding
> > > works better:
> > >
> > > 00b (normal)
> > > 01b GFP_DMA
> > > 10b GFP_DMA32
> > > 11b GFP_HIGHMEM
> > > (or some other clever encoding that maps well to the zone_type index)
> >
> > Didn't you forget about movable zone? Anyway, if you can simplify this
> > thing I would be more than happy. GFP_ZONE_TABLE makes my head spin
> > anytime I dare to look.
>
> I didn't *forget* about it, exactly. I just didn't include it because
> (as I understand it), it's legitimate to ask for GFP_DMA | GFP_MOVABLE.
> To quote:
>
> * The zone fallback order is MOVABLE=>HIGHMEM=>NORMAL=>DMA32=>DMA.
> * But GFP_MOVABLE is not only a zone specifier but also an allocation
> * policy. Therefore __GFP_MOVABLE plus another zone selector is valid.
> * Only 1 bit of the lowest 3 bits (DMA,DMA32,HIGHMEM) can be set to "1".
>
> I don't understand this, personally. I assumed it made sense to someone,
> but if we can collapse GFP_MOVABLE into this and just use the bottom three
> bits as the zone number, then that would be an even better cleanup.
Well, I always forget details because MOVABLE zone and its gfp flag are
just very special... If you can make it simple I am all for it. I am
just worried that this will be hard to discuss because most people just
do not have that code cached. So if you have some code to show and
discuss then why not I suspect discussing it on the mailing list might
result to be much more productive .
[...]
> > > 4. vmf_insert_(page|pfn|mixed|...)
> > >
> > > vm_insert_foo are invariably called from fault handlers, usually as
> > > the last thing we do before returning a VM_FAULT code. As such, why do
> > > they return an errno that has to be translated? We would be better off
> > > returning VM_FAULT codes from these functions.
> >
> > Which tree are you looking at? git grep vmf_insert_ doesn't show much.
> > vmf_insert_pfn_p[mu]d and those already return VM_FAULT error code from
> > a quick look.
>
> I didn't explain this well. Today, vm_insert_page() returns -EFAULT,
> -EINVAL, -ENOMEM or -EBUSY. I'd like to see it replaced with a new
> function called vmf_insert_page() which returns a vm_fault_t rather
> than have every caller of vm_insert_page() convert that errno into
> a vm_fault_t.
makes sense to me
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Lsf-pc] [LSF/MM TOPIC] Matthew's minor MM topics
2018-01-23 12:26 ` [Lsf-pc] " Michal Hocko
2018-01-23 20:48 ` Matthew Wilcox
@ 2018-01-29 12:37 ` Matthew Wilcox
2018-01-29 12:48 ` Michal Hocko
1 sibling, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2018-01-29 12:37 UTC (permalink / raw)
To: Michal Hocko; +Cc: lsf-pc, linux-mm
On Tue, Jan 23, 2018 at 01:26:46PM +0100, Michal Hocko wrote:
> On Tue 16-01-18 06:13:54, Matthew Wilcox wrote:
> > 3. Maybe we could rename kvfree() to just free()? Please? There's
> > nothing special about it. One fewer thing for somebody to learn when
> > coming fresh to kernel programming.
>
> I guess one has to learn about kvmalloc already and kvfree is nicely
> symmetric to it.
I'd really like to get to:
#define malloc(sz) kvmalloc(sz, GFP_KERNEL)
#define free(p) kvfree(p)
#define realloc(p, sz) kvrealloc(p, sz, GFP_KERNEL) /* Doesn't exist yet */
#define calloc(n, sz) kvmalloc_array(n, sz, GFP_KERNEL)
... or similar. I wouldn't be surprised if we currently spend more I$
marshalling arguments for kvmalloc than we would spend exporting a new
malloc() function.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Lsf-pc] [LSF/MM TOPIC] Matthew's minor MM topics
2018-01-29 12:37 ` Matthew Wilcox
@ 2018-01-29 12:48 ` Michal Hocko
0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2018-01-29 12:48 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: lsf-pc, linux-mm
On Mon 29-01-18 04:37:45, Matthew Wilcox wrote:
> On Tue, Jan 23, 2018 at 01:26:46PM +0100, Michal Hocko wrote:
> > On Tue 16-01-18 06:13:54, Matthew Wilcox wrote:
> > > 3. Maybe we could rename kvfree() to just free()? Please? There's
> > > nothing special about it. One fewer thing for somebody to learn when
> > > coming fresh to kernel programming.
> >
> > I guess one has to learn about kvmalloc already and kvfree is nicely
> > symmetric to it.
>
> I'd really like to get to:
>
> #define malloc(sz) kvmalloc(sz, GFP_KERNEL)
> #define free(p) kvfree(p)
> #define realloc(p, sz) kvrealloc(p, sz, GFP_KERNEL) /* Doesn't exist yet */
> #define calloc(n, sz) kvmalloc_array(n, sz, GFP_KERNEL)
Considering how many users we already have this is not really feasible.
I am also not sure we really want to mimic the userspace API. It is just
different with different consequences. You do not want to hide GFP
flags because this turned out to be just a bad idea in the past. Just
look at pte allocation functions which are unconditioanlly GFP_KERNEL
and all the pain that resulted in. We really want people to learn the
APIs and understand their limitations.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-01-29 12:48 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-16 14:13 [LSF/MM TOPIC] Matthew's minor MM topics Matthew Wilcox
2018-01-23 12:26 ` [Lsf-pc] " Michal Hocko
2018-01-23 20:48 ` Matthew Wilcox
2018-01-24 8:56 ` Michal Hocko
2018-01-29 12:37 ` Matthew Wilcox
2018-01-29 12:48 ` Michal Hocko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).