* *alloc API changes @ 2018-05-05 1:08 Kees Cook 2018-05-05 3:46 ` Matthew Wilcox 0 siblings, 1 reply; 13+ messages in thread From: Kees Cook @ 2018-05-05 1:08 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Linux-MM, LKML, Rasmus Villemoes Hi, After writing up this email, I think I don't like this idea, but I'm still curious to see what people think of the general observations... The number of permutations for our various allocation function is rather huge. Currently, it is: system or wrapper: kmem_cache_alloc, kmalloc, vmalloc, kvmalloc, devm_kmalloc, dma_alloc_coherent, pci_alloc_consistent, kmem_alloc, f2fs_kvalloc, and probably others I haven't found yet. allocation method (not all available in all APIs): regular (kmalloc), zeroed (kzalloc), array (kmalloc_array), zeroed array (kcalloc) with or without node argument: +_node so, for example, we have things like: kmalloc_array_node() or pci_zalloc_consistent() Using some attempts at static analysis with git grep and coccinelle, nearly all of these take GFP flags, but something near 80% of functions are using GFP_KERNEL. Roughly half of the callers are including a sizeof(). Only 20% are using zeroing, 15% are using products as a size, and 3% are using ..._node, etc. I wonder if we might be able to rearrange our APIs for the general case and include a common "kitchen sink" API for the less common options. I.e. why do we have an entire set of _node() APIs, 2 sets for zeroing (kzalloc, kcalloc), etc. kmalloc()-family was meant to be a simplification of kmem_cache_alloc(). vmalloc() duplicated the kmalloc()-family, and kvmalloc() does too. Then we have "specialty" allocators (devm, dma, pci, f2fs, xfs's kmem) that have subsets and want to perform other actions around the base allocators or have their own entirely (e.g. dma). Instead of all the variations, it seems like we just want a per-family alloc() and alloc_attrs(), where alloc_attrs() could handle the less common stuff (e.g. gfp, zero, node). kmalloc(size, GFP_KERNEL) becomes a nice: kmalloc(size) However, then kzalloc(size, GFP_KERNEL) becomes the ugly: kmalloc_attrs(size, GFP_KERNEL, ALLOC_ZERO, NUMA_NO_NODE); (But I guess we could make macro wrappers for zeroing, or node?) But this doesn't solve the multiplication overflow case at all, which is my real goal. Trying to incorporate some of the ideas from other threads, maybe we could have a multiplication helper that would saturate and the allocator would see that as a signal to return NULL? e.g.: inline size_t mult(size_t a, size_t b) { if (b != 0 && a >= SIZE_MAX / b) return SIZE_MAX; return a * b; } (really, this kind of helper should be based on Rasmus's helpers which do correct type handling) void *kmalloc(size_t size) { if (size == SIZE_MAX) return NULL; kmalloc_attrs(size, GFP_KERNEL, ALLOC_NOZERO, NUMA_NO_NODE); } then we get: var = kmalloc(mult(num, sizeof(*var))); we could drop the *calloc(), *zalloc(), and *_array(), leaving only *alloc() and *alloc_attrs() for all the allocator families. I honestly can't tell if this is worse than doing all the family conversions to *calloc() and *_array() for the 1000ish instances of 2-factor products used for size arguments in the *alloc() functions. We could still nest them for the 3-factor ones? var = kmalloc(multi(row, mult(column, sizeof(*var)))); But now we're just pretending to be LISP. And really, I'd like to keep the nicer *alloc_struct() with all its type checking. But then do we do *zalloc_struct(), *alloc_struct_node(), etc, etc? Bleh. C sucks for this. -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: *alloc API changes 2018-05-05 1:08 *alloc API changes Kees Cook @ 2018-05-05 3:46 ` Matthew Wilcox 2018-05-05 4:24 ` Kees Cook 2018-05-05 4:30 ` Matthew Wilcox 0 siblings, 2 replies; 13+ messages in thread From: Matthew Wilcox @ 2018-05-05 3:46 UTC (permalink / raw) To: Kees Cook; +Cc: Matthew Wilcox, Linux-MM, LKML, Rasmus Villemoes On Fri, May 04, 2018 at 06:08:23PM -0700, Kees Cook wrote: > The number of permutations for our various allocation function is > rather huge. Currently, it is: > > system or wrapper: > kmem_cache_alloc, kmalloc, vmalloc, kvmalloc, devm_kmalloc, > dma_alloc_coherent, pci_alloc_consistent, kmem_alloc, f2fs_kvalloc, > and probably others I haven't found yet. dma_pool_alloc, page_frag_alloc, gen_pool_alloc, __alloc_bootmem_node, cma_alloc, quicklist_alloc (deprecated), mempool_alloc and if you're counting f2fs_*alloc, there's a metric tonne of *alloc wrappers out there. > allocation method (not all available in all APIs): > regular (kmalloc), zeroed (kzalloc), array (kmalloc_array), zeroed > array (kcalloc) ... other initialiser (kmem_cache_alloc) > I wonder if we might be able to rearrange our APIs for the general > case and include a common "kitchen sink" API for the less common > options. I.e. why do we have an entire set of _node() APIs, 2 sets for > zeroing (kzalloc, kcalloc), etc. I'd love it if we had a common pattern for these things. A regular API appeals to me (I blame those RISC people in my formative years). > kmalloc()-family was meant to be a simplification of > kmem_cache_alloc(). That's a little revisionist ;-) We had kmalloc before we had the slab allocator (kernel 1.2, I think?). But I see your point, and that's certainly how it's implemented these days. > vmalloc() duplicated the kmalloc()-family, and > kvmalloc() does too. Then we have "specialty" allocators (devm, dma, > pci, f2fs, xfs's kmem) that have subsets and want to perform other > actions around the base allocators or have their own entirely (e.g. > dma). > > Instead of all the variations, it seems like we just want a per-family > alloc() and alloc_attrs(), where alloc_attrs() could handle the less > common stuff (e.g. gfp, zero, node). > > kmalloc(size, GFP_KERNEL) > becomes a nice: > kmalloc(size) I got shot down for proposing adding #define malloc(x) kmalloc(x, GFP_KERNEL) on the grounds that driver writers will then use malloc in interrupt context. So I think our base version has to be foo_alloc(size, gfp_t). > But this doesn't solve the multiplication overflow case at all, which > is my real goal. Trying to incorporate some of the ideas from other > threads, maybe we could have a multiplication helper that would > saturate and the allocator would see that as a signal to return NULL? > e.g.: > > inline size_t mult(size_t a, size_t b) > { > if (b != 0 && a >= SIZE_MAX / b) > return SIZE_MAX; > return a * b; > } > (really, this kind of helper should be based on Rasmus's helpers which > do correct type handling) Right, I was thinking: static inline size_t mul_ab(size_t a, size_t b) { #if COMPILER_SUPPORTS_OVERFLOW unsigned long c; if (__builtin_mul_overflow(a, b, &c)) return SIZE_MAX; return c; #else if (b != 0 && a >= SIZE_MAX / b) return SIZE_MAX; return a * b; #endif } > void *kmalloc(size_t size) > { > if (size == SIZE_MAX) > return NULL; > kmalloc_attrs(size, GFP_KERNEL, ALLOC_NOZERO, NUMA_NO_NODE); > } You don't need the size check here. We have the size check buried deep in alloc_pages (look for MAX_ORDER), so kmalloc and then alloc_pages will try a bunch of paths all of which fail before returning NULL. > then we get: > var = kmalloc(mult(num, sizeof(*var))); > > we could drop the *calloc(), *zalloc(), and *_array(), leaving only > *alloc() and *alloc_attrs() for all the allocator families. > > I honestly can't tell if this is worse than doing all the family > conversions to *calloc() and *_array() for the 1000ish instances of > 2-factor products used for size arguments in the *alloc() functions. > We could still nest them for the 3-factor ones? > var = kmalloc(multi(row, mult(column, sizeof(*var)))); > > But now we're just pretending to be LISP. I'd rather have a mul_ab(), mul_abc(), mul_ab_add_c(), etc. than nest calls to mult(). > And really, I'd like to keep the nicer *alloc_struct() with all its > type checking. But then do we do *zalloc_struct(), > *alloc_struct_node(), etc, etc? Nono, Linus had the better proposal, struct_size(p, member, n). > Bleh. C sucks for this. Ooh, we could instantiate classes and ... yeah, no, not C++. We *could* abuse the C preprocessor to autogenerate every variant, but I hate that because you can't grep for it. One of the problems with having the single-argument foo_alloc be a static inline for foo_alloc_attrs is that you then have to marshall four arguments for the call instead of just one. I would have two exported symbols for each variant. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: *alloc API changes 2018-05-05 3:46 ` Matthew Wilcox @ 2018-05-05 4:24 ` Kees Cook 2018-05-07 11:39 ` Matthew Wilcox 2018-05-07 21:41 ` Rasmus Villemoes 2018-05-05 4:30 ` Matthew Wilcox 1 sibling, 2 replies; 13+ messages in thread From: Kees Cook @ 2018-05-05 4:24 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Matthew Wilcox, Linux-MM, LKML, Rasmus Villemoes On Fri, May 4, 2018 at 8:46 PM, Matthew Wilcox <willy@infradead.org> wrote: > and if you're counting f2fs_*alloc, there's a metric tonne of *alloc > wrappers out there. Yeah. *sob* > That's a little revisionist ;-) We had kmalloc before we had the slab > allocator (kernel 1.2, I think?). But I see your point, and that's > certainly how it's implemented these days. Okay, yes, that's true. I did think of that briefly. :) > I got shot down for proposing adding > #define malloc(x) kmalloc(x, GFP_KERNEL) > on the grounds that driver writers will then use malloc in interrupt > context. So I think our base version has to be foo_alloc(size, gfp_t). Okay, fair enough. > Right, I was thinking: > > static inline size_t mul_ab(size_t a, size_t b) > { > #if COMPILER_SUPPORTS_OVERFLOW > unsigned long c; > if (__builtin_mul_overflow(a, b, &c)) > return SIZE_MAX; > return c; > #else > if (b != 0 && a >= SIZE_MAX / b) > return SIZE_MAX; > return a * b; > #endif > } Rasmus, what do you think of a saturating version of your helpers? The only fear I have with the saturating helpers is that we'll end up using them in places that don't recognize SIZE_MAX. Like, say: size = mul(a, b) + 1; then *poof* size == 0. Now, I'd hope that code would use add(mul(a, b), 1), but still... it makes me nervous. > You don't need the size check here. We have the size check buried deep in > alloc_pages (look for MAX_ORDER), so kmalloc and then alloc_pages will try > a bunch of paths all of which fail before returning NULL. Good point. Though it does kind of creep me out to let a known-bad size float around in the allocator until it decides to reject it. I would think an early: if (unlikely(size == SIZE_MAX)) return NULL; would have virtually no cycle count difference... > I'd rather have a mul_ab(), mul_abc(), mul_ab_add_c(), etc. than nest > calls to mult(). Agreed. I think having exactly those would cover almost everything, and the two places where a 4-factor product is needed could just nest them. (bikeshed: the very common mul_ab() should just be mul(), IMO.) > Nono, Linus had the better proposal, struct_size(p, member, n). Oh, yes! I totally missed that in the threads. > Ooh, we could instantiate classes and ... yeah, no, not C++. We *could* > abuse the C preprocessor to autogenerate every variant, but I hate that > because you can't grep for it. Right, no. I think if we can ditch *calloc() and _array() by using saturating helpers, we'll have the API in a much better form: kmalloc(foo * bar, GFP_KERNEL); into kmalloc_array(foo, bar, GFP_KERNEL); into kmalloc(mul(foo, bar), GFP_KERNEL); and kmalloc(foo * bar, GFP_KERNEL | __GFP_ZERO); into kzalloc(foo * bar, GFP_KERNEL); into kcalloc(foo, bar, GFP_KERNEL); into kzalloc(mul(foo, bar), GFP_KERNEL); and the fun kzalloc(sizeof(*header) + count * sizeof(*header->element), GFP_KERNEL); into kzalloc(struct_size(header, element, count), GFP_KERNEL); modulo all *alloc* families... ? -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: *alloc API changes 2018-05-05 4:24 ` Kees Cook @ 2018-05-07 11:39 ` Matthew Wilcox 2018-05-07 16:03 ` Kees Cook 2018-05-07 21:41 ` Rasmus Villemoes 1 sibling, 1 reply; 13+ messages in thread From: Matthew Wilcox @ 2018-05-07 11:39 UTC (permalink / raw) To: Kees Cook; +Cc: Matthew Wilcox, Linux-MM, LKML, Rasmus Villemoes On Fri, May 04, 2018 at 09:24:56PM -0700, Kees Cook wrote: > On Fri, May 4, 2018 at 8:46 PM, Matthew Wilcox <willy@infradead.org> wrote: > The only fear I have with the saturating helpers is that we'll end up > using them in places that don't recognize SIZE_MAX. Like, say: > > size = mul(a, b) + 1; > > then *poof* size == 0. Now, I'd hope that code would use add(mul(a, > b), 1), but still... it makes me nervous. That's reasonable. So let's add: #define ALLOC_TOO_BIG (PAGE_SIZE << MAX_ORDER) (there's a presumably somewhat obsolete CONFIG_FORCE_MAX_ZONEORDER on some architectures which allows people to configure MAX_ORDER all the way up to 64. That config option needs to go away, or at least be limited to a much lower value). On x86, that's 4k << 11 = 8MB. On PPC, that might be 64k << 9 == 32MB. Those values should be relatively immune to further arithmetic causing an additional overflow. > Good point. Though it does kind of creep me out to let a known-bad > size float around in the allocator until it decides to reject it. I > would think an early: > > if (unlikely(size == SIZE_MAX)) > return NULL; > > would have virtually no cycle count difference... I don't think it should go in the callers though ... where it goes in the allocator is up to the allocator maintainers ;-) > > I'd rather have a mul_ab(), mul_abc(), mul_ab_add_c(), etc. than nest > > calls to mult(). > > Agreed. I think having exactly those would cover almost everything, > and the two places where a 4-factor product is needed could just nest > them. (bikeshed: the very common mul_ab() should just be mul(), IMO.) > > > Nono, Linus had the better proposal, struct_size(p, member, n). > > Oh, yes! I totally missed that in the threads. so we're agreed on struct_size(). I think rather than the explicit 'mul', perhaps we should have array_size() and array3_size(). > Right, no. I think if we can ditch *calloc() and _array() by using > saturating helpers, we'll have the API in a much better form: > > kmalloc(foo * bar, GFP_KERNEL); > into > kmalloc_array(foo, bar, GFP_KERNEL); > into > kmalloc(mul(foo, bar), GFP_KERNEL); kmalloc(array_size(foo, bar), GFP_KERNEL); > and the fun > > kzalloc(sizeof(*header) + count * sizeof(*header->element), GFP_KERNEL); > into > kzalloc(struct_size(header, element, count), GFP_KERNEL); > > modulo all *alloc* families... > > ? I think we're broadly in agreement here! ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: *alloc API changes 2018-05-07 11:39 ` Matthew Wilcox @ 2018-05-07 16:03 ` Kees Cook 2018-05-07 20:19 ` Matthew Wilcox 0 siblings, 1 reply; 13+ messages in thread From: Kees Cook @ 2018-05-07 16:03 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Matthew Wilcox, Linux-MM, LKML, Rasmus Villemoes On Mon, May 7, 2018 at 4:39 AM, Matthew Wilcox <willy@infradead.org> wrote: > On Fri, May 04, 2018 at 09:24:56PM -0700, Kees Cook wrote: >> On Fri, May 4, 2018 at 8:46 PM, Matthew Wilcox <willy@infradead.org> wrote: >> The only fear I have with the saturating helpers is that we'll end up >> using them in places that don't recognize SIZE_MAX. Like, say: >> >> size = mul(a, b) + 1; >> >> then *poof* size == 0. Now, I'd hope that code would use add(mul(a, >> b), 1), but still... it makes me nervous. > > That's reasonable. So let's add: > > #define ALLOC_TOO_BIG (PAGE_SIZE << MAX_ORDER) > > (there's a presumably somewhat obsolete CONFIG_FORCE_MAX_ZONEORDER on some > architectures which allows people to configure MAX_ORDER all the way up > to 64. That config option needs to go away, or at least be limited to > a much lower value). > > On x86, that's 4k << 11 = 8MB. On PPC, that might be 64k << 9 == 32MB. > Those values should be relatively immune to further arithmetic causing > an additional overflow. But we can do larger than 8MB allocations with vmalloc, can't we? > I don't think it should go in the callers though ... where it goes in > the allocator is up to the allocator maintainers ;-) We need a self-test regardless, so checking that each allocator returns NULL with the saturated value can be done. >> > I'd rather have a mul_ab(), mul_abc(), mul_ab_add_c(), etc. than nest >> > calls to mult(). >> >> Agreed. I think having exactly those would cover almost everything, >> and the two places where a 4-factor product is needed could just nest >> them. (bikeshed: the very common mul_ab() should just be mul(), IMO.) >> >> > Nono, Linus had the better proposal, struct_size(p, member, n). >> >> Oh, yes! I totally missed that in the threads. > > so we're agreed on struct_size(). I think rather than the explicit 'mul', > perhaps we should have array_size() and array3_size(). I do like the symmetry there. My earlier "what if someone does +1" continues to scratch at my brain, though I think it's likely unimportant: there's no indication (in the name) that these calls saturate. Will someone ever do something crazy like: array_size(a, b) / array_size(c, d) and they can, effectively, a truncated value (if "a, b" saturated and "c, d" didn't...)? >> Right, no. I think if we can ditch *calloc() and _array() by using >> saturating helpers, we'll have the API in a much better form: >> >> kmalloc(foo * bar, GFP_KERNEL); >> into >> kmalloc_array(foo, bar, GFP_KERNEL); >> into >> kmalloc(mul(foo, bar), GFP_KERNEL); > > kmalloc(array_size(foo, bar), GFP_KERNEL); I can't come up with a better name. :P When it was "mul()" I was thinking "smul()" for "saturating multiply". sarray_size() seems ... bonkers. > I think we're broadly in agreement here! Do we want addition helpers? (And division and subtraction?) -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: *alloc API changes 2018-05-07 16:03 ` Kees Cook @ 2018-05-07 20:19 ` Matthew Wilcox 2018-05-07 20:27 ` Kees Cook 0 siblings, 1 reply; 13+ messages in thread From: Matthew Wilcox @ 2018-05-07 20:19 UTC (permalink / raw) To: Kees Cook; +Cc: Matthew Wilcox, Linux-MM, LKML, Rasmus Villemoes On Mon, May 07, 2018 at 09:03:54AM -0700, Kees Cook wrote: > On Mon, May 7, 2018 at 4:39 AM, Matthew Wilcox <willy@infradead.org> wrote: > > On Fri, May 04, 2018 at 09:24:56PM -0700, Kees Cook wrote: > >> On Fri, May 4, 2018 at 8:46 PM, Matthew Wilcox <willy@infradead.org> wrote: > >> The only fear I have with the saturating helpers is that we'll end up > >> using them in places that don't recognize SIZE_MAX. Like, say: > >> > >> size = mul(a, b) + 1; > >> > >> then *poof* size == 0. Now, I'd hope that code would use add(mul(a, > >> b), 1), but still... it makes me nervous. > > > > That's reasonable. So let's add: > > > > #define ALLOC_TOO_BIG (PAGE_SIZE << MAX_ORDER) > > > > (there's a presumably somewhat obsolete CONFIG_FORCE_MAX_ZONEORDER on some > > architectures which allows people to configure MAX_ORDER all the way up > > to 64. That config option needs to go away, or at least be limited to > > a much lower value). > > > > On x86, that's 4k << 11 = 8MB. On PPC, that might be 64k << 9 == 32MB. > > Those values should be relatively immune to further arithmetic causing > > an additional overflow. > > But we can do larger than 8MB allocations with vmalloc, can't we? Yes. And today with kvmalloc. However, I proposed to Linus that kvmalloc() shouldn't allow it -- we should have kvmalloc_large() which would, but kvmalloc wouldn't. He liked that idea, so I'm going with it. There are very, very few places which should need kvmalloc_large. That's one million 8-byte pointers. If you need more than that inside the kernel, you're doing something really damn weird and should do something that looks obviously different. > > I don't think it should go in the callers though ... where it goes in > > the allocator is up to the allocator maintainers ;-) > > We need a self-test regardless, so checking that each allocator > returns NULL with the saturated value can be done. Yes, absolutely. > >> > I'd rather have a mul_ab(), mul_abc(), mul_ab_add_c(), etc. than nest > >> > calls to mult(). > >> > >> Agreed. I think having exactly those would cover almost everything, > >> and the two places where a 4-factor product is needed could just nest > >> them. (bikeshed: the very common mul_ab() should just be mul(), IMO.) > >> > >> > Nono, Linus had the better proposal, struct_size(p, member, n). > >> > >> Oh, yes! I totally missed that in the threads. > > > > so we're agreed on struct_size(). I think rather than the explicit 'mul', > > perhaps we should have array_size() and array3_size(). > > I do like the symmetry there. My earlier "what if someone does +1" > continues to scratch at my brain, though I think it's likely > unimportant: there's no indication (in the name) that these calls > saturate. Will someone ever do something crazy like: array_size(a, b) > / array_size(c, d) and they can, effectively, a truncated value (if > "a, b" saturated and "c, d" didn't...)? Without CPU support for a saturated value, there's no "safe" saturated value. You can always come up with some arithmetic that will bring it back into the valid range. All we can do is go "large enough" and hope. > >> Right, no. I think if we can ditch *calloc() and _array() by using > >> saturating helpers, we'll have the API in a much better form: > >> > >> kmalloc(foo * bar, GFP_KERNEL); > >> into > >> kmalloc_array(foo, bar, GFP_KERNEL); > >> into > >> kmalloc(mul(foo, bar), GFP_KERNEL); > > > > kmalloc(array_size(foo, bar), GFP_KERNEL); > > I can't come up with a better name. :P When it was "mul()" I was > thinking "smul()" for "saturating multiply". sarray_size() seems ... > bonkers. smul would mean 'signed multiply' to me, having read the GCC manual: -- Built-in Function: bool __builtin_smul_overflow (int a, int b, int *res) but I thought of another problem with array_size. We already have ARRAY_SIZE and it means "the number of elements in the array". so ... struct_bytes(), array_bytes(), array3_bytes()? > > I think we're broadly in agreement here! > > Do we want addition helpers? (And division and subtraction?) Keeping our focus on allocations ... do we have plain additions (as opposed to multiply-and-add?) And subtraction? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: *alloc API changes 2018-05-07 20:19 ` Matthew Wilcox @ 2018-05-07 20:27 ` Kees Cook 2018-05-07 20:49 ` Matthew Wilcox 2018-05-07 21:48 ` John Johansen 0 siblings, 2 replies; 13+ messages in thread From: Kees Cook @ 2018-05-07 20:27 UTC (permalink / raw) To: Matthew Wilcox, John Johansen Cc: Matthew Wilcox, Linux-MM, LKML, Rasmus Villemoes On Mon, May 7, 2018 at 1:19 PM, Matthew Wilcox <willy@infradead.org> wrote: > On Mon, May 07, 2018 at 09:03:54AM -0700, Kees Cook wrote: >> On Mon, May 7, 2018 at 4:39 AM, Matthew Wilcox <willy@infradead.org> wrote: >> > On Fri, May 04, 2018 at 09:24:56PM -0700, Kees Cook wrote: >> >> On Fri, May 4, 2018 at 8:46 PM, Matthew Wilcox <willy@infradead.org> wrote: >> >> The only fear I have with the saturating helpers is that we'll end up >> >> using them in places that don't recognize SIZE_MAX. Like, say: >> >> >> >> size = mul(a, b) + 1; >> >> >> >> then *poof* size == 0. Now, I'd hope that code would use add(mul(a, >> >> b), 1), but still... it makes me nervous. >> > >> > That's reasonable. So let's add: >> > >> > #define ALLOC_TOO_BIG (PAGE_SIZE << MAX_ORDER) >> > >> > (there's a presumably somewhat obsolete CONFIG_FORCE_MAX_ZONEORDER on some >> > architectures which allows people to configure MAX_ORDER all the way up >> > to 64. That config option needs to go away, or at least be limited to >> > a much lower value). >> > >> > On x86, that's 4k << 11 = 8MB. On PPC, that might be 64k << 9 == 32MB. >> > Those values should be relatively immune to further arithmetic causing >> > an additional overflow. >> >> But we can do larger than 8MB allocations with vmalloc, can't we? > > Yes. And today with kvmalloc. However, I proposed to Linus that > kvmalloc() shouldn't allow it -- we should have kvmalloc_large() which > would, but kvmalloc wouldn't. He liked that idea, so I'm going with it. How would we handle size calculations for _large? > There are very, very few places which should need kvmalloc_large. > That's one million 8-byte pointers. If you need more than that inside > the kernel, you're doing something really damn weird and should do > something that looks obviously different. I'm CCing John since I remember long ago running into problems loading the AppArmor DFA with kmalloc and switching it to kvmalloc. John, how large can the DFAs for AppArmor get? Would an 8MB limit be a problem? And do we have any large IO or network buffers >8MB? >> > I don't think it should go in the callers though ... where it goes in >> > the allocator is up to the allocator maintainers ;-) >> >> We need a self-test regardless, so checking that each allocator >> returns NULL with the saturated value can be done. > > Yes, absolutely. > >> >> > I'd rather have a mul_ab(), mul_abc(), mul_ab_add_c(), etc. than nest >> >> > calls to mult(). >> >> >> >> Agreed. I think having exactly those would cover almost everything, >> >> and the two places where a 4-factor product is needed could just nest >> >> them. (bikeshed: the very common mul_ab() should just be mul(), IMO.) >> >> >> >> > Nono, Linus had the better proposal, struct_size(p, member, n). >> >> >> >> Oh, yes! I totally missed that in the threads. >> > >> > so we're agreed on struct_size(). I think rather than the explicit 'mul', >> > perhaps we should have array_size() and array3_size(). >> >> I do like the symmetry there. My earlier "what if someone does +1" >> continues to scratch at my brain, though I think it's likely >> unimportant: there's no indication (in the name) that these calls >> saturate. Will someone ever do something crazy like: array_size(a, b) >> / array_size(c, d) and they can, effectively, a truncated value (if >> "a, b" saturated and "c, d" didn't...)? > > Without CPU support for a saturated value, there's no "safe" saturated > value. You can always come up with some arithmetic that will bring it > back into the valid range. All we can do is go "large enough" and hope. > >> >> Right, no. I think if we can ditch *calloc() and _array() by using >> >> saturating helpers, we'll have the API in a much better form: >> >> >> >> kmalloc(foo * bar, GFP_KERNEL); >> >> into >> >> kmalloc_array(foo, bar, GFP_KERNEL); >> >> into >> >> kmalloc(mul(foo, bar), GFP_KERNEL); >> > >> > kmalloc(array_size(foo, bar), GFP_KERNEL); >> >> I can't come up with a better name. :P When it was "mul()" I was >> thinking "smul()" for "saturating multiply". sarray_size() seems ... >> bonkers. > > smul would mean 'signed multiply' to me, having read the GCC manual: > -- Built-in Function: bool __builtin_smul_overflow (int a, int b, int *res) > > but I thought of another problem with array_size. We already have > ARRAY_SIZE and it means "the number of elements in the array". > > so ... struct_bytes(), array_bytes(), array3_bytes()? Maybe "calc"? struct_calc(), array_calc(), array3_calc()? This has the benefit of actually saying more about what it is doing, rather than its return value... In the end, I don't care. :) >> > I think we're broadly in agreement here! >> >> Do we want addition helpers? (And division and subtraction?) > > Keeping our focus on allocations ... do we have plain additions (as > opposed to multiply-and-add?) And subtraction? All I've seen are just rare "weird" cases of lots of mult/add. Some are way worse than others: http://www.ozlabs.org/~akpm/mmotm/broken-out/exofs-avoid-vla-in-structures.patch Just having the mult/add saturation would be lovely. -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: *alloc API changes 2018-05-07 20:27 ` Kees Cook @ 2018-05-07 20:49 ` Matthew Wilcox 2018-05-07 21:15 ` Kees Cook 2018-05-07 21:48 ` John Johansen 1 sibling, 1 reply; 13+ messages in thread From: Matthew Wilcox @ 2018-05-07 20:49 UTC (permalink / raw) To: Kees Cook; +Cc: John Johansen, Matthew Wilcox, Linux-MM, LKML, Rasmus Villemoes On Mon, May 07, 2018 at 01:27:38PM -0700, Kees Cook wrote: > On Mon, May 7, 2018 at 1:19 PM, Matthew Wilcox <willy@infradead.org> wrote: > > Yes. And today with kvmalloc. However, I proposed to Linus that > > kvmalloc() shouldn't allow it -- we should have kvmalloc_large() which > > would, but kvmalloc wouldn't. He liked that idea, so I'm going with it. > > How would we handle size calculations for _large? I'm not sure we should, at least initially. The very few places which need a large kvmalloc really are special and can do their own careful checking. Because, as Linus pointed out, we shouldn't be letting the user ask us to allocate a terabyte of RAM. We should just fail that. let's see how those users pan out, and then see what we can offer in terms of safety. > > There are very, very few places which should need kvmalloc_large. > > That's one million 8-byte pointers. If you need more than that inside > > the kernel, you're doing something really damn weird and should do > > something that looks obviously different. > > I'm CCing John since I remember long ago running into problems loading > the AppArmor DFA with kmalloc and switching it to kvmalloc. John, how > large can the DFAs for AppArmor get? Would an 8MB limit be a problem? Great! Opinions from people who'll use this interface are exceptionally useful. > And do we have any large IO or network buffers >8MB? Not that get allocated with kvmalloc ... because you can't DMA map vmalloc (without doing some unusual contortions). > > but I thought of another problem with array_size. We already have > > ARRAY_SIZE and it means "the number of elements in the array". > > > > so ... struct_bytes(), array_bytes(), array3_bytes()? > > Maybe "calc"? struct_calc(), array_calc(), array3_calc()? This has the > benefit of actually saying more about what it is doing, rather than > its return value... In the end, I don't care. :) I don't have a strong feeling on this either. > > Keeping our focus on allocations ... do we have plain additions (as > > opposed to multiply-and-add?) And subtraction? > > All I've seen are just rare "weird" cases of lots of mult/add. Some > are way worse than others: > http://www.ozlabs.org/~akpm/mmotm/broken-out/exofs-avoid-vla-in-structures.patch > > Just having the mult/add saturation would be lovely. Ow. My brain just oozed out of my ears. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: *alloc API changes 2018-05-07 20:49 ` Matthew Wilcox @ 2018-05-07 21:15 ` Kees Cook 0 siblings, 0 replies; 13+ messages in thread From: Kees Cook @ 2018-05-07 21:15 UTC (permalink / raw) To: Matthew Wilcox Cc: John Johansen, Matthew Wilcox, Linux-MM, LKML, Rasmus Villemoes On Mon, May 7, 2018 at 1:49 PM, Matthew Wilcox <willy@infradead.org> wrote: > On Mon, May 07, 2018 at 01:27:38PM -0700, Kees Cook wrote: >> On Mon, May 7, 2018 at 1:19 PM, Matthew Wilcox <willy@infradead.org> wrote: >> > Yes. And today with kvmalloc. However, I proposed to Linus that >> > kvmalloc() shouldn't allow it -- we should have kvmalloc_large() which >> > would, but kvmalloc wouldn't. He liked that idea, so I'm going with it. >> >> How would we handle size calculations for _large? > > I'm not sure we should, at least initially. The very few places which > need a large kvmalloc really are special and can do their own careful > checking. Because, as Linus pointed out, we shouldn't be letting the > user ask us to allocate a terabyte of RAM. We should just fail that. > > let's see how those users pan out, and then see what we can offer in > terms of safety. > >> > There are very, very few places which should need kvmalloc_large. >> > That's one million 8-byte pointers. If you need more than that inside >> > the kernel, you're doing something really damn weird and should do >> > something that looks obviously different. >> >> I'm CCing John since I remember long ago running into problems loading >> the AppArmor DFA with kmalloc and switching it to kvmalloc. John, how >> large can the DFAs for AppArmor get? Would an 8MB limit be a problem? > > Great! Opinions from people who'll use this interface are exceptionally > useful. > >> And do we have any large IO or network buffers >8MB? > > Not that get allocated with kvmalloc ... because you can't DMA map vmalloc > (without doing some unusual contortions). Er, yes, right. I meant for _all_ allocators, though. If 8MB is going to be the new "saturated" value? Maybe I misunderstood? What are you proposing for the code of array_size()? >> > but I thought of another problem with array_size. We already have >> > ARRAY_SIZE and it means "the number of elements in the array". >> > >> > so ... struct_bytes(), array_bytes(), array3_bytes()? >> >> Maybe "calc"? struct_calc(), array_calc(), array3_calc()? This has the >> benefit of actually saying more about what it is doing, rather than >> its return value... In the end, I don't care. :) > > I don't have a strong feeling on this either. I lean ever so slightly towards *_size(). It'll be hard to mix up ARRAY_SIZE() and array_size(), given the parameters. -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: *alloc API changes 2018-05-07 20:27 ` Kees Cook 2018-05-07 20:49 ` Matthew Wilcox @ 2018-05-07 21:48 ` John Johansen 1 sibling, 0 replies; 13+ messages in thread From: John Johansen @ 2018-05-07 21:48 UTC (permalink / raw) To: Kees Cook, Matthew Wilcox Cc: Matthew Wilcox, Linux-MM, LKML, Rasmus Villemoes On 05/07/2018 01:27 PM, Kees Cook wrote: > On Mon, May 7, 2018 at 1:19 PM, Matthew Wilcox <willy@infradead.org> wrote: >> On Mon, May 07, 2018 at 09:03:54AM -0700, Kees Cook wrote: >>> On Mon, May 7, 2018 at 4:39 AM, Matthew Wilcox <willy@infradead.org> wrote: >>>> On Fri, May 04, 2018 at 09:24:56PM -0700, Kees Cook wrote: >>>>> On Fri, May 4, 2018 at 8:46 PM, Matthew Wilcox <willy@infradead.org> wrote: >>>>> The only fear I have with the saturating helpers is that we'll end up >>>>> using them in places that don't recognize SIZE_MAX. Like, say: >>>>> >>>>> size = mul(a, b) + 1; >>>>> >>>>> then *poof* size == 0. Now, I'd hope that code would use add(mul(a, >>>>> b), 1), but still... it makes me nervous. >>>> >>>> That's reasonable. So let's add: >>>> >>>> #define ALLOC_TOO_BIG (PAGE_SIZE << MAX_ORDER) >>>> >>>> (there's a presumably somewhat obsolete CONFIG_FORCE_MAX_ZONEORDER on some >>>> architectures which allows people to configure MAX_ORDER all the way up >>>> to 64. That config option needs to go away, or at least be limited to >>>> a much lower value). >>>> >>>> On x86, that's 4k << 11 = 8MB. On PPC, that might be 64k << 9 == 32MB. >>>> Those values should be relatively immune to further arithmetic causing >>>> an additional overflow. >>> >>> But we can do larger than 8MB allocations with vmalloc, can't we? >> >> Yes. And today with kvmalloc. However, I proposed to Linus that >> kvmalloc() shouldn't allow it -- we should have kvmalloc_large() which >> would, but kvmalloc wouldn't. He liked that idea, so I'm going with it. > > How would we handle size calculations for _large? > >> There are very, very few places which should need kvmalloc_large. >> That's one million 8-byte pointers. If you need more than that inside >> the kernel, you're doing something really damn weird and should do >> something that looks obviously different. > > I'm CCing John since I remember long ago running into problems loading > the AppArmor DFA with kmalloc and switching it to kvmalloc. John, how > large can the DFAs for AppArmor get? Would an 8MB limit be a problem? > theoretically yes, and I have done tests with policy larger than that, but in practice I have never seen it. The largest I have seen in practice is about 1.5MB. The policy container that wraps the dfa, could be larger if if its wrapping multiple policy sets (think pre-loading policy for multiple containers in one go), but we don't do that currently and there is no requirement for that to be handled with a single allocation. We have some improvements coming that will reduce our policy size, and enable it so that we can split some of the larger dfas into multiple allocations so I really don't expect this will be a problem. If it becomes an issue we know the size of the allocation needed and can just have a condition that calls vmalloc_large when needed. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: *alloc API changes 2018-05-05 4:24 ` Kees Cook 2018-05-07 11:39 ` Matthew Wilcox @ 2018-05-07 21:41 ` Rasmus Villemoes 2018-05-07 22:56 ` Kees Cook 1 sibling, 1 reply; 13+ messages in thread From: Rasmus Villemoes @ 2018-05-07 21:41 UTC (permalink / raw) To: Kees Cook, Matthew Wilcox; +Cc: Matthew Wilcox, Linux-MM, LKML On 2018-05-05 06:24, Kees Cook wrote: > >> Right, I was thinking: >> >> static inline size_t mul_ab(size_t a, size_t b) >> { >> #if COMPILER_SUPPORTS_OVERFLOW >> unsigned long c; >> if (__builtin_mul_overflow(a, b, &c)) >> return SIZE_MAX; >> return c; >> #else >> if (b != 0 && a >= SIZE_MAX / b) >> return SIZE_MAX; >> return a * b; >> #endif >> } > > Rasmus, what do you think of a saturating version of your helpers? They'd be trivial to implement (using the __type_max that I had to introduce anyway), and I'd prefer they'd have sat_ in their name - not just smult, s would more likely be interpreted as "signed". On that note, I'd want to enforce the sat_ ones are only used for unsigned types, because there's no sane value to saturate to for signed operands. But I don't think we should do the allocation-overflow in terms of saturate-and-rely-on-allocator-rejecting it. I suppose we'll still have the computation done in a static inline (to let the compiler see if one or both operands are constants, and generate code accordingly). If we do static inline void *kmalloc_array(size_t n, size_t size, gfp_t flags) { size_t p; p = sat_mul(n, size); return __kmalloc(p, flags); } with sat_mul being implemented in terms of __builtin_mul_overflow(), gcc will probably waste a temp register on loading SIZE_MAX (or whatever sentinel we use), and do a pipeline-stalling cmov to the register used to pass p. If instead we do static inline void *kmalloc_array(size_t n, size_t size, gfp_t flags) { size_t p; if (check_mul_overflow(n, size, &p)) return NULL; return __kmalloc(p, flags); } we'd not get any extra code in the caller (that is, just a "mul" and "jo", instead of a load-immediate, mul, cmov), because gcc should be smart enough to combine the "return NULL" with the test for NULL which the caller code has, and thus make the jump go directly to the error handling (that error handling is likely itself a jump, but the "jo" will just get redirected to the target of that one). Also, I'd hate to have sat_mul not really saturating to type_max(t), but some large-enough-that-all-underlying-allocators-reject-it. > The only fear I have with the saturating helpers is that we'll end up > using them in places that don't recognize SIZE_MAX. Like, say: > > size = mul(a, b) + 1; > > then *poof* size == 0. Now, I'd hope that code would use add(mul(a, > b), 1), but still... it makes me nervous. > >> You don't need the size check here. We have the size check buried deep in >> alloc_pages (look for MAX_ORDER), so kmalloc and then alloc_pages will try >> a bunch of paths all of which fail before returning NULL. > > Good point. Though it does kind of creep me out to let a known-bad > size float around in the allocator until it decides to reject it. I > would think an early: > > if (unlikely(size == SIZE_MAX)) > return NULL; > > would have virtually no cycle count difference... All allocators still need to reject insane sizes, since those can happen without coming from a multiplication. So sure, some early size > MAX_SANE_SIZE check in the out-of-line functions should be rather cheap, and they most likely already exist in some form. But we don't _have_ to go out of our way to make the multiplication overflow handling depend on those. > > Right, no. I think if we can ditch *calloc() and _array() by using > saturating helpers, we'll have the API in a much better form: > > kmalloc(foo * bar, GFP_KERNEL); > into > kmalloc_array(foo, bar, GFP_KERNEL); > into > kmalloc(mul(foo, bar), GFP_KERNEL); Urgh. Do you want to get completely rid of kmalloc_array() and move the mul() into the call-sites? That obviously necessitates mul returning a big-enough sentinel. I'd hate that. Not just because of the code-gen, but also because of the problem with giving mul() sane semantics that still make it immune to the extra arithmetic that will inevitably be done. There's also the problem with foo and bar having different, possibly signed, types - how should mul() handle those? A nice benefit from having the static inline wrappers taking size_t is that a negative value gets converted to a huge positive value, and then the whole thing overflows. Sure, you can build that into mul() (maybe make that itself a static inline), but then it doesn't really deserve that generic name anymore. > and > > kmalloc(foo * bar, GFP_KERNEL | __GFP_ZERO); > into > kzalloc(foo * bar, GFP_KERNEL); > into > kcalloc(foo, bar, GFP_KERNEL); > into > kzalloc(mul(foo, bar), GFP_KERNEL); Yeah, part of the API mess is just copied from C (malloc vs calloc). We could make it a bit less messy by calling it kzalloc_array, but we have 1700 callers of kcalloc(), so... Rasmus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: *alloc API changes 2018-05-07 21:41 ` Rasmus Villemoes @ 2018-05-07 22:56 ` Kees Cook 0 siblings, 0 replies; 13+ messages in thread From: Kees Cook @ 2018-05-07 22:56 UTC (permalink / raw) To: Rasmus Villemoes; +Cc: Matthew Wilcox, Matthew Wilcox, Linux-MM, LKML On Mon, May 7, 2018 at 2:41 PM, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > If instead we do > > static inline void *kmalloc_array(size_t n, size_t size, gfp_t flags) > { > size_t p; > if (check_mul_overflow(n, size, &p)) > return NULL; > return __kmalloc(p, flags); > } > > we'd not get any extra code in the caller (that is, just a "mul" and > "jo", instead of a load-immediate, mul, cmov), because gcc should be > smart enough to combine the "return NULL" with the test for NULL which > the caller code has, and thus make the jump go directly to the error > handling (that error handling is likely itself a jump, but the "jo" will > just get redirected to the target of that one). > > Also, I'd hate to have sat_mul not really saturating to type_max(t), but > some large-enough-that-all-underlying-allocators-reject-it. Yeah, this continues to worry me too. > All allocators still need to reject insane sizes, since those can happen > without coming from a multiplication. So sure, some early size > > MAX_SANE_SIZE check in the out-of-line functions should be rather cheap, > and they most likely already exist in some form. But we don't _have_ to > go out of our way to make the multiplication overflow handling depend on > those. > >> >> Right, no. I think if we can ditch *calloc() and _array() by using >> saturating helpers, we'll have the API in a much better form: >> >> kmalloc(foo * bar, GFP_KERNEL); >> into >> kmalloc_array(foo, bar, GFP_KERNEL); >> into >> kmalloc(mul(foo, bar), GFP_KERNEL); > > Urgh. Do you want to get completely rid of kmalloc_array() and move the > mul() into the call-sites? That obviously necessitates mul returning a > big-enough sentinel. I'd hate that. Not just because of the code-gen, > but also because of the problem with giving mul() sane semantics that > still make it immune to the extra arithmetic that will inevitably be > done. There's also the problem with foo and bar having different, > possibly signed, types - how should mul() handle those? A nice benefit > from having the static inline wrappers taking size_t is that a negative > value gets converted to a huge positive value, and then the whole thing > overflows. Sure, you can build that into mul() (maybe make that itself a > static inline), but then it doesn't really deserve that generic name > anymore. If the struct_size(), array_size(), and array3_size() all take size_t, then I think we gain the same protection, yes? i.e. they are built out of your overflow checking routines. Even though I'm nervous about SIZE_MAX wrapping on other uses, I do agree that doing early rejection in the allocators makes a lot more sense. I think we can have bot the SIZE_MAX early-abort of "something went very wrong", and the internal "I refuse to store that much" that is per-allocator-tuned. >> kmalloc(foo * bar, GFP_KERNEL | __GFP_ZERO); >> into >> kzalloc(foo * bar, GFP_KERNEL); >> into >> kcalloc(foo, bar, GFP_KERNEL); >> into >> kzalloc(mul(foo, bar), GFP_KERNEL); > > Yeah, part of the API mess is just copied from C (malloc vs calloc). We > could make it a bit less messy by calling it kzalloc_array, but we have > 1700 callers of kcalloc(), so... Coccinelle can fix those callers. ;) But we need to make sure we're still generating sane machine code before we do that... -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: *alloc API changes 2018-05-05 3:46 ` Matthew Wilcox 2018-05-05 4:24 ` Kees Cook @ 2018-05-05 4:30 ` Matthew Wilcox 1 sibling, 0 replies; 13+ messages in thread From: Matthew Wilcox @ 2018-05-05 4:30 UTC (permalink / raw) To: Kees Cook; +Cc: Matthew Wilcox, Linux-MM, LKML, Rasmus Villemoes On Fri, May 04, 2018 at 08:46:46PM -0700, Matthew Wilcox wrote: > On Fri, May 04, 2018 at 06:08:23PM -0700, Kees Cook wrote: > > The number of permutations for our various allocation function is > > rather huge. Currently, it is: > > > > system or wrapper: > > kmem_cache_alloc, kmalloc, vmalloc, kvmalloc, devm_kmalloc, > > dma_alloc_coherent, pci_alloc_consistent, kmem_alloc, f2fs_kvalloc, > > and probably others I haven't found yet. > > dma_pool_alloc, page_frag_alloc, gen_pool_alloc, __alloc_bootmem_node, > cma_alloc, quicklist_alloc (deprecated), mempool_alloc > > > allocation method (not all available in all APIs): > > regular (kmalloc), zeroed (kzalloc), array (kmalloc_array), zeroed > > array (kcalloc) > > ... other initialiser (kmem_cache_alloc) I meant to say that we have a shocking dearth of foo_realloc() functions. Instead we have drivers and core parts of the kernel implementing their own stupid slow alloc-a-new-chunk-of-memory-and-memcpy-from-the-old-then- free when the allocator can probably do a better job (eg vmalloc may be able to expand the existing are, and if it can't do that, it can at least remap the underlying pages; the slab allocator may be able to resize without growing, eg if you krealloc from 1200 bytes to 2000 bytes, that's going to come out of the same slab). So, yeah, adding those increases the API permutations even further. And don't ask about what happens if you allocate with GFP_DMA then realloc with GFP_HIGHMEM. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2018-05-07 22:56 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-05-05 1:08 *alloc API changes Kees Cook 2018-05-05 3:46 ` Matthew Wilcox 2018-05-05 4:24 ` Kees Cook 2018-05-07 11:39 ` Matthew Wilcox 2018-05-07 16:03 ` Kees Cook 2018-05-07 20:19 ` Matthew Wilcox 2018-05-07 20:27 ` Kees Cook 2018-05-07 20:49 ` Matthew Wilcox 2018-05-07 21:15 ` Kees Cook 2018-05-07 21:48 ` John Johansen 2018-05-07 21:41 ` Rasmus Villemoes 2018-05-07 22:56 ` Kees Cook 2018-05-05 4:30 ` Matthew Wilcox
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).