From: Al Viro <viro@ZenIV.linux.org.uk>
To: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] free_pages stuff
Date: Tue, 22 Dec 2015 21:04:35 +0000 [thread overview]
Message-ID: <20151222210435.GB20997@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CAMuHMdUGkVcUOH4VUXiuoa6eGVQEA+QRDEop3GrEOEWz8GeNig@mail.gmail.com>
On Tue, Dec 22, 2015 at 09:21:14AM +0100, Geert Uytterhoeven wrote:
> On Tue, Dec 22, 2015 at 3:22 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >> And if the code really explicitly wants a page (or set of aligned pages)
> >> for some vm reason, I suspect having the cast there isn't a bad thing. It's
> >> clearly not just a random pointer allocation if the bit pattern of the
> >> pointer matters.
> >
> > BTW, I'm not sure we don't have code that would assume that
> > kmalloc(PAGE_SIZE,...) always returns something PAGE_SIZE-aligned.
>
> Yeah, needs-to-be-PAGE_SIZE-aligned is probably one of the main
> reasons of not calling kmalloc().
FWIW, looking through the fs/* uses of __get_free_pages() and its wrappers...
* affs_grow_extcache() - might as well have been kmalloc (and
I'm not sure that fixed PAGE_SIZE is the best size there, actually).
* afs_mntpt_do_automount() - kmalloc().
* bfs_dump_imap() - kmalloc().
* bm_entry_read() - kmalloc(), and might be better off with
single_open-style seqfile.
* ceph_alloc_readdir_reply_buffer() - kmalloc().
* configfs fill_write_buffer() - kmalloc(), and it might be worth
a helper similar to memdup_user() that would allocate size + 1 bytes,
copy size bytes from user and put '\0' after them. There's a lot of
->write() instances open-coding that.
* configfs fill_read_buffer() - kmalloc().
* configfs_follow_link() - kmalloc().
* ext4_calculate_overhead() - kmalloc().
* fuse_follow_link() - kmalloc().
* fuse_do_ioctl() - kmalloc(), and it might as well do more accurate
size calculation
* hfs_mdb_get() - 8Kb kmalloc().
* isofs_readdir() - kmalloc().
* jbd2_alloc() - really wants alignment; might make sense to have
3 more private slabs there (they are using kmem_cache_alloc() for sub-page
allocation, with explicit alignments set when creating those; anything
from 8 pages and up goes to vmalloc()).
* jfs_readdir() - kmalloc().
* jfs lbmLogInit() - alloc_page(). _This_ is one that really wants
struct page (and uses virt_to_page() to get it after get_zeroed_page()).
* kernfs_iop_follow_link() - kmalloc().
* simple_transaction_get() - kmalloc().
* copy_mount_options() - kmalloc().
* nfs_do_submount() - kmalloc().
* nfs_follow_referral() - kmalloc().
* nfs4_replace_transport() - kmalloc().
* nfs_show_devname() - kmalloc().
* nfsd_buffered_readdir() - kmalloc().
* nilfs_ioctl_wrap_copy() - kmalloc() (and I'm not sure that PAGE_SIZE
is the best possible variant there).
* fs/ocfs2/dlm/dlmdebug.c ones - kmalloc(), all of them.
* dlm_alloc_pagevec() - used to allocate hash tables. kmalloc() will
definitely do, but I would consider using a single kmalloc or vmalloc instead
of an array of page-sized blocks. Guaranteed extra dereference on every
hash lookup is potentially painful.
* dlm_migrate_lockres() - kmalloc().
* dlm_request_all_locks_handler() - kmalloc().
* ovl_read_symlink() - kmalloc().
* do_proc_readlink() - kmalloc().
* proc_pid_cmdline_read() - kmalloc().
* proc_pid_attr_write() - memdup_user().
* mem_rw() - kmalloc().
* environ_read() - kmalloc().
* dquot_init() - hash allocation; alloc_large_system_hash(), unless
there's something I'm missing...
* poll_get_entry() - kmalloc(), probably.
* fs/proc/vmcore.c - interesting one; it allocates a buffer to
store elf headers from crashdump, and everything would be simple, expect
for mmap() support in there. Which wants it to be page-aligned and uses
remap_pfn_range() from that sucker.
IOW, there is one place that can't live with kmalloc() due to alignment
requirements (jbd2_alloc()), one place that wants struct page * (in jfs)
and one place that wants page-aligned buffer it will feed to remap_pfn_range().
And 35 places that have no good reason for using __get_free_pages() or
its wrappers.
So at least for fs/* the answer is definitely "almost all places
where we are using page allocator would be better off with something
else".
Documentation/which-allocator-should-I-use might be a good idea... Notes
below are just a skeleton - a lot of details need to be added; in particular,
there should be a part on "I have this kind of address and I want that;
when and how should that be done?", completely missing here. And there
should be a big scary warning along the lines of "this is NOT an invitation
for a flood of checkpatch-inspired patches"...
Comments, corrections and additions would be very welcome.
1) Most of the time kmalloc() is the right thing to use.
Limitations: alignment is no better than word, not available very early in
bootstrap, allocated memory is physically contiguous, so large allocations
are best avoided.
2) kmem_cache_alloc() allows to specify the alignment at cache creation
time. Otherwise it's similar to kmalloc(). Normally it's used for
situations where we have a lot of instances of some type and want dynamic
allocation of those.
3) vmalloc() is for large allocations. They will be page-aligned,
but *not* physically contiguous. OTOH, large physically contiguous
allocations are generally a bad idea. Unlike other allocators, there's
no variant that could be used in interrupt; freeing is possible there,
but allocation is not. Note that non-blocking variant *does* exist -
__vmalloc(size, GFP_ATOMIC, PAGE_KERNEL) can be used in atomic
contexts; it's the interrupt ones that are no-go.
4) if it's very early in bootstrap, alloc_bootmem() and friends
may be the only option. Rule of the thumb: if it's already printed
Memory: ...../..... available.....
you shouldn't be using that one. Allocations are physically contiguous
and at that point large physically contiguous allocations are still OK.
5) if you need to allocate memory for DMA, use dma_alloc_coherent()
and friends. They'll give you both the virtual address for your use
and DMA address refering to the same memory for use by device; do *NOT*
try to derive the latter from the former; use of virt_to_bus() et.al.
is a Bloody Bad Idea(tm).
6) if you need a reference to struct page, use alloc_page/alloc_pages.
7) in some cases (page tables, for the most obvious example), __get_free_page()
and friends might be the right answer. In principle, it's case (6), but
it returns page_address(page) instead of the page itself. Historically that
was the first API introduced, so a _lot_ of places that should've been using
something else ended up using that. Do not assume that being lower level
makes it faster than e.g. kmalloc() - this is simply not true.
next prev parent reply other threads:[~2015-12-22 21:04 UTC|newest]
Thread overview: 94+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-12-21 23:46 [RFC] free_pages stuff Al Viro
2015-12-21 23:46 ` [POC][PATCH 01/83] switch free_page() from unsigned long to const void * Al Viro
2015-12-21 23:46 ` [POC][PATCH 02/83] switch free_pages() " Al Viro
2015-12-21 23:46 ` [POC][PATCH 03/83] switch get_zeroed_page() to returning " Al Viro
2015-12-21 23:46 ` [POC][PATCH 04/83] kill unused {get,free}_user_page() Al Viro
2015-12-21 23:46 ` [POC][PATCH 05/83] switch copy_mount_options to storing void * instead of unsigned long Al Viro
2015-12-21 23:46 ` [POC][PATCH 06/83] drivers/net/wireless/libertas/debugfs.c: get rid of pointless casts Al Viro
2015-12-21 23:46 ` [POC][PATCH 07/83] drivers/net/wireless/mwifiex/debugfs.c: " Al Viro
2015-12-21 23:46 ` [POC][PATCH 08/83] affs_evict_inode(): " Al Viro
2015-12-21 23:46 ` [POC][PATCH 09/83] configfs_follow_link(): " Al Viro
2015-12-21 23:46 ` [POC][PATCH 10/83] kernfs_iop_follow_link(): " Al Viro
2015-12-21 23:46 ` [POC][PATCH 11/83] sound/oss/vidc: keep dma_buf[] as pointers Al Viro
2015-12-21 23:46 ` [POC][PATCH 12/83] drivers/tty: get rid of pointless casts Al Viro
2015-12-21 23:46 ` [POC][PATCH 13/83] rds: keep pointers in ->m_page_addrs[] Al Viro
2015-12-21 23:46 ` [POC][PATCH 14/83] proc_dev_atm_read(): get rid of pointless casts Al Viro
2015-12-21 23:46 ` [POC][PATCH 15/83] qib get_map_page(): " Al Viro
2015-12-21 23:46 ` [POC][PATCH 16/83] user_namespace: " Al Viro
2015-12-21 23:46 ` [POC][PATCH 17/83] ftrace: " Al Viro
2015-12-21 23:46 ` [POC][PATCH 18/83] sysctl: " Al Viro
2015-12-21 23:46 ` [POC][PATCH 19/83] xenstored_local_init(): " Al Viro
2015-12-21 23:46 ` [POC][PATCH 20/83] staging/rdma: " Al Viro
2015-12-21 23:46 ` [POC][PATCH 21/83] c6x: remove unused macros Al Viro
2015-12-21 23:46 ` [POC][PATCH 22/83] [davinci] ccdc_update_raw_params() frees the wrong thing Al Viro
2015-12-21 23:46 ` [POC][PATCH 23/83] fd_dma_mem_free(): pass address as void * instead of unsigned long Al Viro
2015-12-21 23:46 ` [POC][PATCH 24/83] ppc: keep ->hpt_virt as a pointer Al Viro
2015-12-21 23:46 ` [POC][PATCH 25/83] dma_4u_alloc_coherent(): don't mix virtual and physical addresses Al Viro
2015-12-21 23:46 ` [POC][PATCH 26/83] dma_4v_alloc_coherent(): " Al Viro
2015-12-21 23:47 ` [POC][PATCH 27/83] new helper: get_dma_pages() Al Viro
2015-12-21 23:47 ` [POC][PATCH 28/83] make fd_dma_mem_alloc/nodma_mem_alloc return a pointer Al Viro
2015-12-21 23:47 ` [POC][PATCH 29/83] switch the remaining users of __get_dma_pages() to get_dma_pages() Al Viro
2015-12-21 23:47 ` [POC][PATCH 30/83] sparc: get rid of pointless casts Al Viro
2015-12-21 23:47 ` [POC][PATCH 31/83] efficeon: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 32/83] lguest: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 33/83] drivers/pci: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 34/83] drivers/s390: ger " Al Viro
2015-12-21 23:47 ` [POC][PATCH 35/83] s390 kvm: get " Al Viro
2015-12-21 23:47 ` [POC][PATCH 36/83] pcibios: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 37/83] ste_dma40: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 38/83] usb_mon: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 39/83] gnttab_end_foreign_access(): switch the last argument to void * Al Viro
2015-12-21 23:47 ` [POC][PATCH 40/83] nios2: dma_free_coherent(): get rid of pointless casts Al Viro
2015-12-21 23:47 ` [POC][PATCH 41/83] hsi: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 42/83] simserial: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 43/83] arm64 vdso: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 44/83] cris free_init_page(): switch to __free_page() Al Viro
2015-12-21 23:47 ` [POC][PATCH 45/83] arm: switch kvm_arm_hyp_stack_page to void * Al Viro
2015-12-21 23:47 ` [POC][PATCH 46/83] frv consistent_alloc(): switch page " Al Viro
2015-12-21 23:47 ` [POC][PATCH 47/83] ia64: uncached_add_chunk(): switch to __free_pages() Al Viro
2015-12-21 23:47 ` [POC][PATCH 48/83] jfs_readdir(): make dirent_buf a pointer Al Viro
2015-12-21 23:47 ` [POC][PATCH 49/83] get rid of casts in alloc_exact stuff Al Viro
2015-12-21 23:47 ` [POC][PATCH 50/83] s390 cmm: get rid of pointless casts Al Viro
2015-12-21 23:47 ` [POC][PATCH 51/83] microblaze consistent_alloc(): " Al Viro
2015-12-21 23:47 ` [POC][PATCH 52/83] devm_{get_}free_pages(): switch to pointers Al Viro
2015-12-21 23:47 ` [POC][PATCH 53/83] cavium: (partially) get rid of cargo-culting in allocator Al Viro
2015-12-21 23:47 ` [POC][PATCH 54/83] um: store stacks as pointers Al Viro
2015-12-21 23:47 ` [POC][PATCH 55/83] sh: get rid of pointless casts Al Viro
2015-12-21 23:47 ` [POC][PATCH 56/83] amd_gart_64: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 57/83] iwlegacy: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 58/83] iwlwifi: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 59/83] add pointer-returning variants of __get_free_pages/__get_free_page Al Viro
2015-12-21 23:47 ` [POC][PATCH 60/83] switch obvious cases to get_free_pages() Al Viro
2015-12-21 23:47 ` [POC][PATCH 61/83] switch obvious cases to get_free_page() Al Viro
2015-12-21 23:47 ` [POC][PATCH 62/83] m68k: switch pte_alloc_one_kernel() " Al Viro
2015-12-21 23:47 ` [POC][PATCH 63/83] switch kvmppc_core_vcpu_create_pr() " Al Viro
2015-12-21 23:47 ` [POC][PATCH 64/83] drivers/video/fbdev: switch to get_free_pages() Al Viro
2015-12-21 23:47 ` [POC][PATCH 65/83] um: switch to get_free_page() Al Viro
2015-12-21 23:47 ` [POC][PATCH 66/83] switch xen_get_swiotlb_free_pages() to returning a pointer Al Viro
2015-12-21 23:47 ` [POC][PATCH 67/83] mn10300 dma_alloc_coherent(): switch to get_free_pages() Al Viro
2015-12-21 23:47 ` [POC][PATCH 68/83] switch ps3_dma_map() and ps3_dma_region_ops->map() instances to physical address Al Viro
2015-12-21 23:47 ` [POC][PATCH 69/83] ps3_alloc_coherent(): get rid of pointless casts Al Viro
2015-12-21 23:47 ` [POC][PATCH 70/83] s390_dma_alloc(): page_to_phys() result is always a multiple of PAGE_SIZE Al Viro
2015-12-21 23:47 ` [POC][PATCH 71/83] s390_dma_alloc(): use page_address() Al Viro
2015-12-21 23:47 ` [POC][PATCH 72/83] [powerpc] switch cmm_page_array->pages[] to pointers Al Viro
2015-12-21 23:47 ` [POC][PATCH 73/83] niu.c: get rid of pointless casts Al Viro
2015-12-21 23:47 ` [POC][PATCH 74/83] [s390] switch pcpu_alloc_lowcore() to get_free_page() Al Viro
2015-12-21 23:47 ` [POC][PATCH 75/83] kill __get_free_page() Al Viro
2015-12-21 23:47 ` [POC][PATCH 76/83] [mips, s390, score] turn empty_zero_page into pointer Al Viro
2015-12-21 23:47 ` [POC][PATCH 77/83] sparc: switch to get_free_pages() Al Viro
2015-12-21 23:47 ` [POC][PATCH 78/83] x86: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 79/83] s390: turn suspend_zero_pages into a pointer Al Viro
2015-12-21 23:47 ` [POC][PATCH 80/83] [sun3] try to sort dvma types out Al Viro
2015-12-21 23:47 ` [POC][PATCH 81/83] sba_iommu: get rid of pointless casts Al Viro
2015-12-21 23:47 ` [POC][PATCH 82/83] media/platform/omap: " Al Viro
2015-12-21 23:47 ` [POC][PATCH 83/83] nios2: " Al Viro
2015-12-22 0:03 ` [RFC] free_pages stuff Linus Torvalds
2015-12-22 1:04 ` Al Viro
[not found] ` <CA+55aFy9NrV_RnziN9z3p5O6rv1A0mirhLD0hL7Wrb77+YyBeg@mail.gmail.com>
2015-12-22 1:23 ` Linus Torvalds
2015-12-22 3:10 ` Al Viro
2015-12-22 2:22 ` Al Viro
2015-12-22 8:21 ` Geert Uytterhoeven
2015-12-22 21:04 ` Al Viro [this message]
2016-01-05 13:59 ` Michal Hocko
2016-01-05 15:26 ` Al Viro
2016-01-05 15:42 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151222210435.GB20997@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=geert@linux-m68k.org \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox