linux-mm.kvack.org archive mirror
From: Jeff Moyer <jmoyer@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave@sr71.net>, Toshi Kani <toshi.kani@hpe.com>,
	David Airlie <airlied@linux.ie>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Dave Chinner <david@fromorbit.com>, Linux MM <linux-mm@kvack.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Christoph Hellwig <hch@lst.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	kbuild test robot <lkp@intel.com>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>,
	Richard Weinberger <richard@nod.at>, X86 ML <x86@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Matthew Wilcox <willy@linux.intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jeff Dike <jdike@addtoit.com>, Jens Axboe <axboe@fb.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoffer Dall <christoffer.dall@linaro.org>,
	Jan Kara <jack@suse.com>, Paolo Bonzini <pbonzini@redhat.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [-mm PATCH v2 00/25] get_user_pages() for dax pte and pmd mappings
Date: Thu, 10 Dec 2015 14:20:06 -0500
Message-ID: <x49fuzat8k9.fsf@segfault.boston.devel.redhat.com>
In-Reply-To: <CAPcyv4gfMSW=x=LcZeEqX6hvO39Q2=nyUxq3FwMxaZ6PEGZtMg@mail.gmail.com> (Dan Williams's message of "Thu, 10 Dec 2015 10:56:17 -0800")

Dan Williams <dan.j.williams@intel.com> writes:

> On Thu, Dec 10, 2015 at 10:08 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>>> Summary:
>>>
>>> To date, we have implemented two I/O usage models for persistent memory,
>>> PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
>>> userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
>>> to be the target of direct-i/o.  It allows userspace to coordinate
>>> DMA/RDMA from/to persistent memory.
>>>
>>> The implementation leverages the ZONE_DEVICE mm-zone that went into
>>> 4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
>>> and dynamically mapped by a device driver.  The pmem driver, after
>>> mapping a persistent memory range into the system memmap via
>>> devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
>>> page-backed pmem-pfns via flags in the new pfn_t type.
>>
>> So, this basically means that an admin has to decide whether or not DMA
>> will be used on a given device before making a file system on it.  That
>> seems like an odd requirement.  There's also a configuration option of
>> whether to put those backing struct pages into DRAM or PMEM (which, of
>> course, will be dictated by the size of pmem).  I really think we should
>> reconsider this approach.
>>
>> First, the admin shouldn't have to choose whether or not DMA will be
>> done on the file system.
>
> To be clear it's not "whether or not DMA will be done on the file
> system", it's whether or not both DMA and DAX will be done
> simultaneously on the filesystem.

Fair point, but I'd view one of those configurations as not recommended.
To be clear, if you're just going to use the device for block-based
access, using BTT is the safer option.

> DAX is already a capability that an admin can inadvertently disable by
> mis-configuring the alignment of a partition [1].

Heh, using my own commit against me? ;-) Anyway, the commit message
suggests that dax *could* be supported on misaligned partitions.

> Why not also disable it when DMA support is not configured and force
> the fs back to page-cache?  Namespace creation tooling in userspace
> can default to enabling DAX + DMA.

Well, the only reason I can come up with is manufactured:  we've forced
the admin to decide between having that extra space for storage and
doing DMA, and he or she opted for more space.
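
For a rough sense of how much space is at stake in that choice, here is a
back-of-the-envelope sketch. It assumes 4 KiB base pages and a 64-byte
struct page; both are typical x86_64 values and are assumptions, not
figures stated in this thread. The function name is made up for
illustration.

```python
# Back-of-the-envelope memmap cost for a pmem namespace.
# Assumptions (not from this thread): 4 KiB base pages and a 64-byte
# struct page, which are typical for x86_64 builds.
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64

GIB = 1 << 30
TIB = 1 << 40


def memmap_bytes(pmem_bytes: int) -> int:
    """Bytes of struct page needed to cover pmem_bytes of persistent memory."""
    nr_pages = -(-pmem_bytes // PAGE_SIZE)  # round up to whole pages
    return nr_pages * STRUCT_PAGE_SIZE


for size in (1 * TIB, 6 * TIB):
    cost = memmap_bytes(size)
    print(f"{size // TIB} TiB pmem -> {cost / GIB:.1f} GiB of struct page "
          f"({cost / size:.2%} of capacity)")
```

Under those assumptions the memmap costs about 16 GiB per TiB of pmem,
whether it is carved out of DRAM or out of the pmem device itself.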

>> Second, eating up storage space to track
>> mostly unused struct pages seems like a waste.  Is there no future for
>> the "introduce __pfn_t, evacuate struct page from sgls"[1] approach?
>> And if not, is there some other way we can solve this problem?
>
> I'm still very much interested in revisiting the page-less mechanisms
> over time, but given comments like Dave's [2], it's not on any short
> term horizon.

OK.

>> I know dynamic allocation of struct pages is scary, but is it more tractable
>> than no pages for DMA?
>
> I wasn't convinced that it would be any better given the need to
> allocate at section granularity at fault time.  It would still require
> ZONE_DEVICE or something similar.  Waiting until get_user_pages() time
> to allocate pages means we don't get __get_user_pages_fast support.
> It also was not clear to me that it would prevent exhaustion of DRAM for
> long-standing / large mappings.

Hmm, yeah, this does seem like a less attractive approach.  Thanks for
enumerating the issues.
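
To put the section-granularity point in numbers, a rough sketch follows.
It assumes x86_64 defaults of 128 MiB sparsemem sections, 4 KiB pages, and
a 64-byte struct page; none of these values come from this thread, and the
helper names are hypothetical.

```python
# How much memmap a dynamic allocation would need if struct pages can only
# be populated one whole section at a time.
# Assumptions (not from this thread): 128 MiB sections, 4 KiB pages,
# 64-byte struct page.
SECTION_SIZE = 1 << 27
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64

MEMMAP_PER_SECTION = (SECTION_SIZE // PAGE_SIZE) * STRUCT_PAGE_SIZE


def sections_touched(offset: int, length: int) -> int:
    """Number of sections overlapped by the byte range [offset, offset+length)."""
    first = offset // SECTION_SIZE
    last = (offset + length - 1) // SECTION_SIZE
    return last - first + 1


def memmap_for_range(offset: int, length: int) -> int:
    """Bytes of struct page allocated to back a fault on the given range."""
    return sections_touched(offset, length) * MEMMAP_PER_SECTION


# A single 4 KiB fault still pulls in a whole section's worth of memmap.
print(memmap_for_range(0, PAGE_SIZE) // (1 << 20), "MiB")
```

Under these assumptions even a one-page fault costs 2 MiB of memmap, and a
range that straddles a section boundary costs twice that, which is why
allocating at fault time does not obviously save much for large mappings.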

-Jeff



Thread overview: 48+ messages
2015-12-10  2:37 [-mm PATCH v2 00/25] get_user_pages() for dax pte and pmd mappings Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 01/25] pmem, dax: clean up clear_pmem() Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 02/25] dax: increase granularity of dax_clear_blocks() operations Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 03/25] dax: guarantee page aligned results from bdev_direct_access() Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 04/25] dax: fix lifetime of in-kernel dax mappings with dax_map_atomic() Dan Williams
2015-12-11 18:11   ` [-mm PATCH v3 " Dan Williams
2015-12-17 22:00     ` Ross Zwisler
2015-12-17 22:16       ` Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 05/25] mm, dax: fix livelock, allow dax pmd mappings to become writeable Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 06/25] dax: Split pmd map when fallback on COW Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 07/25] um: kill pfn_t Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 08/25] kvm: rename pfn_t to kvm_pfn_t Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 09/25] mm, dax, pmem: introduce pfn_t Dan Williams
2015-12-11 18:22   ` [-mm PATCH v3 " Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 10/25] mm: introduce find_dev_pagemap() Dan Williams
2015-12-11 18:27   ` [-mm PATCH v3 " Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 11/25] x86, mm: introduce vmem_altmap to augment vmemmap_populate() Dan Williams
2015-12-15 16:50   ` Dan Williams
2015-12-15 23:28   ` Andrew Morton
2015-12-15 23:37     ` Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 12/25] libnvdimm, pfn, pmem: allocate memmap array in persistent memory Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 13/25] avr32: convert to asm-generic/memory_model.h Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 14/25] hugetlb: fix compile error on tile Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 15/25] frv: fix compiler warning from definition of __pmd() Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 16/25] x86, mm: introduce _PAGE_DEVMAP Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 17/25] mm, dax, gpu: convert vm_insert_mixed to pfn_t Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 18/25] mm, dax: convert vmf_insert_pfn_pmd() " Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 19/25] list: introduce list_del_poison() Dan Williams
2015-12-15 23:41   ` Andrew Morton
2015-12-16  0:17     ` Dan Williams
2015-12-10  2:39 ` [-mm PATCH v2 20/25] libnvdimm, pmem: move request_queue allocation earlier in probe Dan Williams
2015-12-10  2:39 ` [-mm PATCH v2 21/25] mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup Dan Williams
2015-12-15 23:46   ` Andrew Morton
2015-12-10  2:39 ` [-mm PATCH v2 22/25] mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd Dan Williams
2015-12-10  2:39 ` [-mm PATCH v2 23/25] mm, x86: get_user_pages() for dax mappings Dan Williams
2015-12-16  0:14   ` Andrew Morton
2015-12-16  2:18     ` Dan Williams
2015-12-18  0:09       ` Dan Williams
2015-12-10  2:39 ` [-mm PATCH v2 24/25] dax: provide diagnostics for pmd mapping failures Dan Williams
2015-12-10  2:39 ` [-mm PATCH v2 25/25] dax: re-enable dax pmd mappings Dan Williams
2015-12-10 18:08 ` [-mm PATCH v2 00/25] get_user_pages() for dax pte and " Jeff Moyer
2015-12-10 18:56   ` Dan Williams
2015-12-10 19:20     ` Jeff Moyer [this message]
2015-12-11  2:03       ` Dan Williams
2015-12-14 14:52         ` Jeff Moyer
2015-12-14 16:44           ` Dan Williams
2015-12-11 18:44 ` Dan Williams
2015-12-15  1:59   ` Dan Williams
