* [PATCH 0/3] introduce get_user_pages_longterm()
From: Dan Williams @ 2017-11-07 0:57 UTC
To: akpm
Cc: Sean Hefty, Jan Kara, linux-rdma, linux-kernel, Doug Ledford,
stable, Hal Rosenstock, Jason Gunthorpe, linux-mm, Jeff Moyer,
Ross Zwisler, Mauro Carvalho Chehab, Christoph Hellwig,
linux-media
Andrew,
Here is a new get_user_pages api for cases where a driver intends to
keep an elevated page count indefinitely. This is distinct from usages
like iov_iter_get_pages where the elevated page counts are transient.
The iov_iter_get_pages cases immediately turn around and submit the
pages to a device driver which will put_page when the i/o operation
completes (under kernel control).
In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
the filesystem-dax case, where the filesystem is in control of the
lifetime of the block / page and needs reasonable limits on how long it
can wait for pages in a mapping to become idle.
Fixing filesystems to actually wait for dax pages to be idle before
blocks from a truncate/hole-punch operation are repurposed is saved for
a later patch series.
Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.
I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings
were supported by the kernel. The behavior regression this policy change
implies is one of the reasons we maintain the "dax enabled. Warning:
EXPERIMENTAL, use at your own risk" notification when mounting a
filesystem in dax mode.
It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.
---
Dan Williams (3):
mm: introduce get_user_pages_longterm
IB/core: disable memory registration of filesystem-dax vmas
[media] v4l2: disable filesystem-dax mapping support
drivers/infiniband/core/umem.c | 2 -
drivers/media/v4l2-core/videobuf-dma-sg.c | 5 +-
include/linux/mm.h | 3 +
mm/gup.c | 75 +++++++++++++++++++++++++++++
4 files changed, 82 insertions(+), 3 deletions(-)
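The policy described in the cover letter can be pictured with a small userspace model. This is only a sketch under stated assumptions: `mock_page`, `mock_vma`, `mock_gup` and `mock_gup_longterm` are hypothetical stand-ins invented for illustration, not the kernel API or the code in mm/gup.c. It shows the idea of pinning the range, checking whether any part of it is a filesystem-dax mapping, and if so dropping the references again and failing with EOPNOTSUPP (the error named later in this thread).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for kernel objects; not kernel API. */
struct mock_page { int refcount; };
struct mock_vma  { bool fsdax; };  /* true if backed by a filesystem-dax mapping */

/* Model of a plain pin: take one reference per page, report all pages pinned. */
static long mock_gup(struct mock_page *pages, long nr)
{
	for (long i = 0; i < nr; i++)
		pages[i].refcount++;
	return nr;
}

/*
 * Model of the longterm policy: pin first, then refuse the whole request
 * if any page sits in a filesystem-dax mapping, dropping the references
 * that were just taken.
 */
static long mock_gup_longterm(struct mock_page *pages,
			      const struct mock_vma *vmas, long nr)
{
	long pinned = mock_gup(pages, nr);
	bool fsdax = false;

	for (long i = 0; i < pinned; i++) {
		if (vmas[i].fsdax) {
			fsdax = true;
			break;
		}
	}
	if (!fsdax)
		return pinned;

	/* Unwind: no long-lived reference may remain on an fs-dax page. */
	for (long i = 0; i < pinned; i++) {
		assert(pages[i].refcount > 0);
		pages[i].refcount--;
	}
	return -EOPNOTSUPP;
}
```

A caller in this model either gets every page pinned or gets an error with no references left behind, which is the property the filesystem needs before it can repurpose blocks.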
* [PATCH 3/3] [media] v4l2: disable filesystem-dax mapping support
From: Dan Williams @ 2017-11-07 0:57 UTC
To: akpm
Cc: Jan Kara, linux-kernel, stable, linux-mm, Mauro Carvalho Chehab,
linux-media
V4L2 memory registrations are incompatible with filesystem-dax that
needs the ability to revoke dma access to a mapping at will, or
otherwise allow the kernel to wait for completion of DMA. The
filesystem-dax implementation breaks the traditional solution of
truncate of active file backed mappings since there is no page-cache
page we can orphan to sustain ongoing DMA.
If v4l2 wants to support long lived DMA mappings it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.
Reported-by: Jan Kara <jack@suse.cz>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-media@vger.kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/media/v4l2-core/videobuf-dma-sg.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 0b5c43f7e020..f412429cf5ba 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
data, size, dma->nr_pages);
- err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+ err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
flags, dma->pages, NULL);
if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
- dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
+ dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+ dma->nr_pages);
return err < 0 ? err : -EINVAL;
}
return 0;
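How the caller reacts to a refused pin follows from the unchanged error check in the hunk above. The standalone model below reproduces just that check so the three outcomes can be seen side by side. `mock_dmabuf` and `mock_check_pin_result` are hypothetical names used only for this sketch; the real function operates on `struct videobuf_dmabuf`.

```c
#include <errno.h>

/* Hypothetical stand-in for the driver's buffer bookkeeping. */
struct mock_dmabuf { int nr_pages; };

/*
 * Mirrors the check in the hunk: 'err' is what the pin call returned.
 * A short pin keeps the partial count and reports -EINVAL; a negative
 * errno is passed through unchanged and the page count is reset to 0.
 */
static int mock_check_pin_result(struct mock_dmabuf *dma, int err)
{
	if (err != dma->nr_pages) {
		dma->nr_pages = (err >= 0) ? err : 0;
		return err < 0 ? err : -EINVAL;
	}
	return 0;
}
```

Under this model a -EOPNOTSUPP from the longterm pin reaches the caller as -EOPNOTSUPP with zero pages recorded, while a partial pin is still reported as -EINVAL.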
* Re: [PATCH 3/3] [media] v4l2: disable filesystem-dax mapping support
From: Mauro Carvalho Chehab @ 2017-11-07 8:33 UTC
To: Dan Williams
Cc: akpm, Jan Kara, linux-kernel, stable, linux-mm,
Mauro Carvalho Chehab, linux-media
On Mon, 06 Nov 2017 16:57:28 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> V4L2 memory registrations are incompatible with filesystem-dax that
> needs the ability to revoke dma access to a mapping at will, or
> otherwise allow the kernel to wait for completion of DMA. The
> filesystem-dax implementation breaks the traditional solution of
> truncate of active file backed mappings since there is no page-cache
> page we can orphan to sustain ongoing DMA.
>
> If v4l2 wants to support long lived DMA mappings it needs to arrange to
> hold a file lease or use some other mechanism so that the kernel can
> coordinate revoking DMA access when the filesystem needs to truncate
> mappings.
Not sure if I understand your comment here... what happens
if FS_DAX is enabled? The new err = get_user_pages_longterm()
would cause DMA allocation to fail? If so, that doesn't sound
right. Instead, mm should somehow mark this mapping to be out
of FS_DAX control range.
Also, it is not only videobuf-dma-sg.c that does long lived
DMA mappings. VB2 also does that (and videobuf-vmalloc).
Regards,
Mauro
>
> Reported-by: Jan Kara <jack@suse.cz>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: linux-media@vger.kernel.org
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> drivers/media/v4l2-core/videobuf-dma-sg.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
> index 0b5c43f7e020..f412429cf5ba 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-sg.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
> @@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
> dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
> data, size, dma->nr_pages);
>
> - err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
> + err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
> flags, dma->pages, NULL);
>
> if (err != dma->nr_pages) {
> dma->nr_pages = (err >= 0) ? err : 0;
> - dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
> + dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
> + dma->nr_pages);
> return err < 0 ? err : -EINVAL;
> }
> return 0;
>
Thanks,
Mauro
* Re: [PATCH 3/3] [media] v4l2: disable filesystem-dax mapping support
From: Dan Williams @ 2017-11-07 17:43 UTC
To: Mauro Carvalho Chehab
Cc: Andrew Morton, Jan Kara, linux-kernel@vger.kernel.org,
stable@vger.kernel.org, Linux MM, Mauro Carvalho Chehab,
Linux-media@vger.kernel.org
On Tue, Nov 7, 2017 at 12:33 AM, Mauro Carvalho Chehab
<mchehab@s-opensource.com> wrote:
> On Mon, 06 Nov 2017 16:57:28 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
>> V4L2 memory registrations are incompatible with filesystem-dax that
>> needs the ability to revoke dma access to a mapping at will, or
>> otherwise allow the kernel to wait for completion of DMA. The
>> filesystem-dax implementation breaks the traditional solution of
>> truncate of active file backed mappings since there is no page-cache
>> page we can orphan to sustain ongoing DMA.
>>
>> If v4l2 wants to support long lived DMA mappings it needs to arrange to
>> hold a file lease or use some other mechanism so that the kernel can
>> coordinate revoking DMA access when the filesystem needs to truncate
>> mappings.
>
>
> Not sure if I understand your comment here... what happens
> if FS_DAX is enabled? The new err = get_user_pages_longterm()
> would cause DMA allocation to fail?
Correct, any attempt to specify a filesystem-dax mapping range to
get_user_pages_longterm will fail with EOPNOTSUPP. In the future we
want to add something like a 'struct file_lock *' argument to
get_user_pages_longterm so that the kernel has a handle to revoke
access to the returned pages. Once we have a safe way for the kernel
to undo elevated page counts we can stop failing the longterm vs
filesystem-dax case.
Here is more background on why _longterm gup is a problem for filesystem-dax:
https://lwn.net/Articles/737273/
> If so, that doesn't sound
> right. Instead, mm should somehow mark this mapping to be out
> of FS_DAX control range.
DAX is currently a global setting for the entire backing device of the
filesystem, so any mapping of any file when the "-o dax" mount option
is set is in the "FS_DAX control range". In other words there's
currently no way to prevent FS_DAX mappings from being exposed to V4L2
outside of CONFIG_FS_DAX=n.
> Also, it is not only videobuf-dma-sg.c that does long lived
> DMA mappings. VB2 also does that (and videobuf-vmalloc).
Without finding the code, videobuf-vmalloc sounds like it should be ok
if the kernel is allocating memory separate from a file-backed DAX
mapping. Where is the VB2 get_user_pages call?
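Since DAX is described here as a per-mount setting, one way an application might check in advance whether a buffer file could be affected is to look at the mount options of the filesystem it lives on (for example the options field of a /proc/mounts line). The helper below is a hedged sketch of that check. `mount_opts_have_dax` is a hypothetical name, and it assumes the option is spelled exactly "dax" as in the "-o dax" usage discussed in this thread; other kernels may spell it differently.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the comma-separated mount option string contains the
 * bare token "dax". Assumption: the option is spelled exactly "dax".
 */
static bool mount_opts_have_dax(const char *opts)
{
	const char *p = opts;

	while (*p) {
		/* Length of the current token, up to the next comma or the end. */
		size_t len = strcspn(p, ",");

		if (len == 3 && strncmp(p, "dax", 3) == 0)
			return true;
		p += len;
		if (*p == ',')
			p++;
	}
	return false;
}
```

Matching whole tokens rather than substrings keeps an option such as "nodax" from being mistaken for "dax".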
* Re: [PATCH 3/3] [media] v4l2: disable filesystem-dax mapping support
From: Mauro Carvalho Chehab @ 2017-11-07 20:39 UTC
To: Dan Williams
Cc: Andrew Morton, Jan Kara, linux-kernel@vger.kernel.org,
stable@vger.kernel.org, Linux MM, Mauro Carvalho Chehab,
Linux-media@vger.kernel.org
On Tue, 7 Nov 2017 09:43:41 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Nov 7, 2017 at 12:33 AM, Mauro Carvalho Chehab
> <mchehab@s-opensource.com> wrote:
> > On Mon, 06 Nov 2017 16:57:28 -0800
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >
> >> V4L2 memory registrations are incompatible with filesystem-dax that
> >> needs the ability to revoke dma access to a mapping at will, or
> >> otherwise allow the kernel to wait for completion of DMA. The
> >> filesystem-dax implementation breaks the traditional solution of
> >> truncate of active file backed mappings since there is no page-cache
> >> page we can orphan to sustain ongoing DMA.
> >>
> >> If v4l2 wants to support long lived DMA mappings it needs to arrange to
> >> hold a file lease or use some other mechanism so that the kernel can
> >> coordinate revoking DMA access when the filesystem needs to truncate
> >> mappings.
> >
> >
> > Not sure if I understand your comment here... what happens
> > if FS_DAX is enabled? The new err = get_user_pages_longterm()
> > would cause DMA allocation to fail?
>
> Correct, any attempt to specify a filesystem-dax mapping range to
> get_user_pages_longterm will fail with EOPNOTSUPP. In the future we
> want to add something like a 'struct file_lock *' argument to
> get_user_pages_longterm so that the kernel has a handle to revoke
> access to the returned pages. Once we have a safe way for the kernel
> to undo elevated page counts we can stop failing the longterm vs
> filesystem-dax case.
Argh! Perhaps we should make it depend on BROKEN while not fixed :-/
> Here is more background on why _longterm gup is a problem for filesystem-dax:
>
> https://lwn.net/Articles/737273/
>
> > If so, that doesn't sound
> > right. Instead, mm should somehow mark this mapping to be out
> > of FS_DAX control range.
>
> DAX is currently a global setting for the entire backing device of the
> filesystem, so any mapping of any file when the "-o dax" mount option
> is set is in the "FS_DAX control range". In other words there's
> currently no way to prevent FS_DAX mappings from being exposed to V4L2
> outside of CONFIG_FS_DAX=n.
Grrr...
> > Also, it is not only videobuf-dma-sg.c that does long lived
> > DMA mappings. VB2 also does that (and videobuf-vmalloc).
>
> Without finding the code, videobuf-vmalloc sounds like it should be ok
> if the kernel is allocating memory separate from a file-backed DAX
> mapping.
videobuf-vmalloc does DMA mapping for pages allocated via vmalloc(),
via vmalloc_user()/remap_vmalloc_range().
There aren't many drivers using VB1 anymore, but a change to VB2
will likely break support for almost all webcams if fs DAX is
in use.
> Where is the VB2 get_user_pages call?
Before changeset 3336c24f25ec, the logic for get_user_pages() was
in drivers/media/v4l2-core/videobuf2-dma-sg.c. Now, the logic
it uses is inside mm/frame_vector.c.
Thanks,
Mauro
* Re: [PATCH 3/3] [media] v4l2: disable filesystem-dax mapping support
From: Dan Williams @ 2017-11-08 0:13 UTC
To: Mauro Carvalho Chehab
Cc: Andrew Morton, Jan Kara, linux-kernel@vger.kernel.org,
stable@vger.kernel.org, Linux MM, Mauro Carvalho Chehab,
Linux-media@vger.kernel.org
On Tue, Nov 7, 2017 at 12:39 PM, Mauro Carvalho Chehab
<mchehab@s-opensource.com> wrote:
> On Tue, 7 Nov 2017 09:43:41 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
>> On Tue, Nov 7, 2017 at 12:33 AM, Mauro Carvalho Chehab
>> <mchehab@s-opensource.com> wrote:
>> > On Mon, 06 Nov 2017 16:57:28 -0800
>> > Dan Williams <dan.j.williams@intel.com> wrote:
>> >
>> >> V4L2 memory registrations are incompatible with filesystem-dax that
>> >> needs the ability to revoke dma access to a mapping at will, or
>> >> otherwise allow the kernel to wait for completion of DMA. The
>> >> filesystem-dax implementation breaks the traditional solution of
>> >> truncate of active file backed mappings since there is no page-cache
>> >> page we can orphan to sustain ongoing DMA.
>> >>
>> >> If v4l2 wants to support long lived DMA mappings it needs to arrange to
>> >> hold a file lease or use some other mechanism so that the kernel can
>> >> coordinate revoking DMA access when the filesystem needs to truncate
>> >> mappings.
>> >
>> >
>> > Not sure if I understand your comment here... what happens
>> > if FS_DAX is enabled? The new err = get_user_pages_longterm()
>> > would cause DMA allocation to fail?
>>
>> Correct, any attempt to specify a filesystem-dax mapping range to
>> get_user_pages_longterm will fail with EOPNOTSUPP. In the future we
>> want to add something like a 'struct file_lock *' argument to
>> get_user_pages_longterm so that the kernel has a handle to revoke
>> access to the returned pages. Once we have a safe way for the kernel
>> to undo elevated page counts we can stop failing the longterm vs
>> filesystem-dax case.
>
> Argh! Perhaps we should make it depend on BROKEN while not fixed :-/
Small consolation, but we do warn that filesystem-dax is still
considered experimental when mounting a filesystem with "-o dax".
>> Here is more background on why _longterm gup is a problem for filesystem-dax:
>>
>> https://lwn.net/Articles/737273/
>>
>> > If so, that doesn't sound
>> > right. Instead, mm should somehow mark this mapping to be out
>> > of FS_DAX control range.
>>
>> DAX is currently a global setting for the entire backing device of the
>> filesystem, so any mapping of any file when the "-o dax" mount option
>> is set is in the "FS_DAX control range". In other words there's
>> currently no way to prevent FS_DAX mappings from being exposed to V4L2
>> outside of CONFIG_FS_DAX=n.
>
> Grrr...
>
>> > Also, it is not only videobuf-dma-sg.c that does long lived
>> > DMA mappings. VB2 also does that (and videobuf-vmalloc).
>>
>> Without finding the code, videobuf-vmalloc sounds like it should be ok
>> if the kernel is allocating memory separate from a file-backed DAX
>> mapping.
>
> videobuf-vmalloc does DMA mapping for pages allocated via vmalloc(),
> via vmalloc_user()/remap_vmalloc_range().
Ok, that's completely safe since filesystem-dax mappings are not
involved in a vmalloc backed virtual address range.
> There aren't many drivers using VB1 anymore, but a change to VB2
> will likely break support for almost all webcams if fs DAX is
> in use.
Yes, unless / until we can switch userspace to using a new memory
registration api that includes a way for the kernel to revoke access
to a dax mapping. Another mitigation is following through on support
for moving dax support from a global mount flag to a per-inode flag to
at least prevent dax from leaking to use cases that need explicit
coordination.
>> Where is the VB2 get_user_pages call?
>
> Before changeset 3336c24f25ec, the logic for get_user_pages() was
> in drivers/media/v4l2-core/videobuf2-dma-sg.c. Now, the logic
> it uses is inside mm/frame_vector.c.
Ok, I'll take a look.
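The "map with lease" idea mentioned in the cover letter, where the kernel can revoke a registration and force the holder to drop its page references, can be pictured with a toy model. Everything here is hypothetical: `mock_registration`, `mock_drop_all` and `mock_break_lease` are invented for illustration and do not correspond to any existing or proposed kernel interface. The sketch only shows the shape of the contract: the filesystem side breaks the lease, a cooperating holder releases its references, and the pages count as idle once none remain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a long-term registration guarded by a lease. */
struct mock_registration {
	int pinned;     /* page references still held */
	bool revoked;   /* set once the lease has been broken */
	void (*on_revoke)(struct mock_registration *reg);
};

/* What a cooperating holder might do when told to let go. */
static void mock_drop_all(struct mock_registration *reg)
{
	reg->pinned = 0;
}

/*
 * Model of the filesystem side: before truncating, break the lease once
 * and report whether the pages are now idle (no references left).
 */
static bool mock_break_lease(struct mock_registration *reg)
{
	if (!reg->revoked) {
		reg->revoked = true;
		if (reg->on_revoke)
			reg->on_revoke(reg);
	}
	return reg->pinned == 0;
}
```

A registration without a revoke callback stays pinned in this model, which corresponds to the situation the series refuses up front by failing the longterm pin.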