From: Jan Kara <jack@suse.cz>
To: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Jan Kara <jack@suse.cz>, LKML <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org
Subject: Re: [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked()
Date: Mon, 7 Oct 2013 22:55:08 +0200 [thread overview]
Message-ID: <20131007205508.GE30441@quack.suse.cz> (raw)
In-Reply-To: <524DF246.9050309@gmail.com>
On Thu 03-10-13 18:40:06, KOSAKI Motohiro wrote:
> (10/2/13 3:36 PM), Jan Kara wrote:
> >On Wed 02-10-13 12:32:33, KOSAKI Motohiro wrote:
> >>(10/2/13 10:27 AM), Jan Kara wrote:
> >>>Signed-off-by: Jan Kara <jack@suse.cz>
> >>>---
> >>> mm/process_vm_access.c | 8 ++------
> >>> 1 file changed, 2 insertions(+), 6 deletions(-)
> >>>
> >>>diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
> >>>index fd26d0433509..c1bc47d8ed90 100644
> >>>--- a/mm/process_vm_access.c
> >>>+++ b/mm/process_vm_access.c
> >>>@@ -64,12 +64,8 @@ static int process_vm_rw_pages(struct task_struct *task,
> >>> *bytes_copied = 0;
> >>>
> >>> /* Get the pages we're interested in */
> >>>- down_read(&mm->mmap_sem);
> >>>- pages_pinned = get_user_pages(task, mm, pa,
> >>>- nr_pages_to_copy,
> >>>- vm_write, 0, process_pages, NULL);
> >>>- up_read(&mm->mmap_sem);
> >>>-
> >>>+ pages_pinned = get_user_pages_unlocked(task, mm, pa, nr_pages_to_copy,
> >>>+ vm_write, 0, process_pages);
> >>> if (pages_pinned != nr_pages_to_copy) {
> >>> rc = -EFAULT;
> >>> goto end;
> >>
> >>This is wrong because original code is wrong. In this function, page may
> >>be pointed to anon pages. Then, you should keep to take mmap_sem until
> >>finish to copying. Otherwise concurrent fork() makes nasty COW issue.
> > Hum, can you be more specific? I suppose you are speaking about situation
> >when the remote task we are copying data from/to does fork while
> >process_vm_rw_pages() runs. If we are copying data from remote task, I
> >don't see how COW could cause any problem. If we are copying to remote task
> >and fork happens after get_user_pages() but before copy_to_user() then I
> >can see we might be having some trouble. copy_to_user() would then copy
> >data into both original remote process and its child thus essentially
> >bypassing COW. If the child process manages to COW some of the pages before
> >copy_to_user() happens, it can even see only some of the pages. Is that what
> >you mean?
>
> scenario 1: vm_write==0
>
> Process P1 call get_user_pages(pa, process_pages) in process_vm_rw_pages
> P1 unlock mmap_sem.
> Process P2 call fork(). and make P3.
> P2 write memory pa. now the "process_pages" is owned by P3 (the child process)
> P3 write memory pa. and then the content of "process_pages" is changed.
> P1 read process_pages as P2's page. but actually, it is P3's data. Then,
> P1 observe garbage, at least unintended, data was read.
Yeah, this really looks buggy because P1 can see data in (supposedly)
P2's address space which P2 never wrote there.
> scenario 2: vm_write==1
>
> Process P1 call get_user_pages(pa, process_pages) in process_vm_rw_pages.
> It makes COW break and any anon page sharing broke.
> P1 unlock mmap_sem.
> P2 call fork(). and make P3. And, now COW page sharing is restored.
> P2 write memory pa. now the "process_pages" is owned by P3.
> P3 write memory pa. it mean P3 changes "process_pages".
> P1 write process_pages as P2's page. but actually, it is P3's. It
> override above P3's write and then P3 observe data corruption.
Yep, this is a similar problem as above. Thanks for pointing this out.
> The solution is to keep holding mmap_sem until finishing process_pages
> access because mmap_sem prevent fork. and then race never be happen. I
> mean you cann't use get_user_pages_unlock() if target address point to
> anon pages.
Yeah, if you are accessing third party mm, you've convinced me you
currently need mmap_sem to avoid problems with COW on anon pages. I'm just
thinking that this "hold mmap_sem to prevent fork" is somewhat subtle
(definitely would deserve a comment) and if it would be needed in more
places we might be better off if we have a more explicit mechanism for that
(like a special lock, fork sequence count, or something like that). Anyway
I'll have this in mind and if I see other places that need this, I'll try
to come up with some solution. Thanks again for explanation.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz>
To: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Jan Kara <jack@suse.cz>, LKML <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org
Subject: Re: [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked()
Date: Mon, 7 Oct 2013 22:55:08 +0200 [thread overview]
Message-ID: <20131007205508.GE30441@quack.suse.cz> (raw)
In-Reply-To: <524DF246.9050309@gmail.com>
On Thu 03-10-13 18:40:06, KOSAKI Motohiro wrote:
> (10/2/13 3:36 PM), Jan Kara wrote:
> >On Wed 02-10-13 12:32:33, KOSAKI Motohiro wrote:
> >>(10/2/13 10:27 AM), Jan Kara wrote:
> >>>Signed-off-by: Jan Kara <jack@suse.cz>
> >>>---
> >>> mm/process_vm_access.c | 8 ++------
> >>> 1 file changed, 2 insertions(+), 6 deletions(-)
> >>>
> >>>diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
> >>>index fd26d0433509..c1bc47d8ed90 100644
> >>>--- a/mm/process_vm_access.c
> >>>+++ b/mm/process_vm_access.c
> >>>@@ -64,12 +64,8 @@ static int process_vm_rw_pages(struct task_struct *task,
> >>> *bytes_copied = 0;
> >>>
> >>> /* Get the pages we're interested in */
> >>>- down_read(&mm->mmap_sem);
> >>>- pages_pinned = get_user_pages(task, mm, pa,
> >>>- nr_pages_to_copy,
> >>>- vm_write, 0, process_pages, NULL);
> >>>- up_read(&mm->mmap_sem);
> >>>-
> >>>+ pages_pinned = get_user_pages_unlocked(task, mm, pa, nr_pages_to_copy,
> >>>+ vm_write, 0, process_pages);
> >>> if (pages_pinned != nr_pages_to_copy) {
> >>> rc = -EFAULT;
> >>> goto end;
> >>
> >>This is wrong because original code is wrong. In this function, page may
> >>be pointed to anon pages. Then, you should keep to take mmap_sem until
> >>finish to copying. Otherwise concurrent fork() makes nasty COW issue.
> > Hum, can you be more specific? I suppose you are speaking about situation
> >when the remote task we are copying data from/to does fork while
> >process_vm_rw_pages() runs. If we are copying data from remote task, I
> >don't see how COW could cause any problem. If we are copying to remote task
> >and fork happens after get_user_pages() but before copy_to_user() then I
> >can see we might be having some trouble. copy_to_user() would then copy
> >data into both original remote process and its child thus essentially
> >bypassing COW. If the child process manages to COW some of the pages before
> >copy_to_user() happens, it can even see only some of the pages. Is that what
> >you mean?
>
> scenario 1: vm_write==0
>
> Process P1 call get_user_pages(pa, process_pages) in process_vm_rw_pages
> P1 unlock mmap_sem.
> Process P2 call fork(). and make P3.
> P2 write memory pa. now the "process_pages" is owned by P3 (the child process)
> P3 write memory pa. and then the content of "process_pages" is changed.
> P1 read process_pages as P2's page. but actually, it is P3's data. Then,
> P1 observe garbage, at least unintended, data was read.
Yeah, this really looks buggy because P1 can see data in (supposedly)
P2's address space which P2 never wrote there.
> scenario 2: vm_write==1
>
> Process P1 call get_user_pages(pa, process_pages) in process_vm_rw_pages.
> It makes COW break and any anon page sharing broke.
> P1 unlock mmap_sem.
> P2 call fork(). and make P3. And, now COW page sharing is restored.
> P2 write memory pa. now the "process_pages" is owned by P3.
> P3 write memory pa. it mean P3 changes "process_pages".
> P1 write process_pages as P2's page. but actually, it is P3's. It
> override above P3's write and then P3 observe data corruption.
Yep, this is a similar problem as above. Thanks for pointing this out.
> The solution is to keep holding mmap_sem until finishing process_pages
> access because mmap_sem prevent fork. and then race never be happen. I
> mean you cann't use get_user_pages_unlock() if target address point to
> anon pages.
Yeah, if you are accessing third party mm, you've convinced me you
currently need mmap_sem to avoid problems with COW on anon pages. I'm just
thinking that this "hold mmap_sem to prevent fork" is somewhat subtle
(definitely would deserve a comment) and if it would be needed in more
places we might be better off if we have a more explicit mechanism for that
(like a special lock, fork sequence count, or something like that). Anyway
I'll have this in mind and if I see other places that need this, I'll try
to come up with some solution. Thanks again for explanation.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2013-10-07 20:55 UTC|newest]
Thread overview: 142+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-02 14:27 [PATCH 0/26] get_user_pages() cleanup Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 01/26] cris: Convert cryptocop to use get_user_pages_fast() Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 02/26] ia64: Use get_user_pages_fast() in err_inject.c Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 03/26] dma: Use get_user_pages_fast() in dma_pin_iovec_pages() Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 04/26] drm: Convert via driver to use get_user_pages_fast() Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 05/26] omap3isp: Make isp_video_buffer_prepare_user() " Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 19:41 ` Laurent Pinchart
2013-10-02 19:41 ` Laurent Pinchart
2013-10-02 20:18 ` Jan Kara
2013-10-02 20:18 ` Jan Kara
2013-10-02 14:27 ` [PATCH 06/26] vmw_vmci: Convert driver to " Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 07/26] st: Convert sgl_map_user_pages() " Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 08/26] ced1401: Convert driver " Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 09/26] crystalhd: Convert crystalhd_map_dio() " Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 10/26] lustre: Convert ll_get_user_pages() " Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-05 6:27 ` Dilger, Andreas
2013-10-05 6:27 ` Dilger, Andreas
2013-10-02 14:27 ` [PATCH 11/26] sep: Convert sep_lock_user_pages() to get_user_pages_fast() Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 12/26] pvr2fb: Convert pvr2fb_write() to use get_user_pages_fast() Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 13/26] fsl_hypervisor: Convert ioctl_memcpy() " Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-04 2:38 ` Timur Tabi
2013-10-04 2:38 ` Timur Tabi
2013-10-02 14:27 ` [PATCH 14/26] nfs: Convert direct IO " Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 15/26] ceph: Convert ceph_get_direct_page_vector() to get_user_pages_fast() Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:27 ` [PATCH 16/26] mm: Provide get_user_pages_unlocked() Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 16:25 ` Christoph Hellwig
2013-10-02 16:25 ` Christoph Hellwig
2013-10-02 16:28 ` KOSAKI Motohiro
2013-10-02 16:28 ` KOSAKI Motohiro
2013-10-02 19:39 ` Jan Kara
2013-10-02 19:39 ` Jan Kara
2013-10-02 14:27 ` [PATCH 17/26] kvm: Use get_user_pages_unlocked() in async_pf_execute() Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 14:59 ` Gleb Natapov
2013-10-02 14:59 ` Gleb Natapov
2013-10-02 14:27 ` [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked() Jan Kara
2013-10-02 14:27 ` Jan Kara
2013-10-02 16:32 ` KOSAKI Motohiro
2013-10-02 16:32 ` KOSAKI Motohiro
2013-10-02 19:36 ` Jan Kara
2013-10-02 19:36 ` Jan Kara
2013-10-03 22:40 ` KOSAKI Motohiro
2013-10-03 22:40 ` KOSAKI Motohiro
2013-10-07 20:55 ` Jan Kara [this message]
2013-10-07 20:55 ` Jan Kara
2013-10-08 0:10 ` KOSAKI Motohiro
2013-10-08 0:10 ` KOSAKI Motohiro
2013-10-02 14:28 ` [PATCH 19/26] ivtv: Convert driver " Jan Kara
2013-10-02 14:28 ` Jan Kara
2013-10-05 12:02 ` Andy Walls
2013-10-05 12:02 ` Andy Walls
2013-10-07 17:22 ` Jan Kara
2013-10-07 17:22 ` Jan Kara
2013-10-02 14:28 ` [PATCH 20/26] ib: Convert ib_umem_get() to get_user_pages_unlocked() Jan Kara
2013-10-02 14:28 ` Jan Kara
2013-10-02 14:28 ` [PATCH 21/26] ib: Convert ipath_get_user_pages() " Jan Kara
2013-10-02 14:28 ` Jan Kara
2013-10-02 14:28 ` [PATCH 22/26] ib: Convert ipath_user_sdma_pin_pages() to use get_user_pages_unlocked() Jan Kara
2013-10-02 14:28 ` Jan Kara
2013-10-02 14:28 ` [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked() Jan Kara
2013-10-02 14:28 ` Jan Kara
[not found] ` <1380724087-13927-24-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
2013-10-02 14:54 ` Marciniszyn, Mike
2013-10-02 14:54 ` Marciniszyn, Mike
2013-10-02 14:54 ` Marciniszyn, Mike
[not found] ` <32E1700B9017364D9B60AED9960492BC211AEF75-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-10-02 15:28 ` Jan Kara
2013-10-02 15:28 ` Jan Kara
2013-10-02 15:28 ` Jan Kara
[not found] ` <20131002152811.GC32181-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2013-10-02 15:32 ` Marciniszyn, Mike
2013-10-02 15:32 ` Marciniszyn, Mike
2013-10-02 15:32 ` Marciniszyn, Mike
[not found] ` <32E1700B9017364D9B60AED9960492BC211AF005-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-10-02 15:38 ` Jan Kara
2013-10-02 15:38 ` Jan Kara
2013-10-02 15:38 ` Jan Kara
[not found] ` <20131002153842.GD32181-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2013-10-04 13:39 ` Marciniszyn, Mike
2013-10-04 13:39 ` Marciniszyn, Mike
2013-10-04 13:39 ` Marciniszyn, Mike
[not found] ` <32E1700B9017364D9B60AED9960492BC211B0123-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-10-04 13:46 ` Marciniszyn, Mike
2013-10-04 13:46 ` Marciniszyn, Mike
2013-10-04 13:46 ` Marciniszyn, Mike
2013-10-04 13:44 ` Marciniszyn, Mike
2013-10-04 13:44 ` Marciniszyn, Mike
2013-10-04 13:52 ` Marciniszyn, Mike
2013-10-04 13:52 ` Marciniszyn, Mike
2013-10-04 13:52 ` Marciniszyn, Mike
[not found] ` <32E1700B9017364D9B60AED9960492BC211B0176-AtyAts71sc9zLByeVOV5+bfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-10-04 18:33 ` Jan Kara
2013-10-04 18:33 ` Jan Kara
2013-10-04 18:33 ` Jan Kara
[not found] ` <20131004183315.GA19557-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2013-10-07 15:20 ` Marciniszyn, Mike
2013-10-07 15:20 ` Marciniszyn, Mike
2013-10-07 15:20 ` Marciniszyn, Mike
2013-10-07 15:38 ` Marciniszyn, Mike
2013-10-07 15:38 ` Marciniszyn, Mike
2013-10-07 17:26 ` Jan Kara
2013-10-07 17:26 ` Jan Kara
2013-10-08 19:06 ` Jan Kara
2013-10-08 19:06 ` Jan Kara
[not found] ` <20131008190604.GB14223-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2013-10-16 21:39 ` Jan Kara
2013-10-16 21:39 ` Jan Kara
2013-10-16 21:39 ` Jan Kara
2013-10-02 14:28 ` [PATCH 24/26] ib: Convert qib_user_sdma_pin_pages() to use get_user_pages_unlocked() Jan Kara
2013-10-02 14:28 ` Jan Kara
2013-10-02 14:28 ` [PATCH 25/26] ib: Convert mthca_map_user_db() to use get_user_pages_fast() Jan Kara
2013-10-02 14:28 ` Jan Kara
2013-10-02 14:28 ` [PATCH 26/26] aio: Remove useless get_user_pages() call Jan Kara
2013-10-02 14:28 ` Jan Kara
2013-10-02 14:28 ` Jan Kara
2013-10-02 16:20 ` [PATCH 0/26] get_user_pages() cleanup Christoph Hellwig
2013-10-02 16:20 ` Christoph Hellwig
2013-10-02 20:29 ` Jan Kara
2013-10-02 20:29 ` Jan Kara
2013-10-04 20:31 ` KOSAKI Motohiro
2013-10-04 20:31 ` KOSAKI Motohiro
2013-10-04 20:42 ` KOSAKI Motohiro
2013-10-04 20:42 ` KOSAKI Motohiro
2013-10-07 21:18 ` Jan Kara
2013-10-07 21:18 ` Jan Kara
2013-10-07 21:18 ` Jan Kara
2013-10-08 0:27 ` KOSAKI Motohiro
2013-10-08 0:27 ` KOSAKI Motohiro
2013-10-08 0:27 ` KOSAKI Motohiro
2013-10-08 6:06 ` Jan Kara
2013-10-08 6:06 ` Jan Kara
2013-10-08 6:06 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131007205508.GE30441@quack.suse.cz \
--to=jack@suse.cz \
--cc=kosaki.motohiro@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.