* Flushing whole page instead of work for ptrace @ 2010-11-26 12:31 Michal Simek 2010-11-30 11:07 ` Flushing whole page instead of exact address " Michal Simek 2010-11-30 23:32 ` Flushing whole page instead of work " Roland McGrath 0 siblings, 2 replies; 9+ messages in thread From: Michal Simek @ 2010-11-26 12:31 UTC (permalink / raw) To: Oleg Nesterov, Roland McGrath, Andrew Morton Cc: LKML, linux-mm, John Williams, Edgar E. Iglesias Hi, I have found one problem when I debug multithread application on Microblaze. Let me describe what I discovered. GDB has internal timeout which is setup to value 3. Which should mean if GDB sends packet and doesn't receive answer for it then after 3 internal timeouts GDB announces "Ignoring packet error, continuing..." and then fail. (communication is done over TCP). In any older version we could debug multithread application that's why I bisected all new patches which I have added to the kernel and I identify that the problem is caused by my patch. microblaze: Implement flush_dcache_page macro sha1(79e87830faf22ca636b1a1d8f4deb430ea6e1c8b) I had to implemented flush_dcache_page macro for new systems with write-back(WB) cache which is important for several components (for example jffs2 rootfs) to get it work on WB. BTW: For systems with write-through(WT) caches I don't need to implement this macro because flushing is done automatically. Then I replaced macro on WT by udelay loop to find out if the problem is time dependent. I tested it on two hw designs(on the same HZ and cache size) with two different network IPs/drivers (one with DMA and second without) and I found that system with dma network driver can spend more time on dcache flushing before GDB timeout happens because TCP communication is faster. Which means that the problem also depends on cpu speed and cache configuration - size, cache line length. Then I traced kernel part and I was focused why this macro is causing this problem. GDB sends symbol-lookup command (qSymbol) and I see a lot of kernel ptrace PEEKTEXT requests. I parse it and here is calling sequence. (kernel/ptrace.c) sys_ptrace -> (arch/microblaze/kernel/ptrace.c)arch_ptrace -> (kernel/ptrace.c)ptrace_request -> generic_ptrace_peek/poke data/text -> (mm/memory.c) access_process_vm -> get_user_pages -> __get_user_pages -> flush_dcache_page Function access_process_vm calls __get_user_pages which doesn't work with buffer len (which is for PEEK/POKE TEXT/DATA just 32 bit - for 32bit Microblaze) but only with start and PAGE size. There is also called flush_dcache_page macro which takes more time than in past, because was empty. Macro flushes whole page but it is necessary, for this case, just flush one address if is called from ptrace. What is the best way how to ensure that there will be flush only address instead of whole page for ptrace requests? I think that there shouldn't be a reason to flush whole page for ptraces. Please correct me if I am wrong somewhere. Thanks, Michal -- Michal Simek, Ing. (M.Eng) PetaLogix - Linux Solutions for a Reconfigurable World w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Flushing whole page instead of exact address for ptrace 2010-11-26 12:31 Flushing whole page instead of work for ptrace Michal Simek @ 2010-11-30 11:07 ` Michal Simek 2010-11-30 23:32 ` Flushing whole page instead of work " Roland McGrath 1 sibling, 0 replies; 9+ messages in thread From: Michal Simek @ 2010-11-30 11:07 UTC (permalink / raw) To: michal.simek Cc: Oleg Nesterov, Roland McGrath, Andrew Morton, LKML, linux-mm, John Williams, Edgar E. Iglesias, Arnd Bergmann Hi, Michal Simek wrote: > Hi, > > I have found one problem when I debug multithread application on > Microblaze. Let me describe what I discovered. > > GDB has internal timeout which is setup to value 3. Which should mean if > GDB sends packet and doesn't receive answer for it then after 3 internal > timeouts GDB announces "Ignoring packet error, continuing..." and then > fail. (communication is done over TCP). > > In any older version we could debug multithread application that's why > I bisected all new patches which I have added to the kernel and I > identify that the problem is caused by my patch. > > microblaze: Implement flush_dcache_page macro > sha1(79e87830faf22ca636b1a1d8f4deb430ea6e1c8b) > > I had to implemented flush_dcache_page macro for new systems with > write-back(WB) cache which is important for several components (for > example jffs2 rootfs) to get it work on WB. > BTW: For systems with write-through(WT) caches I don't need to implement > this macro because flushing is done automatically. > > Then I replaced macro on WT by udelay loop to find out if the problem is > time dependent. I tested it on two hw designs(on the same HZ and cache > size) with two different network IPs/drivers (one with DMA and second > without) and I found that system with dma network driver can spend more > time on dcache flushing before GDB timeout happens because TCP > communication is faster. Which means that the problem also depends on > cpu speed and cache configuration - size, cache line length. > > Then I traced kernel part and I was focused why this macro is causing > this problem. > > GDB sends symbol-lookup command (qSymbol) and I see a lot of kernel > ptrace PEEKTEXT requests. I parse it and here is calling sequence. > > (kernel/ptrace.c) sys_ptrace -> > (arch/microblaze/kernel/ptrace.c)arch_ptrace -> > (kernel/ptrace.c)ptrace_request -> generic_ptrace_peek/poke data/text -> > (mm/memory.c) access_process_vm -> get_user_pages -> __get_user_pages -> > flush_dcache_page > > Function access_process_vm calls __get_user_pages which doesn't work > with buffer len (which is for PEEK/POKE TEXT/DATA just 32 bit - for > 32bit Microblaze) but only with start and PAGE size. There is also > called flush_dcache_page macro which takes more time than in past, > because was empty. Macro flushes whole page but it is necessary, for > this case, just flush one address if is called from ptrace. > > What is the best way how to ensure that there will be flush only address > instead of whole page for ptrace requests? > I think that there shouldn't be a reason to flush whole page for ptraces. > > Please correct me if I am wrong somewhere. Any suggestions? Michal -- Michal Simek, Ing. (M.Eng) w: www.monstr.eu p: +42-0-721842854 Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/ Microblaze U-BOOT custodian -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Flushing whole page instead of work for ptrace 2010-11-26 12:31 Flushing whole page instead of work for ptrace Michal Simek 2010-11-30 11:07 ` Flushing whole page instead of exact address " Michal Simek @ 2010-11-30 23:32 ` Roland McGrath 2010-12-01 17:10 ` Michal Simek 2010-12-03 15:00 ` Oleg Nesterov 1 sibling, 2 replies; 9+ messages in thread From: Roland McGrath @ 2010-11-30 23:32 UTC (permalink / raw) To: michal.simek Cc: Oleg Nesterov, Andrew Morton, LKML, linux-mm, John Williams, Edgar E. Iglesias This is a VM question more than a ptrace question. I can't give you any authoritative answers about the VM issues. Documentation/cachetlb.txt says: Any time the kernel writes to a page cache page, _OR_ the kernel is about to read from a page cache page and user space shared/writable mappings of this page potentially exist, this routine is called. In your case, the kernel is only reading (write=0 passed to access_process_vm and get_user_pages). In normal situations, the page in question will have only a private and read-only mapping in user space. So the call should not be required in these cases--if the code can tell that's so. Perhaps something like the following would be safe. But you really need some VM folks to tell you for sure. diff --git a/mm/memory.c b/mm/memory.c index 02e48aa..2864ee7 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1484,7 +1484,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, pages[i] = page; flush_anon_page(vma, page, start); - flush_dcache_page(page); + if ((vm_flags & VM_WRITE) || (vma->vm_flags & VM_SHARED) + flush_dcache_page(page); } if (vmas) vmas[i] = vma; Thanks, Roland -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: Flushing whole page instead of work for ptrace 2010-11-30 23:32 ` Flushing whole page instead of work " Roland McGrath @ 2010-12-01 17:10 ` Michal Simek 2010-12-01 17:57 ` David Miller 2010-12-03 15:00 ` Oleg Nesterov 1 sibling, 1 reply; 9+ messages in thread From: Michal Simek @ 2010-12-01 17:10 UTC (permalink / raw) To: Roland McGrath Cc: Oleg Nesterov, Andrew Morton, LKML, linux-mm, John Williams, Edgar E. Iglesias Roland McGrath wrote: > This is a VM question more than a ptrace question. > I can't give you any authoritative answers about the VM issues. > > Documentation/cachetlb.txt says: > > Any time the kernel writes to a page cache page, _OR_ > the kernel is about to read from a page cache page and > user space shared/writable mappings of this page potentially > exist, this routine is called. > > In your case, the kernel is only reading (write=0 passed to > access_process_vm and get_user_pages). In normal situations, > the page in question will have only a private and read-only > mapping in user space. So the call should not be required in > these cases--if the code can tell that's so. > > Perhaps something like the following would be safe. > But you really need some VM folks to tell you for sure. > > diff --git a/mm/memory.c b/mm/memory.c > index 02e48aa..2864ee7 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1484,7 +1484,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, > pages[i] = page; > > flush_anon_page(vma, page, start); > - flush_dcache_page(page); > + if ((vm_flags & VM_WRITE) || (vma->vm_flags & VM_SHARED) > + flush_dcache_page(page); > } > if (vmas) > vmas[i] = vma; > > > Thanks, > Roland Andrew any comment? Thanks, Michal -- Michal Simek, Ing. (M.Eng) PetaLogix - Linux Solutions for a Reconfigurable World w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Flushing whole page instead of work for ptrace 2010-12-01 17:10 ` Michal Simek @ 2010-12-01 17:57 ` David Miller 0 siblings, 0 replies; 9+ messages in thread From: David Miller @ 2010-12-01 17:57 UTC (permalink / raw) To: michal.simek Cc: roland, oleg, akpm, linux-kernel, linux-mm, john.williams, edgar.iglesias From: Michal Simek <michal.simek@petalogix.com> Date: Wed, 01 Dec 2010 18:10:12 +0100 > Roland McGrath wrote: >> This is a VM question more than a ptrace question. I can't give you >> any authoritative answers about the VM issues. >> Documentation/cachetlb.txt says: >> Any time the kernel writes to a page cache page, _OR_ >> the kernel is about to read from a page cache page and >> user space shared/writable mappings of this page potentially >> exist, this routine is called. >> In your case, the kernel is only reading (write=0 passed to >> access_process_vm and get_user_pages). In normal situations, >> the page in question will have only a private and read-only >> mapping in user space. So the call should not be required in >> these cases--if the code can tell that's so. >> Perhaps something like the following would be safe. >> But you really need some VM folks to tell you for sure. >> diff --git a/mm/memory.c b/mm/memory.c >> index 02e48aa..2864ee7 100644 >> --- a/mm/memory.c >> +++ b/mm/memory.c >> @@ -1484,7 +1484,8 @@ int __get_user_pages(struct task_struct *tsk, >> struct mm_struct *mm, >> pages[i] = page; >> flush_anon_page(vma, page, start); >> - flush_dcache_page(page); >> + if ((vm_flags & VM_WRITE) || (vma->vm_flags & VM_SHARED) >> + flush_dcache_page(page); >> } >> if (vmas) >> vmas[i] = vma; >> Thanks, >> Roland > > Andrew any comment? I don't have any comments on this specific patch but I will note that special care is needed _after_ the access to kick out aliases so that other future accesses to this page, which are oblivious to what ptrace did, don't see illegal D-cache aliases. Have a look at arch/sparc/kernel/ptrace_64.c:flush_ptrace_access(). Also, another issue is that much of the time ptrace() is just fetching very small chunks (perhaps, a stack frame, or some variable in the program image), so doing an entire page flush when we only copy a few bytes out of the page is overkill. Sparc64's flush_ptrace_access() tries to address this as well. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Flushing whole page instead of work for ptrace 2010-11-30 23:32 ` Flushing whole page instead of work " Roland McGrath 2010-12-01 17:10 ` Michal Simek @ 2010-12-03 15:00 ` Oleg Nesterov 2010-12-03 16:28 ` Minchan Kim 1 sibling, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2010-12-03 15:00 UTC (permalink / raw) To: Roland McGrath Cc: michal.simek, Andrew Morton, LKML, linux-mm, John Williams, Edgar E. Iglesias On 11/30, Roland McGrath wrote: > > Documentation/cachetlb.txt says: > > Any time the kernel writes to a page cache page, _OR_ > the kernel is about to read from a page cache page and > user space shared/writable mappings of this page potentially > exist, this routine is called. > > In your case, the kernel is only reading (write=0 passed to > access_process_vm and get_user_pages). In normal situations, > the page in question will have only a private and read-only > mapping in user space. So the call should not be required in > these cases--if the code can tell that's so. > > Perhaps something like the following would be safe. > But you really need some VM folks to tell you for sure. > > diff --git a/mm/memory.c b/mm/memory.c > index 02e48aa..2864ee7 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1484,7 +1484,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, > pages[i] = page; > > flush_anon_page(vma, page, start); > - flush_dcache_page(page); > + if ((vm_flags & VM_WRITE) || (vma->vm_flags & VM_SHARED) > + flush_dcache_page(page); First of all, I know absolutely nothing about D-cache aliasing. My poor understanding of flush_dcache_page() is: synchronize the kernel/user vision of this memory, in the case when either side can change it. If this is true, then this change doesn't look right in general. Even if (vma->vm_flags & VM_SHARED) == 0, it is possible that tsk can write to this memory, this mapping can be writable and private. Even if we ensure that this mapping is readonly/private, another user-space process can write to this page via shared/writable mapping. I'd like to know if my understanding is correct, I am just curious. Oleg. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Flushing whole page instead of work for ptrace 2010-12-03 15:00 ` Oleg Nesterov @ 2010-12-03 16:28 ` Minchan Kim 2010-12-03 17:07 ` Oleg Nesterov 0 siblings, 1 reply; 9+ messages in thread From: Minchan Kim @ 2010-12-03 16:28 UTC (permalink / raw) To: Oleg Nesterov Cc: Roland McGrath, michal.simek, Andrew Morton, LKML, linux-mm, John Williams, Edgar E. Iglesias, Hugh Dickins, Nick Piggin On Fri, Dec 03, 2010 at 04:00:21PM +0100, Oleg Nesterov wrote: > On 11/30, Roland McGrath wrote: > > > > Documentation/cachetlb.txt says: > > > > Any time the kernel writes to a page cache page, _OR_ > > the kernel is about to read from a page cache page and > > user space shared/writable mappings of this page potentially > > exist, this routine is called. > > > > In your case, the kernel is only reading (write=0 passed to > > access_process_vm and get_user_pages). In normal situations, > > the page in question will have only a private and read-only > > mapping in user space. So the call should not be required in > > these cases--if the code can tell that's so. > > > > Perhaps something like the following would be safe. > > But you really need some VM folks to tell you for sure. > > > > diff --git a/mm/memory.c b/mm/memory.c > > index 02e48aa..2864ee7 100644 > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -1484,7 +1484,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, > > pages[i] = page; > > > > flush_anon_page(vma, page, start); > > - flush_dcache_page(page); > > + if ((vm_flags & VM_WRITE) || (vma->vm_flags & VM_SHARED) > > + flush_dcache_page(page); > > First of all, I know absolutely nothing about D-cache aliasing. > My poor understanding of flush_dcache_page() is: synchronize the > kernel/user vision of this memory, in the case when either side > can change it. > > If this is true, then this change doesn't look right in general. > > Even if (vma->vm_flags & VM_SHARED) == 0, it is possible that > tsk can write to this memory, this mapping can be writable and > private. > > Even if we ensure that this mapping is readonly/private, another > user-space process can write to this page via shared/writable > mapping. > I think you're right. It has a portential that other processes have a such mapping. > > I'd like to know if my understanding is correct, I am just curious. > > Oleg. How about this? Maybe this patch would mitigate the overhead. But I am not sure this patch. Cced GUP experts. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Flushing whole page instead of work for ptrace 2010-12-03 16:28 ` Minchan Kim @ 2010-12-03 17:07 ` Oleg Nesterov 2010-12-04 14:57 ` Minchan Kim 0 siblings, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2010-12-03 17:07 UTC (permalink / raw) To: Minchan Kim Cc: Roland McGrath, michal.simek, Andrew Morton, LKML, linux-mm, John Williams, Edgar E. Iglesias, Hugh Dickins, Nick Piggin On 12/04, Minchan Kim wrote: > > On Fri, Dec 03, 2010 at 04:00:21PM +0100, Oleg Nesterov wrote: > > On 11/30, Roland McGrath wrote: > > > > > > Documentation/cachetlb.txt says: > > > > > > Any time the kernel writes to a page cache page, _OR_ > > > the kernel is about to read from a page cache page and > > > user space shared/writable mappings of this page potentially > > > exist, this routine is called. > > > > > > In your case, the kernel is only reading (write=0 passed to > > > access_process_vm and get_user_pages). In normal situations, > > > the page in question will have only a private and read-only > > > mapping in user space. So the call should not be required in > > > these cases--if the code can tell that's so. > > > > > > Perhaps something like the following would be safe. > > > But you really need some VM folks to tell you for sure. > > > > > > diff --git a/mm/memory.c b/mm/memory.c > > > index 02e48aa..2864ee7 100644 > > > --- a/mm/memory.c > > > +++ b/mm/memory.c > > > @@ -1484,7 +1484,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, > > > pages[i] = page; > > > > > > flush_anon_page(vma, page, start); > > > - flush_dcache_page(page); > > > + if ((vm_flags & VM_WRITE) || (vma->vm_flags & VM_SHARED) > > > + flush_dcache_page(page); > > > > First of all, I know absolutely nothing about D-cache aliasing. > > My poor understanding of flush_dcache_page() is: synchronize the > > kernel/user vision of this memory, in the case when either side > > can change it. > > > > If this is true, then this change doesn't look right in general. > > > > Even if (vma->vm_flags & VM_SHARED) == 0, it is possible that > > tsk can write to this memory, this mapping can be writable and > > private. > > > > Even if we ensure that this mapping is readonly/private, another > > user-space process can write to this page via shared/writable > > mapping. > > > > I think you're right. It has a portential that other processes have > a such mapping. > > > > > I'd like to know if my understanding is correct, I am just curious. > > > > Oleg. > > How about this? > Maybe this patch would mitigate the overhead. > But I am not sure this patch. Cced GUP experts. > > From 8fb3d84c7bb32c4ba9c4a0063198ce7cfcca6b37 Mon Sep 17 00:00:00 2001 > From: Minchan Kim <minchan.kim@gmail.com> > Date: Sat, 4 Dec 2010 01:19:43 +0900 > Subject: [PATCH] Remove redundant flush_dcache_page in GUP > > If we get the page with handle_mm_fault, it already handled > page flush. So GUP's flush_dcache_page call is redundant. Oh, I am not sure. Say, do_wp_page() can only clear !pte_write(), but let me remind I do not understand this magic. However, evem if this change is correct, I am not sure it can solve the original problem. Debugger issues a lot of short reads, I don't think follow_page() fails that often. But this is only my guess. > Cc: Hugh Dickins <hughd@google.com> > Cc: Nick Piggin <npiggin@kernel.dk> > Signed-off-by: Minchan Kim <minchan.kim@gmail.com> > > --- > mm/memory.c | 5 ++++- > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index ebfeedf..9166f4b 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1430,6 +1430,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, > do { > struct page *page; > unsigned int foll_flags = gup_flags; > + bool dcache_flushed = false; > > /* > * If we have a pending SIGKILL, don't keep faulting > @@ -1464,6 +1465,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, > tsk->maj_flt++; > else > tsk->min_flt++; > + dcache_flushed = true; > > /* > * The VM_FAULT_WRITE bit tells us that > @@ -1489,7 +1491,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, > pages[i] = page; > > flush_anon_page(vma, page, start); > - flush_dcache_page(page); > + if (!dcache_flushed) > + flush_dcache_page(page); > } > if (vmas) > vmas[i] = vma; > -- > 1.7.0.4 > > > > > -- > > To unsubscribe, send a message with 'unsubscribe linux-mm' in > > the body to majordomo@kvack.org. For more info on Linux MM, > > see: http://www.linux-mm.org/ . > > Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ > > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > > -- > Kind regards, > Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Flushing whole page instead of work for ptrace 2010-12-03 17:07 ` Oleg Nesterov @ 2010-12-04 14:57 ` Minchan Kim 0 siblings, 0 replies; 9+ messages in thread From: Minchan Kim @ 2010-12-04 14:57 UTC (permalink / raw) To: Oleg Nesterov Cc: Roland McGrath, michal.simek, Andrew Morton, LKML, linux-mm, John Williams, Edgar E. Iglesias, Hugh Dickins, Nick Piggin On Fri, Dec 03, 2010 at 06:07:12PM +0100, Oleg Nesterov wrote: > On 12/04, Minchan Kim wrote: > > > > On Fri, Dec 03, 2010 at 04:00:21PM +0100, Oleg Nesterov wrote: > > > On 11/30, Roland McGrath wrote: > > > > > > > > Documentation/cachetlb.txt says: > > > > > > > > Any time the kernel writes to a page cache page, _OR_ > > > > the kernel is about to read from a page cache page and > > > > user space shared/writable mappings of this page potentially > > > > exist, this routine is called. > > > > > > > > In your case, the kernel is only reading (write=0 passed to > > > > access_process_vm and get_user_pages). In normal situations, > > > > the page in question will have only a private and read-only > > > > mapping in user space. So the call should not be required in > > > > these cases--if the code can tell that's so. > > > > > > > > Perhaps something like the following would be safe. > > > > But you really need some VM folks to tell you for sure. > > > > > > > > diff --git a/mm/memory.c b/mm/memory.c > > > > index 02e48aa..2864ee7 100644 > > > > --- a/mm/memory.c > > > > +++ b/mm/memory.c > > > > @@ -1484,7 +1484,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, > > > > pages[i] = page; > > > > > > > > flush_anon_page(vma, page, start); > > > > - flush_dcache_page(page); > > > > + if ((vm_flags & VM_WRITE) || (vma->vm_flags & VM_SHARED) > > > > + flush_dcache_page(page); > > > > > > First of all, I know absolutely nothing about D-cache aliasing. > > > My poor understanding of flush_dcache_page() is: synchronize the > > > kernel/user vision of this memory, in the case when either side > > > can change it. > > > > > > If this is true, then this change doesn't look right in general. > > > > > > Even if (vma->vm_flags & VM_SHARED) == 0, it is possible that > > > tsk can write to this memory, this mapping can be writable and > > > private. > > > > > > Even if we ensure that this mapping is readonly/private, another > > > user-space process can write to this page via shared/writable > > > mapping. > > > > > > > I think you're right. It has a portential that other processes have > > a such mapping. > > > > > > > > I'd like to know if my understanding is correct, I am just curious. > > > > > > Oleg. > > > > How about this? > > Maybe this patch would mitigate the overhead. > > But I am not sure this patch. Cced GUP experts. > > > > From 8fb3d84c7bb32c4ba9c4a0063198ce7cfcca6b37 Mon Sep 17 00:00:00 2001 > > From: Minchan Kim <minchan.kim@gmail.com> > > Date: Sat, 4 Dec 2010 01:19:43 +0900 > > Subject: [PATCH] Remove redundant flush_dcache_page in GUP > > > > If we get the page with handle_mm_fault, it already handled > > page flush. So GUP's flush_dcache_page call is redundant. > > Oh, I am not sure. Say, do_wp_page() can only clear !pte_write(), > but let me remind I do not understand this magic. You're right. I missed that. In dirty/young bit emulation arch, it doesn't call flush_dcache_page. > > However, evem if this change is correct, I am not sure it can solve > the original problem. Debugger issues a lot of short reads, I don't > think follow_page() fails that often. Absolutely. I was in a hurry. Anyway, It's a interesting. I agree whole page flush for just short some bytes is overkill. But in concept of VM, page flush of GUP is right. Do we really need new interface for this problem? It avoids page flush and just returns the page which includes data for ptrace. Then, arch_ptrace have to flush the data(page or bytes) before some operation(ex, copy) It makes ptrace code very ugly. Hmm.. Sorry, I don't have no idea. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-12-04 14:57 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-26 12:31 Flushing whole page instead of work for ptrace Michal Simek 2010-11-30 11:07 ` Flushing whole page instead of exact address " Michal Simek 2010-11-30 23:32 ` Flushing whole page instead of work " Roland McGrath 2010-12-01 17:10 ` Michal Simek 2010-12-01 17:57 ` David Miller 2010-12-03 15:00 ` Oleg Nesterov 2010-12-03 16:28 ` Minchan Kim 2010-12-03 17:07 ` Oleg Nesterov 2010-12-04 14:57 ` Minchan Kim
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).