linux-fsdevel.vger.kernel.org archive mirror
* [PATCH] mm/fs: don't keep pages when receiving a pending SIGKILL in __get_user_pages()
@ 2014-01-15  9:31 Xishi Qiu
  2014-01-15 23:15 ` David Rientjes
  0 siblings, 1 reply; 4+ messages in thread
From: Xishi Qiu @ 2014-01-15  9:31 UTC
  To: Li Zefan, robin.yb, Andrew Morton, Mel Gorman, riel
  Cc: Xishi Qiu, linux-fsdevel, Linux MM, LKML

In the direct IO path, dio_refill_pages() calls get_user_pages_fast() to
pin the pages of the user buffer. If ret is less than 0 and the IO is a
write, the function substitutes the zero page for the data. This may be
acceptable for some file systems, but for some device operations we want
a write to either complete in full or fail, not to end up half data and
half zeroes, e.g. for fs metadata such as inodes and dentries.
This happens often when a process that is doing direct IO gets killed.
Consider the following case: process A is doing IO and may be inside
__get_user_pages(). If another process sends SIGKILL to A, A takes the
following branch:
		/*
		 * If we have a pending SIGKILL, don't keep faulting
		 * pages and potentially allocating memory.
		 */
		if (unlikely(fatal_signal_pending(current)))
			return i ? i : -ERESTARTSYS;
This returns the pages pinned so far. Direct IO writes those pages, and
the subsequent pages that could not be pinned are replaced by the zero
page.
This patch changes that check: when a SIGKILL is pending, release the
pages already pinned and return an error. Direct IO will then find no
blocks_available and return the error directly, rather than writing half
IO data and half zero pages.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Bin Yang <robin.yb@huawei.com>
---
 mm/memory.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6768ce9..0568faa 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1799,8 +1799,14 @@ long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			 * If we have a pending SIGKILL, don't keep faulting
 			 * pages and potentially allocating memory.
 			 */
-			if (unlikely(fatal_signal_pending(current)))
-				return i ? i : -ERESTARTSYS;
+			if (unlikely(fatal_signal_pending(current))) {
+				int j;
+				for (j = 0; j < i; j++) {
+					put_page(pages[j]);
+					pages[j] = NULL;
+				}
+				return  -ERESTARTSYS;
+			}
 
 			cond_resched();
 			while (!(page = follow_page_mask(vma, start,
-- 
1.7.1


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] mm/fs: don't keep pages when receiving a pending SIGKILL in __get_user_pages()
  2014-01-15  9:31 [PATCH] mm/fs: don't keep pages when receiving a pending SIGKILL in __get_user_pages() Xishi Qiu
@ 2014-01-15 23:15 ` David Rientjes
  2014-01-16 12:59   ` Xishi Qiu
  0 siblings, 1 reply; 4+ messages in thread
From: David Rientjes @ 2014-01-15 23:15 UTC
  To: Xishi Qiu
  Cc: Li Zefan, robin.yb, Andrew Morton, Mel Gorman, Rik van Riel,
	linux-fsdevel, linux-mm, linux-kernel


On Wed, 15 Jan 2014, Xishi Qiu wrote:

> In the process IO direction, dio_refill_pages will call get_user_pages_fast 
> to map the page from user space. If ret is less than 0 and IO is write, the 
> function will create a zero page to fill data. This may work for some file 
> system, but in some device operate we prefer whole write or fail, not half 
> data half zero, e.g. fs metadata, like inode, identy.
> This happens often when kill a process which is doing direct IO. Consider 
> the following cases, the process A is doing IO process, may enter __get_user_pages 
> function, if other processes send process A SIG_KILL, A will enter the 
> following branches 
> 		/*
> 		 * If we have a pending SIGKILL, don't keep faulting
> 		 * pages and potentially allocating memory.
> 		 */
> 		if (unlikely(fatal_signal_pending(current)))
> 			return i ? i : -ERESTARTSYS;
> Return current pages. direct IO will write the pages, the subsequent pages 
> which can’t get will use zero page instead. 
> This patch will modify this judgment, if receive SIG_KILL, release pages and 
> return an error. Direct IO will find no blocks_available and return error 
> direct, rather than half IO data and half zero page.
> 
> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
> Signed-off-by: Bin Yang <robin.yb@huawei.com>

It's scary to change the behavior of gup when some callers may want the 
exact opposite of what you're intending here, which is sane fallback by 
mapping the zero page.  In fact, gup never does put_page() itself and 
__get_user_pages() always returns the number of pages pinned and may not 
equal what is passed.
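
The return contract described above can be illustrated with a small
user-space model. This is only a sketch, not kernel code: model_page,
model_gup() and the kill_at parameter are hypothetical names invented for
the illustration, and only the "return i ? i : -ERESTARTSYS" behaviour is
taken from the patch context.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ERESTARTSYS 512	/* same numeric value as the kernel's ERESTARTSYS */

/* Hypothetical stand-in for struct page: only a reference count. */
struct model_page {
	int refcount;
};

/*
 * Model of the pinning loop.  kill_at is the page index at which a fatal
 * signal becomes pending (negative means never).  Returns the number of
 * pages pinned so far, or -MODEL_ERESTARTSYS if none were pinned.
 * Pages already pinned are NOT released here; that is left to the caller.
 */
static long model_gup(struct model_page *pool, long nr_pages,
		      struct model_page **pages, long kill_at)
{
	long i;

	assert(nr_pages > 0);
	for (i = 0; i < nr_pages; i++) {
		if (kill_at >= 0 && i >= kill_at)
			return i ? i : -MODEL_ERESTARTSYS;
		pool[i].refcount++;	/* take a reference on the page */
		pages[i] = &pool[i];
	}
	return i;
}
```

A caller asking for 4 pages with kill_at == 2 gets 2 back and still holds
two references, which is why the caller has to compare the return value
with the count it asked for.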

So, this definitely isn't the right solution for a special-case direct IO.  
Instead, it would be better to code this directly in the caller and 
compare the return value with nr_pages in dio_refill_pages() and then do 
the put_page() itself before falling back to ZERO_PAGE().
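
A rough user-space sketch of that caller-side handling follows. It is a
model under stated assumptions, not the actual dio_refill_pages() code:
model_page, model_zero_page, model_put_page() and model_refill_fixup() are
hypothetical names, and placing a single zero page in slot 0 is an
assumption about how the fallback would look.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only a reference count. */
struct model_page {
	int refcount;
};

/* Stand-in for ZERO_PAGE(0); starts with one reference held elsewhere. */
static struct model_page model_zero_page = { 1 };

static void model_put_page(struct model_page *page)
{
	page->refcount--;
}

/*
 * ret is what the pinning call returned, nr_pages what was requested.
 * On a short or failed pin, drop the references that were taken and
 * fall back to one zero page.  Returns the number of usable entries
 * left in pages[].
 */
static long model_refill_fixup(struct model_page **pages, long ret,
			       long nr_pages)
{
	long i;

	assert(nr_pages > 0 && ret <= nr_pages);
	if (ret == nr_pages)
		return ret;		/* everything was pinned */

	for (i = 0; i < ret; i++) {	/* skipped when ret is negative */
		model_put_page(pages[i]);
		pages[i] = NULL;
	}
	model_zero_page.refcount++;	/* reference for the fallback page */
	pages[0] = &model_zero_page;
	return 1;
}
```

The point of doing this in the caller is that the pinning function keeps
its "number of pages pinned" contract for every other user.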


* Re: [PATCH] mm/fs: don't keep pages when receiving a pending SIGKILL in __get_user_pages()
  2014-01-15 23:15 ` David Rientjes
@ 2014-01-16 12:59   ` Xishi Qiu
  2014-01-27 23:59     ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Xishi Qiu @ 2014-01-16 12:59 UTC
  To: David Rientjes
  Cc: Li Zefan, robin.yb, Andrew Morton, Mel Gorman, Rik van Riel,
	linux-fsdevel, linux-mm, linux-kernel

On 2014/1/16 7:15, David Rientjes wrote:

> On Wed, 15 Jan 2014, Xishi Qiu wrote:
> 
>> In the process IO direction, dio_refill_pages will call get_user_pages_fast 
>> to map the page from user space. If ret is less than 0 and IO is write, the 
>> function will create a zero page to fill data. This may work for some file 
>> system, but in some device operate we prefer whole write or fail, not half 
>> data half zero, e.g. fs metadata, like inode, identy.
>> This happens often when kill a process which is doing direct IO. Consider 
>> the following cases, the process A is doing IO process, may enter __get_user_pages 
>> function, if other processes send process A SIG_KILL, A will enter the 
>> following branches 
>> 		/*
>> 		 * If we have a pending SIGKILL, don't keep faulting
>> 		 * pages and potentially allocating memory.
>> 		 */
>> 		if (unlikely(fatal_signal_pending(current)))
>> 			return i ? i : -ERESTARTSYS;
>> Return current pages. direct IO will write the pages, the subsequent pages 
>> which can’t get will use zero page instead. 
>> This patch will modify this judgment, if receive SIG_KILL, release pages and 
>> return an error. Direct IO will find no blocks_available and return error 
>> direct, rather than half IO data and half zero page.
>>
>> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
>> Signed-off-by: Bin Yang <robin.yb@huawei.com>
> 
> It's scary to change the behavior of gup when some callers may want the 
> exact opposite of what you're intending here, which is sane fallback by 
> mapping the zero page.  In fact, gup never does put_page() itself and 
> __get_user_pages() always returns the number of pages pinned and may not 
> equal what is passed.
> 
> So, this definitely isn't the right solution for a special-case direct IO.  
> Instead, it would be better to code this directly in the caller and 
> compare the return value with nr_pages in dio_refill_pages() and then do 
> the put_page() itself before falling back to ZERO_PAGE().

Hi Rientjes,
You are right, we should not change the behavior of gup.
I have a question: if we only get some of the pages from
get_user_pages_fast(), should we write them to disk, or add a check
before the write? I'm not familiar with fs.

dio_refill_pages()
	get_user_pages_fast()

Thanks,
Xishi Qiu

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


* Re: [PATCH] mm/fs: don't keep pages when receiving a pending SIGKILL in __get_user_pages()
  2014-01-16 12:59   ` Xishi Qiu
@ 2014-01-27 23:59     ` Jan Kara
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2014-01-27 23:59 UTC
  To: Xishi Qiu
  Cc: David Rientjes, Li Zefan, robin.yb, Andrew Morton, Mel Gorman,
	Rik van Riel, linux-fsdevel, linux-mm, linux-kernel

On Thu 16-01-14 20:59:26, Xishi Qiu wrote:
> On 2014/1/16 7:15, David Rientjes wrote:
> 
> > On Wed, 15 Jan 2014, Xishi Qiu wrote:
> > 
> >> In the process IO direction, dio_refill_pages will call get_user_pages_fast 
> >> to map the page from user space. If ret is less than 0 and IO is write, the 
> >> function will create a zero page to fill data. This may work for some file 
> >> system, but in some device operate we prefer whole write or fail, not half 
> >> data half zero, e.g. fs metadata, like inode, identy.
> >> This happens often when kill a process which is doing direct IO. Consider 
> >> the following cases, the process A is doing IO process, may enter __get_user_pages 
> >> function, if other processes send process A SIG_KILL, A will enter the 
> >> following branches 
> >> 		/*
> >> 		 * If we have a pending SIGKILL, don't keep faulting
> >> 		 * pages and potentially allocating memory.
> >> 		 */
> >> 		if (unlikely(fatal_signal_pending(current)))
> >> 			return i ? i : -ERESTARTSYS;
> >> Return current pages. direct IO will write the pages, the subsequent pages 
> >> which can’t get will use zero page instead. 
> >> This patch will modify this judgment, if receive SIG_KILL, release pages and 
> >> return an error. Direct IO will find no blocks_available and return error 
> >> direct, rather than half IO data and half zero page.
> >>
> >> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
> >> Signed-off-by: Bin Yang <robin.yb@huawei.com>
> > 
> > It's scary to change the behavior of gup when some callers may want the 
> > exact opposite of what you're intending here, which is sane fallback by 
> > mapping the zero page.  In fact, gup never does put_page() itself and 
> > __get_user_pages() always returns the number of pages pinned and may not 
> > equal what is passed.
> > 
> > So, this definitely isn't the right solution for a special-case direct IO.  
> > Instead, it would be better to code this directly in the caller and 
> > compare the return value with nr_pages in dio_refill_pages() and then do 
> > the put_page() itself before falling back to ZERO_PAGE().
> 
> Hi Rientjes,
> You are right, we should not change the behavior of gup.
> I have a question, if we only get a part of the pages from get_user_pages_fast(),
> shall we write them to the disk? or add a check before write?
> I'm not familiar with fs.
  It is OK to write as many pages as you get and then bail out from direct
IO. OTOH if you are sending a SIGKILL to an application, you probably want
to kill it as soon as possible and sending IO can take some time. So in my
opinion it is more desirable to just drop page references we've got in
dio_refill_pages() and bail out immediately.
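
  As a rough illustration of "drop the references and bail out", here is
a user-space sketch. It is not the real dio_refill_pages(): model_page and
model_refill_or_bail() are hypothetical names, the fatal_pending flag
stands in for fatal_signal_pending(current), and returning -EINTR is an
assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_EINTR 4	/* same numeric value as EINTR */

/* Hypothetical stand-in for struct page: only a reference count. */
struct model_page {
	int refcount;
};

/*
 * ret is what the pinning call returned, nr_pages what was requested.
 * Without a pending fatal signal the result is passed through unchanged.
 * With one pending, every reference taken so far is dropped and an error
 * is returned, so no IO is submitted for a partially pinned buffer.
 */
static long model_refill_or_bail(struct model_page **pages, long ret,
				 long nr_pages, bool fatal_pending)
{
	long i;

	assert(nr_pages > 0 && ret <= nr_pages);
	if (!fatal_pending)
		return ret;

	for (i = 0; i < ret; i++) {	/* skipped when ret is negative */
		pages[i]->refcount--;	/* drop the reference we hold */
		pages[i] = NULL;
	}
	return -MODEL_EINTR;
}
```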

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


