From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xishi Qiu
Subject: [PATCH] mm/fs: don't keep pages when receiving a pending SIGKILL in __get_user_pages()
Date: Wed, 15 Jan 2014 17:31:20 +0800
Message-ID: <52D65568.6080106@huawei.com>
To: Li Zefan, robin.yb@huawei.com, Andrew Morton, Mel Gorman, riel@redhat.com
Cc: Xishi Qiu, linux-fsdevel@vger.kernel.org, Linux MM, LKML
Sender: linux-fsdevel-owner@vger.kernel.org

In the direct IO path, dio_refill_pages() calls get_user_pages_fast() to
pin the pages of the user-space buffer. If ret is less than 0 and the IO
is a write, the function substitutes a zero page for the data. This may
be acceptable for some file systems, but for some device operations we
want a write to either complete in full or fail, not to land as half data
and half zeros, e.g. for fs metadata such as inodes and dentries.

This happens often when a process that is doing direct IO is killed.
Consider the following case: process A is in the middle of an IO and may
be inside __get_user_pages(). If another process sends SIGKILL to A, A
takes the following branch:

	/*
	 * If we have a pending SIGKILL, don't keep faulting
	 * pages and potentially allocating memory.
	 */
	if (unlikely(fatal_signal_pending(current)))
		return i ? i : -ERESTARTSYS;

It returns the pages obtained so far. Direct IO writes those pages, and
the subsequent pages, which could not be obtained, are replaced by the
zero page.

This patch changes that check: when a SIGKILL is pending, release the
pages already taken and return an error. Direct IO then finds no
blocks_available and returns the error directly, instead of writing half
IO data and half zero pages.

Signed-off-by: Xishi Qiu
Signed-off-by: Bin Yang
---
 mm/memory.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6768ce9..0568faa 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1799,8 +1799,14 @@ long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			 * If we have a pending SIGKILL, don't keep faulting
 			 * pages and potentially allocating memory.
 			 */
-			if (unlikely(fatal_signal_pending(current)))
-				return i ? i : -ERESTARTSYS;
+			if (unlikely(fatal_signal_pending(current))) {
+				int j;
+				for (j = 0; j < i; j++) {
+					put_page(pages[j]);
+					pages[j] = NULL;
+				}
+				return -ERESTARTSYS;
+			}
 
 			cond_resched();
 			while (!(page = follow_page_mask(vma, start,
-- 
1.7.1
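
The change in return behaviour can be illustrated outside the kernel. The
following is a minimal user-space sketch, not kernel code: struct
model_page, model_put_page() and model_gup() are hypothetical names
invented for illustration, the kill_at parameter stands in for
fatal_signal_pending(current) becoming true at a given page index, and
pages is assumed to be non-NULL. ERESTARTSYS is defined locally because
it is a kernel-internal error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno value; user-space headers do not define it. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical stand-in for struct page: only a reference count. */
struct model_page {
	int refcount;
};

/* Hypothetical stand-in for put_page(): drop one reference. */
static void model_put_page(struct model_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/*
 * Model of the pinning loop. A fatal signal is treated as pending when
 * the loop reaches index kill_at (pass a negative value for "never").
 * With patched == false the old check is used: return the partial count
 * if any page was pinned. With patched == true the pages pinned so far
 * are released and only the error is returned.
 */
static long model_gup(struct model_page *backing, struct model_page **pages,
		      long nr_pages, long kill_at, bool patched)
{
	long i;

	for (i = 0; i < nr_pages; i++) {
		if (i == kill_at) {
			long j;

			if (!patched)
				return i ? i : -ERESTARTSYS;

			for (j = 0; j < i; j++) {
				model_put_page(pages[j]);
				pages[j] = NULL;
			}
			return -ERESTARTSYS;
		}
		/* Take a reference and hand the page to the caller. */
		backing[i].refcount++;
		pages[i] = &backing[i];
	}
	return i;
}
```

With kill_at set to 2 and four pages requested, the unpatched model
returns 2 and leaves two references held, which is the partial result
that direct IO would go on to write. The patched model returns
-ERESTARTSYS with every reference dropped and every pages[] slot cleared.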