From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jan Kara Subject: Re: [PATCH] mm/fs: don't keep pages when receiving a pending SIGKILL in __get_user_pages() Date: Tue, 28 Jan 2014 00:59:13 +0100 Message-ID: <20140127235913.GC7020@quack.suse.cz> References: <52D65568.6080106@huawei.com> <52D7D7AE.8070108@huawei.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: David Rientjes , Li Zefan , robin.yb@huawei.com, Andrew Morton , Mel Gorman , Rik van Riel , linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org To: Xishi Qiu Return-path: Content-Disposition: inline In-Reply-To: <52D7D7AE.8070108@huawei.com> Sender: owner-linux-mm@kvack.org List-Id: linux-fsdevel.vger.kernel.org On Thu 16-01-14 20:59:26, Xishi Qiu wrote: > On 2014/1/16 7:15, David Rientjes wrote: >=20 > > On Wed, 15 Jan 2014, Xishi Qiu wrote: > >=20 > >> In the process IO direction, dio_refill_pages will call get_user_pag= es_fast=20 > >> to map the page from user space. If ret is less than 0 and IO is wri= te, the=20 > >> function will create a zero page to fill data. This may work for som= e file=20 > >> system, but in some device operate we prefer whole write or fail, no= t half=20 > >> data half zero, e.g. fs metadata, like inode, identy. > >> This happens often when kill a process which is doing direct IO. Con= sider=20 > >> the following cases, the process A is doing IO process, may enter __= get_user_pages=20 > >> function, if other processes send process A SIG_KILL, A will enter t= he=20 > >> following branches=20 > >> /* > >> * If we have a pending SIGKILL, don't keep faulting > >> * pages and potentially allocating memory. > >> */ > >> if (unlikely(fatal_signal_pending(current))) > >> return i ? i : -ERESTARTSYS; > >> Return current pages. direct IO will write the pages, the subsequent= pages=20 > >> which can=E2=80=99t get will use zero page instead.=20 > >> This patch will modify this judgment, if receive SIG_KILL, release p= ages and=20 > >> return an error. Direct IO will find no blocks_available and return = error=20 > >> direct, rather than half IO data and half zero page. > >> > >> Signed-off-by: Xishi Qiu > >> Signed-off-by: Bin Yang > >=20 > > It's scary to change the behavior of gup when some callers may want t= he=20 > > exact opposite of what you're intending here, which is sane fallback = by=20 > > mapping the zero page. In fact, gup never does put_page() itself and= =20 > > __get_user_pages() always returns the number of pages pinned and may = not=20 > > equal what is passed. > >=20 > > So, this definitely isn't the right solution for a special-case direc= t IO. =20 > > Instead, it would be better to code this directly in the caller and=20 > > compare the return value with nr_pages in dio_refill_pages() and then= do=20 > > the put_page() itself before falling back to ZERO_PAGE(). >=20 > Hi Rientjes, > You are right, we should not change the behavior of gup. > I have a question, if we only get a part of the pages from get_user_pag= es_fast(), > shall we write them to the disk? or add a check before write? > I'm not familiar with fs. It is OK to write as many pages as you get and then bail out from direc= t IO. OTOH if you are sending a SIGKILL to an application, you probably wan= t to kill it as soon as possible and sending IO can take some time. So in m= y opinion it is more desirable to just drop page references we've got in dio_refill_pages() and bail out immediately. Honza --=20 Jan Kara SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org