From mboxrd@z Thu Jan  1 00:00:00 1970
From: Steve French <smfltc@us.ibm.com>
Subject: filesystem behavior when low on memory and PF_MEMALLOC
Date: 27 Apr 2004 11:20:53 -0500
Sender: linux-fsdevel-owner@vger.kernel.org
Message-ID: <1083082853.13165.15.camel@stevef95.austin.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from e6.ny.us.ibm.com ([32.97.182.106]:43168 "EHLO e6.ny.us.ibm.com")
	by vger.kernel.org with ESMTP id S264235AbUD0QVv (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Tue, 27 Apr 2004 12:21:51 -0400
Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.56.224.150])
	by e6.ny.us.ibm.com (8.12.10/8.12.2) with ESMTP id i3RGLofQ727682
	for <linux-fsdevel@vger.kernel.org>; Tue, 27 Apr 2004 12:21:50 -0400
Received: from stevef95-009041091094.austin.ibm.com (d01av02.pok.ibm.com [9.56.224.216])
	by northrelay02.pok.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i3RGM18N085928
	for <linux-fsdevel@vger.kernel.org>; Tue, 27 Apr 2004 12:22:02 -0400
To: linux-fsdevel@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Does setting PF_MEMALLOC have an effect similar to passing SLAB_NOFS (or
the equivalent GFP flag) on memory allocations, and does it prevent
memory allocations in critical code paths from blocking?
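To make the distinction in the question concrete, here is a simplified
user-space model of what an allocation is allowed to do when free memory
is low. It assumes 2.6-era allocator behaviour roughly as follows: a task
with PF_MEMALLOC set skips direct reclaim and may dip into the emergency
reserves (failing if they run out), while a GFP_NOFS (SLAB_NOFS)
allocation may still block in reclaim, but that reclaim will not call
back into the filesystem to write out dirty pages. All MODEL_* names and
the struct are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the values are arbitrary, not the kernel's. */
#define MODEL_PF_MEMALLOC  0x1u   /* per-task flag, as in current->flags */
#define MODEL_GFP_WAIT     0x1u   /* allocation may sleep */
#define MODEL_GFP_FS       0x2u   /* reclaim may call into the filesystem */
#define MODEL_GFP_KERNEL   (MODEL_GFP_WAIT | MODEL_GFP_FS)
#define MODEL_GFP_NOFS     (MODEL_GFP_WAIT)   /* what SLAB_NOFS asks for */

struct model_policy {
    bool may_use_reserves;   /* may allocate below the normal watermarks */
    bool may_reclaim;        /* may block in direct reclaim */
    bool may_writepage;      /* that reclaim may call ->writepage */
};

/* What a low-memory allocation may do, in this simplified model only. */
static struct model_policy model_low_memory_policy(unsigned int task_flags,
                                                   unsigned int gfp)
{
    struct model_policy p;
    bool memalloc = (task_flags & MODEL_PF_MEMALLOC) != 0;

    /* PF_MEMALLOC: use the reserves, and fail if they are exhausted. */
    p.may_use_reserves = memalloc;
    /* A PF_MEMALLOC task never recurses into reclaim. */
    p.may_reclaim = !memalloc && (gfp & MODEL_GFP_WAIT) != 0;
    /* NOFS still reclaims, but skips writing dirty pages via the fs. */
    p.may_writepage = p.may_reclaim && (gfp & MODEL_GFP_FS) != 0;
    return p;
}
```

Under this model the two are not equivalent: SLAB_NOFS keeps reclaim
away from writepage but can still block, while PF_MEMALLOC avoids
blocking in reclaim at the cost of drawing down the reserves.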

Sergey Vlasov recently made a good suggestion for fixing hangs during
very large file copies by using PF_MEMALLOC.

He noted that shrink_caches can invoke writepage (cifs_writepage in my
case) to write out dirty pages, but writepage needs to allocate memory
both explicitly (for each 4.5K cifs write buffer) and implicitly through
the sockets API (sock_sendmsg can allocate memory), and these
allocations can presumably block.  In addition, the cifs demultiplex
thread must receive an acknowledgement from the server before it wakes
up the writepage thread, and the demultiplex thread can itself allocate
memory in some cases.
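The cycle described above can be sketched as a toy user-space model: an
allocation made under memory pressure enters reclaim, reclaim calls a
writepage stand-in, and that writepage itself needs memory. If the
reclaiming task is marked with PF_MEMALLOC, the nested allocation does
not re-enter reclaim and the recursion stops after one level; otherwise
it nests until an artificial depth cap is hit, which stands in for the
hang. All model_* names, MODEL_DEPTH_LIMIT and the flag value are
hypothetical, and the model leaves out the demultiplex thread's wait on
the server.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's definitions. */
#define MODEL_PF_MEMALLOC  0x1u
#define MODEL_DEPTH_LIMIT  8      /* cap so the "hang" case terminates */

static unsigned int model_current_flags;  /* plays the role of current->flags */
static bool model_reclaim_sets_memalloc;  /* does reclaim mark the task? */
static int model_depth;                   /* current reclaim nesting */
static int model_max_depth;               /* deepest nesting observed */

static void model_alloc(void);

/* Stand-in for cifs_writepage(): writing a dirty page needs memory
 * (a write buffer, plus whatever sock_sendmsg allocates). */
static void model_writepage(void)
{
    model_alloc();
}

/* Stand-in for shrink_caches(): frees memory by writing a dirty page,
 * optionally with PF_MEMALLOC set for the duration. */
static void model_reclaim(void)
{
    unsigned int saved = model_current_flags;

    if (model_reclaim_sets_memalloc)
        model_current_flags |= MODEL_PF_MEMALLOC;
    model_writepage();
    model_current_flags = saved;
}

/* An allocation made when no free pages are left.  A PF_MEMALLOC task
 * does not enter reclaim (it would use the reserves instead). */
static void model_alloc(void)
{
    if (model_current_flags & MODEL_PF_MEMALLOC)
        return;
    if (model_depth >= MODEL_DEPTH_LIMIT)
        return;   /* cap reached: stands in for unbounded recursion */
    model_depth++;
    if (model_depth > model_max_depth)
        model_max_depth = model_depth;
    model_reclaim();
    model_depth--;
}

/* Returns how deeply reclaim nests for one allocation under pressure. */
int model_reclaim_nesting(bool reclaim_sets_memalloc)
{
    model_current_flags = 0;
    model_reclaim_sets_memalloc = reclaim_sets_memalloc;
    model_depth = 0;
    model_max_depth = 0;
    model_alloc();
    return model_max_depth;
}
```

Here model_reclaim_nesting(true) returns 1 and model_reclaim_nesting(false)
returns MODEL_DEPTH_LIMIT. The same reasoning motivates marking the
demultiplex thread: with PF_MEMALLOC set, its own allocations do not enter
reclaim, so it cannot end up waiting on a writepage that is in turn
waiting on it.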

His suggested solution was to add the PF_MEMALLOC flag to current->flags
for the demultiplex thread, which makes sense and seems similar to what
XFS and a few other filesystems do in some of their daemons.  What is
harder to evaluate is how to fix the context of the process doing
writepage.  Is it OK to temporarily set PF_MEMALLOC on entry to a
filesystem's writepage and writepages routines?  Or would this be
redundant, since the linux/mm code should already be doing this in all
low-memory paths in the calling function?  And is it OK to clear the
flag, i.e. to always clear PF_MEMALLOC on exit from cifs_writepage (and
eventually from cifs_writepages, once that is added)?

The alternative is to set SLAB_NOFS (or the equivalent) on all memory
allocations that cifs makes on behalf of writepages.  That would
probably work, but it would touch more code and make the code paths
trickier (for example, figuring out whether an SMB buffer allocation
came from writepage).  My initial observation was that setting SLAB_NOFS
on all cifs buffer allocations carries a significant performance hit
(although I think this is basically what at least one other filesystem
does).  It seems like overkill when writepage (and possibly
prepare_write/commit_write) are the paths that matter for performance in
low-memory situations, as pages are being freed.