From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Avi Kivity <avi@exanet.com>
Cc: Pavel Machek <pavel@ucw.cz>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Deadlock during heavy write activity to userspace NFS server on local NFS mount
Date: Wed, 28 Jul 2004 20:30:50 +1000
Message-ID: <4107805A.3090609@yahoo.com.au>
In-Reply-To: <41077BB5.7050007@exanet.com>
Avi Kivity wrote:
> Nick Piggin wrote:
>>> What's stopping the NFS server from ooming the machine then? Every
>>> time some bit of memory becomes free, the server will consume it
>>> instantly. Eventually ext3 will not be able to write anything out
>>> because it is out of memory.
>>>
>> The NFS server should do the writeout a page at a time.
>
>
> The NFS server writes not only in response to page reclaim (as a local
> NFS client), but also in response to pressure from non-local clients. If
> both ext3 and NFS have the same allocation limits, NFS may starve out ext3.
>
What do you mean by "starve out ext3"? ext3 gets written to *by the NFS
server*, which is PF_MEMALLOC.
> (In my case the NFS server actually writes data asynchronously, so it
> doesn't really know it is responding to page reclaim, but the problem
> occurs even in a synchronous NFS server.)
>
I can't see this being the responsibility of the kernel. The NFS server
could probably find out whether it is servicing a loopback request or not.
Remote requests don't help to free memory... unless maybe you want a
filesystem on a remote nbd to be exported back to the server via NFS or
something crazy.
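
As a rough illustration of how a userspace server might tell loopback
requests from remote ones, here is a minimal sketch that classifies the
peer address of a request. The helper name peer_is_loopback() is
hypothetical and not taken from any NFS server. It assumes the server
already has the peer's sockaddr (for example from getpeername() or
recvfrom()), it does not handle IPv4-mapped IPv6 addresses, and it will
not catch a local mount made through one of the machine's non-loopback
addresses.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: report whether a peer address is a loopback
 * address (127.0.0.0/8 for IPv4, ::1 for IPv6).
 */
static bool peer_is_loopback(const struct sockaddr *sa)
{
	if (sa->sa_family == AF_INET) {
		const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;

		/* The top octet of a host-order 127.x.y.z address is 127. */
		return (ntohl(in4->sin_addr.s_addr) >> 24) == 127;
	}
	if (sa->sa_family == AF_INET6) {
		const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;

		return IN6_IS_ADDR_LOOPBACK(&in6->sin6_addr);
	}
	/* Unknown address family: treat as remote. */
	return false;
}
```

A server could use such a check to service loopback writes first, since
only those can help page reclaim on the local machine.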
>>
>>> An even more complex case is when ext3 depends on some other process,
>>> say it is mounted on a loopback nbd.
>>>
>>> dirty NFS data -> NFS server -> ext3 -> nbd -> nbd server on
>>> localhost -> ext3/raw device
>>>
>>> You can't have both the NFS server and the nbd server PF_MEMALLOC,
>>> since the NFS server may consume all memory, then wait for the nbd
>>> server to reclaim.
>>>
>> The memory allocators will block when memory reaches the reserved
>> mark. Page reclaim will ask NFS to free one page, so the server
>> will write something out to the filesystem, this will cause the nbd
>> server (also PF_MEMALLOC) to write out to its backing filesystem.
>
>
> If NFS and nbd have the same limit, then NFS may cause nbd to stall.
> We've already established that NFS must be PF_MEMALLOC, so nbd must be
> PF_MEMALLOC_HARDER or something like that.
No, your NFS server has to be coded differently. You can't allow it
to use up all PF_MEMALLOC memory just because it can.
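
One way a server could be "coded differently" is to cap the memory it
pins for in-flight writes, instead of taking whatever the allocator will
give it. The sketch below is only an assumption about what that might
look like: struct write_budget and both functions are hypothetical
names, and the code is single-threaded (a real server would need a lock
or atomics around the counter).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on memory pinned by writes that are still in flight. */
struct write_budget {
	size_t limit;	/* maximum bytes allowed in flight */
	size_t in_use;	/* bytes currently reserved; never exceeds limit */
};

/* Reserve bytes for one write; fail rather than exceed the cap. */
static bool budget_try_reserve(struct write_budget *b, size_t bytes)
{
	if (bytes > b->limit - b->in_use)
		return false;
	b->in_use += bytes;
	return true;
}

/* Return bytes to the budget once the write has completed. */
static void budget_release(struct write_budget *b, size_t bytes)
{
	b->in_use -= (bytes < b->in_use) ? bytes : b->in_use;
}
```

When budget_try_reserve() fails, the server would stop accepting new
write requests until earlier writes complete, so its memory use stays
bounded no matter how hard clients push.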
>
>>> The solution I have in mind is to replace the sync allocation logic from
>>>
>>>     if (free_mem() < some_global_limit && !(current->flags & PF_MEMALLOC))
>>>             wait_for_kswapd();
>>>
>>> to
>>>
>>>     if (free_mem() < current->limit)
>>>             wait_for_kswapd();
>>>
>>> kswapd would have the lowest ->limit, other processes as their place
>>> in the food chain dictates.
>>
>>
>>
>> I think this is barking up the wrong tree. It really doesn't matter
>> what process is freeing memory. There isn't really anything special
>> about the way kswapd frees memory.
>
>
> To free memory you need (a) to allocate memory (b) possibly wait for
> some freeing process to make some progress. That means all processes in
> the freeing chain must be able to allocate at least some memory. If two
> processes in the chain share the same blocking logic, they may deadlock
> on each other.
>
The PF_MEMALLOC path isn't to be used like that. If a *single*
PF_MEMALLOC task were to allocate all of the reserved memory, then that
would be a bug too.
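
For reference, the two throttling rules compared in this thread can be
modelled in userspace roughly as follows. This is a toy sketch, not
kernel code: struct toy_task, TOY_PF_MEMALLOC, and both predicates are
hypothetical stand-ins for the pseudocode quoted above, and the free
page counts are passed in as plain numbers.

```c
#include <stdbool.h>

#define TOY_PF_MEMALLOC 0x1u	/* stand-in for the kernel's task flag */

/* Hypothetical model of a task for the purposes of this comparison. */
struct toy_task {
	unsigned int flags;
	unsigned long limit;	/* per-task watermark from the proposal */
};

/*
 * Existing rule: below one global watermark, every task waits
 * unless it has the PF_MEMALLOC-style flag set.
 */
static bool must_wait_global(const struct toy_task *t,
			     unsigned long free_pages,
			     unsigned long global_limit)
{
	return free_pages < global_limit && !(t->flags & TOY_PF_MEMALLOC);
}

/*
 * Proposed rule: each task waits once free memory drops below its
 * own limit, so tasks lower in the chain get a lower limit.
 */
static bool must_wait_per_task(const struct toy_task *t,
			       unsigned long free_pages)
{
	return free_pages < t->limit;
}
```

With an NFS server at limit 16 and an nbd server at limit 8, the
per-task rule blocks the NFS server at 10 free pages while the nbd
server keeps running. Note that neither rule stops a task from using up
everything above its own threshold, which is the objection raised
above.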