public inbox for linux-kernel@vger.kernel.org
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Avi Kivity <avi@exanet.com>
Cc: Pavel Machek <pavel@ucw.cz>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Deadlock during heavy write activity to userspace NFS server on local NFS mount
Date: Wed, 28 Jul 2004 20:30:50 +1000
Message-ID: <4107805A.3090609@yahoo.com.au>
In-Reply-To: <41077BB5.7050007@exanet.com>

Avi Kivity wrote:
> Nick Piggin wrote:

>>> What's stopping the NFS server from ooming the machine then? Every 
>>> time some bit of memory becomes free, the server will consume it 
>>> instantly. Eventually ext3 will not be able to write anything out 
>>> because it is out of memory.
>>>
>> The NFS server should do the writeout a page at a time.
> 
> 
> The NFS server writes not only in response to page reclaim (as a local 
> NFS client), but also in response to pressure from non-local clients. If 
> both ext3 and NFS have the same allocation limits, NFS may starve out ext3.
> 

What do you mean by "starve out ext3"? ext3 gets written to *by the NFS server*,
which is PF_MEMALLOC.

> (In my case the NFS server actually writes data asynchronously, so it 
> doesn't really know it is responding to page reclaim, but the problem 
> occurs even in a synchronous NFS server.)
> 

I can't see this being the responsibility of the kernel. The NFS server
could probably find out if it is servicing a loopback request or not.
Remote requests don't help to free memory... unless maybe you want a
filesystem on a remote nbd to be exported back to the server via NFS or
something crazy.
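One way a userspace server might tell loopback requests from remote ones is to inspect the peer address it already has from accept() or recvfrom(). The following is a minimal sketch under that assumption; `is_loopback_peer` is a hypothetical helper, not part of any existing NFS server. It is only a heuristic: a local client can also reach the server through a non-loopback address of the same host, which this check would not catch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical helper: returns true if the peer address of an incoming
 * request is a loopback address (127.0.0.0/8, ::1, or an IPv4-mapped
 * IPv6 address in 127.0.0.0/8). Heuristic only: a local client using a
 * non-loopback address of this host is not detected. */
static bool is_loopback_peer(const struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
        /* 127.0.0.0/8: the top octet is 127 in host byte order */
        return (ntohl(sin->sin_addr.s_addr) >> 24) == 127;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
        if (IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr))
            return true;
        /* ::ffff:a.b.c.d carries the IPv4 address in bytes 12..15 */
        if (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
            return sin6->sin6_addr.s6_addr[12] == 127;
    }
    return false;
}
```

A server could then apply its "write out a page at a time" policy only to requests for which this returns true, since, as noted above, remote requests do not help free local memory.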

>>
>>> An even more complex case is when ext3 depends on some other process, 
>>> say it is mounted on a loopback nbd.
>>>
>>>  dirty NFS data -> NFS server -> ext3 -> nbd -> nbd server on 
>>> localhost -> ext3/raw device
>>>
>>> You can't have both the NFS server and the nbd server PF_MEMALLOC, 
>>> since the NFS server may consume all memory, then wait for the nbd 
>>> server to reclaim.
>>>
>> The memory allocators will block when memory reaches the reserved
>> mark. Page reclaim will ask NFS to free one page, so the server
>> will write something out to the filesystem, this will cause the nbd
>> server (also PF_MEMALLOC) to write out to its backing filesystem.
> 
> 
> If NFS and nbd have the same limit, then NFS may cause nbd to stall. 
> We've already established that NFS must be PF_MEMALLOC, so nbd must be 
> PF_MEMALLOC_HARDER or something like that.

No, your NFS server has to be coded differently. You can't allow it
to use up all PF_MEMALLOC memory just because it can.
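The argument that a PF_MEMALLOC server must not drain the whole reserve, and should write out a page at a time, can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the kernel allocator: `struct pool`, `toy_alloc` and `reclaim_step` are hypothetical names, "blocking" is modeled as a false return, and each server in the chain NFS server -> nbd server is assumed to need exactly one page per step.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (not kernel code): free pages plus a reserve mark. */
struct pool {
    long free_pages;
    long reserve;   /* pages that only PF_MEMALLOC-style tasks may use */
};

/* Returns true if n pages were taken; false means the caller would have
 * to block and wait for reclaim. Ordinary tasks stop at the reserve mark;
 * memalloc tasks may dip into the reserve but not below zero. */
static bool toy_alloc(struct pool *p, long n, bool memalloc)
{
    assert(n > 0);
    long floor = memalloc ? 0 : p->reserve;
    if (p->free_pages - n < floor)
        return false;
    p->free_pages -= n;
    return true;
}

/* One reclaim step through the chain: the NFS server (memalloc) takes
 * nfs_batch pages, then the nbd server (also memalloc) needs one page to
 * write to its backing store. Returns true if the chain makes progress. */
static bool reclaim_step(struct pool *p, long nfs_batch)
{
    if (!toy_alloc(p, nfs_batch, true))
        return false;
    return toy_alloc(p, 1, true);
}
```

Starting at the reserve mark (8 free pages, reserve of 8), an ordinary allocation already has to wait. If the NFS server takes one page per step, the nbd server still gets its page; if it grabs all 8 pages at once, the nbd server cannot allocate and the chain stalls, which is the behaviour the reply above says the server must avoid.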

> 
>>> The solution I have in mind is to replace the sync allocation logic from
>>>
>>>
>>>    if (free_mem() < some_global_limit && !current->PF_MEMALLOC)
>>>        wait_for_kswapd()
>>>
>>> to
>>>
>>>    if (free_mem() < current->limit)
>>>        wait_for_kswapd()
>>>
>>> kswapd would have the lowest ->limit, other processes as their place 
>>> in the food chain dictates. 
>>
>>
>>
>> I think this is barking up the wrong tree. It really doesn't matter
>> what process is freeing memory. There isn't really anything special
>> about the way kswapd frees memory.
> 
> 
> To free memory you need (a) to allocate memory and (b) possibly to wait for 
> some freeing process to make some progress. That means all processes in 
> the freeing chain must be able to allocate at least some memory. If two 
> processes in the chain share the same blocking logic, they may deadlock 
> on each other.
> 

The PF_MEMALLOC path isn't meant to be used like that. If a *single*
PF_MEMALLOC task were to allocate all of the PF_MEMALLOC memory, that
would be a bug too.
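To make the two allocation policies being debated concrete, here is a sketch contrasting them. All names (`struct toy_task`, `TOY_PF_MEMALLOC`, `must_wait_global`, `must_wait_per_task`) and the numeric limits are hypothetical. In the real kernel PF_MEMALLOC is a bit tested in `current->flags`, which is what the quoted `current->PF_MEMALLOC` pseudocode stands for.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PF_MEMALLOC 0x1u   /* hypothetical flag bit for this sketch */

struct toy_task {
    unsigned flags;
    long limit;   /* per-task threshold, used only by the proposed scheme */
};

/* Scheme described first in the quoted mail: one global limit, and
 * PF_MEMALLOC tasks bypass it. Returns true if the task must wait. */
static bool must_wait_global(const struct toy_task *t, long free_mem,
                             long global_limit)
{
    return free_mem < global_limit && !(t->flags & TOY_PF_MEMALLOC);
}

/* Proposed scheme: each task waits below its own limit, so tasks further
 * down the freeing chain (lower limits) keep allocating longer. */
static bool must_wait_per_task(const struct toy_task *t, long free_mem)
{
    assert(t->limit >= 0);
    return free_mem < t->limit;
}
```

With hypothetical limits of 16 for an ordinary task, 8 for the NFS server, 4 for the nbd server and 0 for kswapd, a free count of 6 makes the NFS server wait while nbd and kswapd continue, which is the ordering the per-task proposal aims for. Under the global scheme all PF_MEMALLOC tasks proceed alike; the reply above argues that this is acceptable provided each such task takes only a bounded amount from the reserve.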


Thread overview: 22+ messages
2004-07-26 13:11 [PATCH] Deadlock during heavy write activity to userspace NFS server on local NFS mount Avi Kivity
2004-07-26 21:02 ` Pavel Machek
2004-07-27 20:22   ` Avi Kivity
2004-07-27 20:34     ` Pavel Machek
2004-07-27 21:02       ` Avi Kivity
2004-07-28  1:29         ` Nick Piggin
2004-07-28  2:17           ` Trond Myklebust
2004-07-28  5:13             ` Avi Kivity
2004-07-28  5:11           ` Avi Kivity
2004-07-28  5:29             ` Nick Piggin
2004-07-28  7:05               ` Avi Kivity
2004-07-28  7:16                 ` Nick Piggin
2004-07-28  7:45                   ` Avi Kivity
2004-07-28  9:05                     ` Nick Piggin
2004-07-28 10:11                       ` Avi Kivity
2004-07-28 10:30                         ` Nick Piggin [this message]
2004-07-28 11:48                           ` Avi Kivity
2004-07-29  8:29                             ` Nick Piggin
2004-07-29 12:19                               ` Marcelo Tosatti
2004-07-29 16:09                               ` Avi Kivity
2004-07-28 12:08       ` Mikulas Patocka
2004-07-28 12:18         ` Avi Kivity
