From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Avi Kivity <avi@exanet.com>
Cc: Pavel Machek <pavel@ucw.cz>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Deadlock during heavy write activity to userspace NFS server on local NFS mount
Date: Wed, 28 Jul 2004 20:30:50 +1000
Message-ID: <4107805A.3090609@yahoo.com.au>
In-Reply-To: <41077BB5.7050007@exanet.com>

Avi Kivity wrote:
> Nick Piggin wrote:

>>> What's stopping the NFS server from ooming the machine then? Every 
>>> time some bit of memory becomes free, the server will consume it 
>>> instantly. Eventually ext3 will not be able to write anything out 
>>> because it is out of memory.
>>>
>> The NFS server should do the writeout a page at a time.
> 
> 
> The NFS server writes not only in response to page reclaim (as a local 
> NFS client), but also in response to pressure from non-local clients. If 
> both ext3 and NFS have the same allocation limits, NFS may starve out ext3.
> 

What do you mean starve out ext3? ext3 gets written to *by the NFS server*
which is PF_MEMALLOC.

> (In my case the NFS server actually writes data asynchronously, so it 
> doesn't really know it is responding to page reclaim, but the problem 
> occurs even in a synchronous NFS server.)
> 

I can't see this being the responsibility of the kernel. The NFS server
could probably find out if it is servicing a loopback request or not.
Remote requests don't help to free memory... unless maybe you want a
filesystem on a remote nbd to be exported back to server via NFS or
something crazy.
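
As a rough illustration of how a userspace server might tell loopback
requests apart, here is a minimal sketch. peer_is_loopback() is a
hypothetical helper, not part of any NFS server, and it assumes the
local client connects from a loopback address. A local mount made
through one of the machine's other addresses would need a comparison
against the local interface list instead.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: returns true if the peer address of a request
 * is a loopback address (127.0.0.0/8 or ::1).
 */
static bool peer_is_loopback(const struct sockaddr *sa)
{
	assert(sa != NULL);

	if (sa->sa_family == AF_INET) {
		const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;

		/* The whole 127.0.0.0/8 block is loopback. */
		return (ntohl(in4->sin_addr.s_addr) >> 24) == 127;
	}
	if (sa->sa_family == AF_INET6) {
		const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;

		return IN6_IS_ADDR_LOOPBACK(&in6->sin6_addr);
	}
	return false;	/* unknown family: treat as remote */
}
```

A server could use a check like this to decide which requests may be
helping page reclaim and which are ordinary remote traffic.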

>>
>>> An even more complex case is when ext3 depends on some other process, 
>>> say it is mounted on a loopback nbd.
>>>
>>>  dirty NFS data -> NFS server -> ext3 -> nbd -> nbd server on 
>>> localhost -> ext3/raw device
>>>
>>> You can't have both the NFS server and the nbd server PF_MEMALLOC, 
>>> since the NFS server may consume all memory, then wait for the nbd 
>>> server to reclaim.
>>>
>> The memory allocators will block when memory reaches the reserved
>> mark. Page reclaim will ask NFS to free one page, so the server
>> will write something out to the filesystem, this will cause the nbd
>> server (also PF_MEMALLOC) to write out to its backing filesystem.
> 
> 
> If NFS and nbd have the same limit, then NFS may cause nbd to stall. 
> We've already established that NFS must be PF_MEMALLOC, so nbd must be 
> PF_MEMALLOC_HARDER or something like that.

No, your NFS server has to be coded differently. You can't allow it
to use up all PF_MEMALLOC memory just because it can.
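
One way a server could be "coded differently" is to cap the memory it
pins for in-flight writes, whatever the allocator would let it take.
The sketch below is only an illustration under that assumption.
struct write_budget and its two helpers are hypothetical names, and a
threaded server would also need a lock or atomics around the counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-server cap on bytes held by in-flight writes. */
struct write_budget {
	size_t limit;	/* maximum bytes allowed in flight */
	size_t in_use;	/* bytes currently reserved */
};

/*
 * Try to reserve 'bytes' for a new write. Returns false if the write
 * would exceed the cap; the caller should then wait for earlier
 * writes to complete rather than allocate more memory.
 */
static bool budget_try_reserve(struct write_budget *b, size_t bytes)
{
	assert(b->in_use <= b->limit);
	if (bytes > b->limit - b->in_use)
		return false;
	b->in_use += bytes;
	return true;
}

/* Return 'bytes' to the budget once a write has reached the disk. */
static void budget_release(struct write_budget *b, size_t bytes)
{
	assert(bytes <= b->in_use);
	b->in_use -= bytes;
}
```

With a cap like this the server stops taking memory at a point it
chose itself, instead of running until the reserves are gone.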

> 
>>> The solution I have in mind is to replace the sync allocation logic from
>>>
>>>    if (free_mem() < some_global_limit && !(current->flags & PF_MEMALLOC))
>>>        wait_for_kswapd()
>>>
>>> to
>>>
>>>    if (free_mem() < current->limit)
>>>        wait_for_kswapd()
>>>
>>> kswapd would have the lowest ->limit, other processes as their place 
>>> in the food chain dictates. 
>>
>>
>>
>> I think this is barking up the wrong tree. It really doesn't matter
>> what process is freeing memory. There isn't really anything special
>> about the way kswapd frees memory.
> 
> 
> To free memory you need (a) to allocate memory (b) possibly wait for 
> some freeing process to make some progress. That means all processes in 
> the freeing chain must be able to allocate at least some memory. If two 
> processes in the chain share the same blocking logic, they may deadlock 
> on each other.
> 

The PF_MEMALLOC path isn't to be used like that. If a *single*
PF_MEMALLOC task were to allocate all its memory then that would
be a bug too.
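
For reference, the two blocking rules quoted above can be modelled in
user space roughly as follows. This is a toy model, not kernel code:
struct task_model, its fields, and both helpers are hypothetical, and
page counts are plain integers.

```c
#include <stdbool.h>

/* Toy stand-in for a task: a PF_MEMALLOC-like flag and a private limit. */
struct task_model {
	bool memalloc;		/* models PF_MEMALLOC being set */
	unsigned long limit;	/* models the proposed current->limit */
};

/* Quoted current rule: block below a global mark unless PF_MEMALLOC. */
static bool must_wait_global(const struct task_model *t,
			     unsigned long free_pages,
			     unsigned long global_limit)
{
	return free_pages < global_limit && !t->memalloc;
}

/* Quoted proposal: every task blocks below its own limit. */
static bool must_wait_per_task(const struct task_model *t,
			       unsigned long free_pages)
{
	return free_pages < t->limit;
}
```

Under the first rule a task with the flag set never waits, however
little memory is left, which is why such a task has to bound its own
allocations. Under the second, a task with a higher limit blocks
while tasks below it in the chain can still allocate.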
