Linux NFS development
* Is there a good reason that nfs4_state_manager should use a work_queue?
From: NeilBrown @ 2014-07-08  5:21 UTC
  To: Trond Myklebust, Chuck Lever; +Cc: NFS

Hi,
 I came across a machine recently which had multiple threads blocked in
 nfs_writedata_alloc().  They were waiting for mempool_alloc to provide an
 allocation but it never did.
 Memory was tight and all the pre-allocations were in use by pending requests.
 These requests were queued on "NFS client" which means they were waiting for
 the state manager to do something.
 But there was no state manager.  Presumably kthread_run failed when it tried
 to allocate some memory.
 I cannot see anything that would retry the attempt to start the thread, and
 even if there were, we would probably need to complete some NFS writes before
 more memory becomes available.

 In this particular case the main problem was quite separate.  Too many large
 processes and not enough swap space, and the OOM killer missed its target.
 So even if NFS had worked perfectly the machine would still have locked up.
 But it does suggest that there is a weakness here.

 As kthread_create uses GFP_KERNEL to allocate a thread, and as writes can
 block waiting for the thread to be created, there appears to be room for a
 deadlock.

 My thought is that this could be fixed by using a WQ_MEM_RECLAIM work queue.
 The WQ_MEM_RECLAIM flag ensures a rescuer thread is created in advance, so
 there is always at least one thread that can run the work and no allocation
 is needed at that point.  Before diving in and trying to implement this I
 thought it would be safest to ask, as there are two issues that I'm not
 certain of.

 1/ nfs4_run_state_manager() explicitly allows SIGKILL.  Why is this?
    Is there some situation where it might be appropriate to kill the manager
    thread?

 2/ would it be reasonable to have a single work queue for all nfs clients?
    In the worst case this could serialise reclaim across all clients so we
    wouldn't want any reclaim attempt to block indefinitely.  Is that likely
    to be a big problem do you think?
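
 For concreteness, the work queue idea above might look roughly like the
 following.  This is an untested sketch, not a patch: every example_* name
 and the queue name are made up for illustration, nothing here is taken from
 fs/nfs, and it assumes one shared queue as in question 2/.

```c
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/workqueue.h>

/* Hypothetical shared queue for all clients (see question 2/). */
static struct workqueue_struct *example_wq;

/* Hypothetical stand-in for the per-client structure. */
struct example_client {
	struct work_struct state_work;	/* embedded: queueing allocates nothing */
};

static struct example_client example_clp;

static void example_state_work(struct work_struct *work)
{
	struct example_client *clp =
		container_of(work, struct example_client, state_work);

	/* State recovery for this client would run here. */
	pr_info("state manager running for client %p\n", clp);
}

static int __init example_init(void)
{
	/*
	 * WQ_MEM_RECLAIM makes the workqueue code set up a rescuer thread
	 * now, while memory is available.  max_active == 0 selects the
	 * default concurrency limit.
	 */
	example_wq = alloc_workqueue("example_nfs4_state", WQ_MEM_RECLAIM, 0);
	if (!example_wq)
		return -ENOMEM;

	INIT_WORK(&example_clp.state_work, example_state_work);

	/* In place of kthread_run(): no thread creation on this path. */
	queue_work(example_wq, &example_clp.state_work);
	return 0;
}

static void __exit example_exit(void)
{
	/* Drains any pending work before freeing the queue. */
	destroy_workqueue(example_wq);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
```

 The only allocation is in alloc_workqueue() at load time; queue_work() just
 links the embedded work_struct into the queue, so it cannot fail for lack of
 memory the way kthread_run can.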

Thanks for any hints or suggestions,

NeilBrown


* Re: Is there a good reason that nfs4_state_manager should use a work_queue?
From: Christoph Hellwig @ 2014-07-08  6:46 UTC
  To: NeilBrown; +Cc: Trond Myklebust, Chuck Lever, NFS

On Tue, Jul 08, 2014 at 03:21:00PM +1000, NeilBrown wrote:
>  2/ would it be reasonable to have a single work queue for all nfs clients?
>     In the worst case this could serialise reclaim across all clients so we
>     wouldn't want any reclaim attempt to block indefinitely.  Is that likely
>     to be a big problem do you think?

A workqueue isn't serialized, it has a max_active parameter to control
the concurrency of execution.



* Re: Is there a good reason that nfs4_state_manager should use a work_queue?
From: NeilBrown @ 2014-07-08  6:57 UTC
  To: Christoph Hellwig; +Cc: Trond Myklebust, Chuck Lever, NFS

On Mon, 7 Jul 2014 23:46:41 -0700 Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Jul 08, 2014 at 03:21:00PM +1000, NeilBrown wrote:
> >  2/ would it be reasonable to have a single work queue for all nfs clients?
> >     In the worst case this could serialise reclaim across all clients so we
> >     wouldn't want any reclaim attempt to block indefinitely.  Is that likely
> >     to be a big problem do you think?
> 
> A workqueue isn't serialized, it has a max_active parameter to control
> the concurrency of execution.

True, but if memory is tight it may not be able to create new threads.
WQ_MEM_RECLAIM guarantees there will be at least one, but worst-case
that might be all that we have.
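
To put rough numbers on that worry, here is a toy userspace model in plain
C.  It has nothing to do with the real workqueue code: items_completed(),
BLOCKS_FOREVER and MAX_WORKERS are names made up for this example, and the
model simply assumes FIFO dispatch to whichever worker frees up first.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Sentinel duration for a work item that never finishes. */
#define BLOCKS_FOREVER (-1)
#define MAX_WORKERS 16

/*
 * Count how many work items finish within `horizon` ticks when `workers`
 * threads take items from the queue in FIFO order.  Durations are in
 * ticks; horizon is assumed to be well below INT_MAX.
 * Returns -1 if `workers` is out of range.
 */
int items_completed(const int *durations, size_t n, int workers, int horizon)
{
	int free_at[MAX_WORKERS] = { 0 };	/* tick at which each worker is idle */
	int done = 0;

	assert(durations != NULL || n == 0);
	if (workers < 1 || workers > MAX_WORKERS)
		return -1;

	for (size_t i = 0; i < n; i++) {
		int w = 0;

		/* The next item goes to whichever worker frees up first. */
		for (int j = 1; j < workers; j++)
			if (free_at[j] < free_at[w])
				w = j;

		if (free_at[w] > horizon)
			break;	/* every worker is busy past the horizon */

		if (durations[i] == BLOCKS_FOREVER) {
			free_at[w] = INT_MAX;	/* this worker never returns */
			continue;
		}

		free_at[w] += durations[i];
		if (free_at[w] <= horizon)
			done++;
	}
	return done;
}
```

With the queue { BLOCKS_FOREVER, 1, 1, 1 } and a horizon of 100 ticks, a
single worker completes nothing, while two workers complete the three short
items.  So max_active only helps if the extra threads can actually be
created.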

Thanks,
NeilBrown

