public inbox for linux-nfs@vger.kernel.org
From: Trond Myklebust <trondmy@hammerspace.com>
To: "dai.ngo@oracle.com" <dai.ngo@oracle.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: extremely long cl_tasks list
Date: Sat, 9 Nov 2024 00:40:11 +0000
Message-ID: <cb3c663f10633368a7026de64fd147cc06d4d86f.camel@hammerspace.com>
In-Reply-To: <c278cba3f388eafa578f82dfddb219ddbdd8c01b.camel@hammerspace.com>

On Sat, 2024-11-09 at 00:03 +0000, Trond Myklebust wrote:
> On Fri, 2024-11-08 at 15:20 -0800, Dai Ngo wrote:
> > Hi Trond,
> > 
> > Currently cl_tasks is used to maintain the list of all rpc_task's
> > for each rpc_clnt.
> > 
> > Under heavy write load, we've seen this list grow to millions
> > of entries. Even though the list is extremely long, the system
> > still runs fine until the user tries to get information about
> > all active RPC tasks by doing:
> > 
> > #  cat /sys/kernel/debug/sunrpc/rpc_clnt/N/tasks
> > 
> > When this happens, tasks_start() is called and it acquires the
> > rpc_clnt.cl_lock to walk the cl_tasks list, returning one entry
> > at a time to the caller. The cl_lock is held until all tasks on
> > this list have been processed.
> >
> > While the cl_lock is held, completed RPC tasks have to spin wait
> > in rpc_task_release_client for the cl_lock. If there are millions
> > of entries in the cl_tasks list it will take a long time before
> > tasks_stop is called and the cl_lock is released.
> > 
> > Under heavy load, the rpc_task_release_client threads
> > will use up all the available CPUs in the system, preventing other
> > jobs from running and causing the system to lock up temporarily.
> >
> > I'm looking for suggestions on how to address this issue. I think
> > one option is to convert the cl_tasks list to use an xarray to
> > eliminate the contention on the cl_lock, and I would like to get
> > the opinion of the community.
> 
> 
> No. We are definitely not going to add a gravity-challenged solution
> like xarray to solve a corner-case problem of list iteration.
> 
> Firstly, this is really only a problem for NFSv3 and NFSv4.0 because
> they don't actually throttle at the NFS layer.

Actually. Let me correct that...

NFSv4.1 does throttle at the NFS layer, but does so in the RPC prepare
callback, so perhaps it is affected here too.
However, we could reduce that problem by moving the addition of the
rpc_task to the cl_tasks list into the call_start() function. Doing so
gives less visibility into the full workings of the system, but the
active tasks will still be fully documented by the list, and if we
need to, we could supplement that information with a total count of
queued tasks.
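
To make the idea concrete, here is a rough userspace sketch, not
kernel code. Every model_* name is hypothetical and made up for
illustration, the list is a plain hand-rolled doubly linked list rather
than the kernel's list_head, and locking is left out (in the kernel the
list updates would happen under cl_lock). It only shows that a task
joins the list when it starts, while tasks that have merely been
created are tracked by a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an rpc_task; not the kernel structure. */
struct model_task {
	struct model_task *prev, *next;	/* links on the active list */
	int on_list;			/* non-zero once the task has started */
};

/* Hypothetical stand-in for an rpc_clnt; not the kernel structure. */
struct model_clnt {
	struct model_task head;		/* sentinel of the active list (models cl_tasks) */
	unsigned long nr_queued;	/* tasks created but not yet started */
};

static void model_clnt_init(struct model_clnt *c)
{
	c->head.prev = c->head.next = &c->head;
	c->head.on_list = 0;
	c->nr_queued = 0;
}

/* Task creation only bumps the counter; the list is not touched. */
static void model_task_create(struct model_clnt *c, struct model_task *t)
{
	t->prev = t->next = t;
	t->on_list = 0;
	c->nr_queued++;
}

/* Models the call_start() step: the task becomes visible on the list. */
static void model_task_start(struct model_clnt *c, struct model_task *t)
{
	assert(!t->on_list && c->nr_queued > 0);
	c->nr_queued--;
	t->prev = c->head.prev;
	t->next = &c->head;
	c->head.prev->next = t;
	c->head.prev = t;
	t->on_list = 1;
}

/* Release handles both started tasks and tasks that never started. */
static void model_task_release(struct model_clnt *c, struct model_task *t)
{
	if (t->on_list) {
		t->prev->next = t->next;
		t->next->prev = t->prev;
		t->prev = t->next = t;
		t->on_list = 0;
	} else {
		assert(c->nr_queued > 0);
		c->nr_queued--;
	}
}

/* Models the debugfs walk: it only ever sees started tasks. */
static unsigned long model_count_listed(const struct model_clnt *c)
{
	unsigned long n = 0;
	const struct model_task *t;

	for (t = c->head.next; t != &c->head; t = t->next)
		n++;
	return n;
}
```

In this model the walk length is bounded by the number of started
tasks rather than by everything that has been submitted, and nr_queued
carries the remainder as a single number.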

> 
> Secondly, having millions of entries associated with a single struct
> rpc_clnt means living in latency hell, where waking up a sleeping
> task can mean living on the rpciod queue for several hundred
> milliseconds before execution starts due to the sheer volume of tasks
> in the queue.

This is still not a major problem for NFSv4.1 since we do have
throttling happening immediately once the RPC call starts, and the task
is never awakened until it can be accommodated with a session slot.

> 
> So IMHO a better question would be: "What is a sensible throttling
> scheme for NFSv3 and NFSv4.0?"

Still a problem.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com


