From: Tejun Heo
Subject: Re: [RFC v2 0/5] cgroup-aware unbound workqueues
Date: Tue, 11 Jun 2019 12:55:49 -0700
Message-ID: <20190611195549.GL3341036@devbig004.ftw2.facebook.com>
In-Reply-To: <20190605153229.nvxr6j7tdzffwkgj@ca-dmjordan1.us.oracle.com>
References: <20190605133650.28545-1-daniel.m.jordan@oracle.com>
 <20190605135319.GK374014@devbig004.ftw2.facebook.com>
 <20190605153229.nvxr6j7tdzffwkgj@ca-dmjordan1.us.oracle.com>
To: Daniel Jordan
Cc: hannes@cmpxchg.org, jiangshanlai@gmail.com, lizefan@huawei.com,
 bsd@redhat.com, dan.j.williams@intel.com, dave.hansen@intel.com,
 juri.lelli@redhat.com, mhocko@kernel.org, peterz@infradead.org,
 steven.sistare@oracle.com, tglx@linutronix.de, tom.hromatka@oracle.com,
 vdavydov.dev@gmail.com, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, shakeelb@google.com

Hello, Daniel.

On Wed, Jun 05, 2019 at 11:32:29AM -0400, Daniel Jordan wrote:
> Sure, quoting from the last ktask post:
>
> A single CPU can spend an excessive amount of time in the kernel operating
> on large amounts of data.  Often these situations arise during
> initialization- and destruction-related tasks, where the data involved
> scales with system size.  These long-running jobs can slow startup and
> shutdown of applications and the system itself while extra CPUs sit idle.
>
> To ensure that applications and the kernel continue to perform well as core
> counts and memory sizes increase, harness these idle CPUs to complete such
> jobs more quickly.
>
> ktask is a generic framework for parallelizing CPU-intensive work in the
> kernel.  The API is generic enough to add concurrency to many different
> kinds of tasks--for example, zeroing a range of pages or evicting a list of
> inodes--and aims to save its clients the trouble of splitting up the work,
> choosing the number of threads to use, maintaining an efficient concurrency
> level, starting these threads, and load balancing the work between them.

Yeah, that rings a bell.

> > For memory and io, we're generally going for remote charging, where a
> > kthread explicitly says who the specific io or allocation is for,
> > combined with selective back-charging, where the resource is charged
> > and consumed unconditionally even if that would put the usage above
> > the current limits temporarily.  From what I've been seeing recently,
> > the combination of the two gives us really good control quality without
> > being too invasive across the stack.
>
> Yes, for memory I actually use remote charging.  In patch 3 the worker's
> current->active_memcg field is changed to match that of the cgroup
> associated with the work.

I see.
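For reference, the remote-charging pattern there boils down to something
like the sketch below.  This is illustrative only, not code from the
series: the memalloc_use_memcg() / memalloc_unuse_memcg() helpers wrap
current->active_memcg, and process_one_work_remote(), do_one_work(),
struct my_work and work->memcg are made-up names.

/*
 * Illustrative sketch, not from the patch series: remote memory
 * charging.  While current->active_memcg points at a memcg, memcg-
 * accounted allocations are charged to it instead of to the worker's
 * own (root) cgroup.
 */
#include <linux/sched/mm.h>	/* memalloc_use_memcg(), memalloc_unuse_memcg() */
#include <linux/memcontrol.h>

static void process_one_work_remote(struct my_work *work)
{
	/* charge allocations below to the cgroup that queued the work */
	memalloc_use_memcg(work->memcg);

	do_one_work(work);	/* allocations here are charged to work->memcg */

	/* stop charging remotely; back to the worker's own cgroup */
	memalloc_unuse_memcg();
}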
> > CPU doesn't have a back-charging mechanism yet and depending on the use
> > case, we *might* need to put kthreads in different cgroups.  However,
> > such use cases might not be that abundant and there may be gotchas
> > which require them to be force-executed and back-charged (e.g. fs
> > compression from global reclaim).
>
> The CPU-intensiveness of these work items is one of the reasons for
> actually putting the workers through the migration path.  I don't know of
> a way to get the workers to respect the cpu controller (and even cpuset
> for that matter) without doing that.

So, I still think it'd likely be better to go the back-charging route than
actually putting kworkers in non-root cgroups.  That's gonna be way cheaper,
simpler and makes avoiding inadvertent priority inversions trivial.

Thanks.

-- 
tejun