All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [Xen-devel] [PATCH v6 1/6] libxl: add infrastructure to track and query 'recent' domids
From: Durrant, Paul @ 2020-02-20 16:54 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Anthony Perard, xen-devel@lists.xenproject.org, Wei Liu
In-Reply-To: <24142.47029.435605.456811@mariner.uk.xensource.com>

> -----Original Message-----
> From: Ian Jackson <ian.jackson@citrix.com>
> Sent: 20 February 2020 16:46
> To: Durrant, Paul <pdurrant@amazon.co.uk>
> Cc: xen-devel@lists.xenproject.org; Wei Liu <wl@xen.org>; Anthony Perard
> <anthony.perard@citrix.com>
> Subject: RE: [PATCH v6 1/6] libxl: add infrastructure to track and query
> 'recent' domids
> 
> Durrant, Paul writes ("RE: [PATCH v6 1/6] libxl: add infrastructure to
> track and query 'recent' domids"):
> > [Ian:]
> > > IMO the reuse timeout call and the clock_gettime call should be put in
> > > libxl__open_domid_history; and the time filtering check should be
> > > folded into libxl__read_recent.
> >
> > Ok. I was having a hard time guessing just exactly what you're looking
> for. I think that makes it a little clearer.
> ...
> > > In my review of v4 I wrote:
> > >
> > >   Do you think this can be combined somehow ?  Possibilities seem to
> > >   include: explicit read_recent_{init,entry,finish} functions; a
> single
> > >   function with a "per-entry" callback; same but with a macro.  If you
> > >   do that you can also have it have handle the "file does not exist"
> > >   special case.
> > >
> > > You've provided the read_recent_entry function but the "init" function
> > > libxl__open_domid_history does too little.  This series seems to be
> > > moving towards what I called read_recent_{init,entry,finish} (which
> > > should probably have the timestamp and FILE* in a struct together) but
> > > it seems to be doing so quite slowly.
> >
> > Now again I'm not sure *exactly* what you want me to do, but I'll have
> another guess.
> 
> Maybe it would be better for us to try to come to a shared
> understanding before you do another respin...
> 

Not being co-located makes this somewhat tricky; I think it will basically still come down to me writing some code and then you saying whether that's what you meant... unless you can write some (pseudo-)code to illustrate? I think, from what you say below, I might now have a better idea of what you want so let's have one more go-around of me writing the code first :-)

> > >  - It encourages vacuous log messages whose content is mainly in the
> > >    function and line number framing:
> > >        +    LOGE(ERROR, "failed");
> > >        +    return ERROR_FAIL;
> > >        +}
> > >    rather than
> > >        if (!*f) {
> > >            LOGE(ERROR, "failed to open recent domid file `%s'", path);
> > >            rc = ERROR_FAIL;
> > >            goto out;
> > >        }
> > >    (and the latter is to be preferred).
> >
> > But by asking me to put the error handling inside the sub-functions I
> lost context such as 'path' which was present in the previous structure. I
> could pass in an argument purely for the benefit of a log function but
> that seems excessive. I guess I will pull the error logging out again, but
> that seemed to be against your requirement to de-duplicate code.
> 
> I think the path needs to be passed into these functions.  This is why
> I think the functions need to take a struct* as an argument, for their
> shared state (including the path and the other stuff).
> 

Ok, if that's the style you prefer I'll re-structure it that way.

  Paul

> Thanks,
> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply

* Re: [Xen-devel] [PATCH] rwlock: allow recursive read locking when already locked in write mode
From: Roger Pau Monné @ 2020-02-20 16:54 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich, xen-devel
In-Reply-To: <01f6d0de-c36f-7506-f64a-48d8278df455@suse.com>

On Thu, Feb 20, 2020 at 05:00:37PM +0100, Jürgen Groß wrote:
> On 20.02.20 16:57, Roger Pau Monné wrote:
> > On Thu, Feb 20, 2020 at 04:50:22PM +0100, Jan Beulich wrote:
> > > On 20.02.2020 16:17, Roger Pau Monné wrote:
> > > > On Thu, Feb 20, 2020 at 04:02:55PM +0100, Jan Beulich wrote:
> > > > > On 20.02.2020 15:11, Roger Pau Monné wrote:
> > > > > > On Thu, Feb 20, 2020 at 01:48:54PM +0100, Jan Beulich wrote:
> > > > > > > Another option is to use the recurse_cpu field of the
> > > > > > > associated spin lock: The field is used for recursive locks
> > > > > > > only, and hence the only conflict would be with
> > > > > > > _spin_is_locked(), which we don't (and in the future then
> > > > > > > also shouldn't) use on this lock.
> > > > > > 
> > > > > > I looked into that also, but things get more complicated AFAICT, as it's
> > > > > > not possible to atomically fetch the state of the lock and the owner
> > > > > > CPU at the same time. Neither you could set the LOCKED bit and the CPU
> > > > > > at the same time.
> > > > > 
> > > > > There's no need to atomically fetch both afaics: The field is
> > > > > valid only if the LOCKED bit is set. And when reading the
> > > > > field you only care about the value being equal to
> > > > > smp_processor_id(), i.e. it is fine to set LOCKED before
> > > > > updating the CPU field on lock, and to reset the CPU field to
> > > > > SPINLOCK_NO_CPU (or whatever it's called) before clearing
> > > > > LOCKED.
> > > > 
> > > > Yes that would require the usage of a sentinel value as 0 would be a
> > > > valid processor ID.
> > > > 
> > > > I would try to refrain from (abu)sing internal spinlock fields, as I
> > > > think we can solve this just by using the cnts field on the current
> > > > rwlock implementation.
> > > > 
> > > > What issue do you have in mind that would prevent storing the CPU
> > > > write locked in the cnts field?
> > > 
> > > The reduction of the number of bits used for other purposes.
> > 
> > I think it should be fine for now, as we would support systems with up
> > to 16384 CPU IDs, and 65536 concurrent read lockers, which mean each
> > CPU could take the lock up to 4 times recursively. Note that
> > supporting 16384 CPUs is still very, very far off the radar.
> 
> Current spinlock implementation limits NR_CPUS to 4096.

Right, but here we can use up to 14 bits, so I think it's best to
assign them now than leave them as padding?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply

* [MODERATED] Re: [PATCH 0/2] more sampling fun 0
From: Andi Kleen @ 2020-02-20 16:55 UTC (permalink / raw)
  To: speck
In-Reply-To: <20200220150549.GA3437955@kroah.com>

On Thu, Feb 20, 2020 at 04:05:49PM +0100, speck for Greg KH wrote:
> On Thu, Feb 20, 2020 at 06:55:10AM -0800, speck for Andi Kleen wrote:
> > > Then we need to stop using RDRAND internally for our "give me a random
> > > number api" which has spread to more and more parts of the kernel.
> > 
> > Only if that API is called frequently enough. AFAIK it is not. 
> 
> It's called by all sorts of places in the kernel today.
> 
> > Normally it's used for rare rekeying of hash tables etc., which
> > doesn't happen very often.
> 
> "normally" :)

I looked through the users. The ones that weren't clearly slow path
were SCTP connection setup, exec, and some strange abuse in bcache
that should probably be fixed anyways.

I suppose maybe for exec, but still not sure it's worth it.

But the exec use is for seeding user space RNG, so that's the one
that might actually be worth it.

-Andi

^ permalink raw reply

* Re: [PATCHv5 00/34] Add AFBC support for Rockchip
From: Daniel Vetter @ 2020-02-20 16:54 UTC (permalink / raw)
  To: Andrzej Pietrasiewicz
  Cc: Ayan Halder, kernel, David Airlie, Liviu Dudau, Sandy Huang,
	James Wang, dri-devel, Mihail Atanassov, Sean Paul
In-Reply-To: <20191217145020.14645-1-andrzej.p@collabora.com>

On Tue, Dec 17, 2019 at 03:49:46PM +0100, Andrzej Pietrasiewicz wrote:
> This series adds AFBC support for Rockchip. It is inspired by:
> 
> https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/factory-gru-9017.B-chromeos-4.4/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> 
> This is the fifth iteration of the afbc series. Between v3 and v4 a lot of
> rework has been done, the main goal of which was to move all afbc-related
> checks to helpers, so that core does not deal with it.
> 
> A new struct drm_afbc_framebuffer is added, which stores afbc-related
> driver-specific data. Because of that, in drivers that wish to
> use this feature, the struct must be allocated directly in the driver
> code rather than inside helpers, so the first portion of the patchset
> does the necessary refactoring.
> 
> Then, there are 3 users of afbc: komeda, malidp and, finally, rockchip,
> the latter being the ultimate purpose of this work and the 3 subsequent
> portions of the patchset move komeda and malidp to generic helpers and add
> afbc support to rockchip.
> 
> The idea is to make all afbc users follow a similar pattern. In fb_create()
> they allocate struct drm_afbc_framebuffer, do their specific checks which
> can be done before object lookups, do object lookups and a special version
> of a size check, which understands struct drm_afbc_framebuffer, followed
> by any other driver-specific checks and initializing the gem object.
> The helpers for the common parts are factored out so that drivers
> can use them.
> 
> The komeda driver has been the farthest away from such a pattern, so it
> required most changes. However, due to the fact that I don't have any
> komeda hardware I did the changes to komeda in an incremental fashion with
> a series of (usually) very small, easy to understand steps. malidp was
> pretty straightforward, and rockchip's afbc checks follow the pattern.
> 
> I kindly ask for reviewing the series. I need to mention that my ultimate
> goal is merging afbc for rockchip and I don't have other hardware, so some
> help from malidp and komeda developers/maintainers would be appreciated.
> 
> @Liviu, @James, @Mihail, @Brian: a kind request to you to have a look and
> test the patchset, as I don't have appropriate hardware.
> 
> Rebased onto drm-misc-next.
> 
> v4..v5:
> - used proper way of subclassing drm_framebuffer (Daniel Vetter)
> - added documentation to exported functions (Liviu Dudau)
> - reordered new functions in drm_gem_framebuffer_helper.c to make a saner
> diff (Liviu Dudau)
> - used "2" suffix instead of "_special" for the special version of size
> checks (Liviu Dudau)
> - dropped unnecessarily added condition in drm_get_format_info() (Liviu
> Dudau)
> - dropped "block_size = 0;" trick in framebuffer_check() (Daniel Vetter)
> - relaxed sticking to 80 characters per line rule in some cases
> 
> v3..v4:
> 
> - addressed (some) comments from Daniel Stone, Ezequiel Garcia, Daniel
> Vetter and James Qian Wang - thank you for input
> - refactored helpers to ease accommodating drivers with afbc needs
> - moved afbc checks to helpers
> - converted komeda, malidp and (the newly added) rockchip to use the afbc
> helpers
> - eliminated a separate, dedicated source code file
> 
> v2..v3:
> 
> - addressed (some) comments from Daniel Stone, Liviu Dudau, Daniel Vetter
> and Brian Starkey - thank you all
> 
> In this iteration some rework has been done. The checking logic is now moved
> to framebuffer_check() so it is common to all drivers. But the common part
> is not good for komeda, so this series is not good for merging yet.
> I kindly ask for feedback whether the changes are in the right direction.
> I also kindly ask for input on how to accommodate komeda.
> 
> The CONFIG_DRM_AFBC option has been eliminated in favour of adding
> drm_afbc.c to drm_kms_helper.
> 
> v1..v2:
> 
> - addressed comments from Daniel Stone, Ayan Halder, Mihail Atanassov
> - coding style fixes** BLURB HERE ***
> 
> 
> Andrzej Pietrasiewicz (34):
>   drm/core: Add afbc helper functions
>   drm/gem-fb-helper: Allow drivers to allocate struct drm_framebuffer on
>     their own
>   drm/gem-fb-helper: Add special version of drm_gem_fb_size_check
>   drm/gem-fb-helper: Add generic afbc size checks
>   drm/komeda: Use afbc helper
>   drm/komeda: Move checking src coordinates to komeda_fb_create
>   drm/komeda: Use the already available local variable
>   drm/komeda: Retrieve drm_format_info once
>   drm/komeda: Explicitly require 1 plane for AFBC
>   drm/komeda: Move pitches comparison to komeda_fb_create
>   drm/komeda: Provide and use komeda_fb_get_pixel_addr variant not
>     requiring a fb
>   drm/komeda: Factor out object lookups for non-afbc case
>   drm/komeda: Make komeda_fb_none_size_check independent from
>     framebuffer
>   drm/komeda: Factor out object lookups for afbc case
>   drm/komeda: Free komeda_fb_afbc_size_check from framebuffer dependency
>   drm/komeda: Simplify error handling
>   drm/komeda: Move object lookup before size checks
>   drm/komeda: Move object assignments to framebuffer to after size
>     checks
>   drm/komeda: Make the size checks independent from framebuffer
>     structure
>   drm/komeda: Move helper invocation to after size checks
>   drm/komeda: Use helper for common tasks
>   drm/komeda: Use return value of drm_gem_fb_lookup
>   drm/komeda: Use special helper for non-afbc size checks
>   drm/komeda: Factor in the invocation of special helper
>   drm/komeda: Use special helper for afbc case size check
>   drm/komeda: Factor in the invocation of special helper, afbc case
>   drm/komeda: Move special helper invocation outside if-else
>   drm/komeda: Move to helper checking afbc buffer size
>   drm/arm/malidp: Make verify funcitons invocations independent
>   drm/arm/malidp: Integrate verify functions
>   drm/arm/malidp: Factor in afbc framebuffer verification
>   drm/arm/malidp: Use generic helpers for afbc checks
>   drm/rockchip: Use helper for common task
>   drm/rockchip: Add support for afbc
> 
>  .../arm/display/komeda/d71/d71_component.c    |   6 +-
>  .../arm/display/komeda/komeda_framebuffer.c   | 273 ++++++++---------
>  .../arm/display/komeda/komeda_framebuffer.h   |  21 +-
>  .../display/komeda/komeda_pipeline_state.c    |  14 +-
>  drivers/gpu/drm/arm/malidp_drv.c              | 155 ++++------
>  drivers/gpu/drm/drm_fourcc.c                  |  53 ++++
>  drivers/gpu/drm/drm_gem_framebuffer_helper.c  | 287 ++++++++++++++----
>  drivers/gpu/drm/rockchip/rockchip_drm_fb.c    | 111 ++++++-
>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c   | 147 ++++++++-
>  drivers/gpu/drm/rockchip/rockchip_drm_vop.h   |  12 +
>  drivers/gpu/drm/rockchip/rockchip_vop_reg.c   |  83 ++++-
>  include/drm/drm_fourcc.h                      |   4 +
>  include/drm/drm_framebuffer.h                 |  50 +++
>  include/drm/drm_gem_framebuffer_helper.h      |  34 +++
>  14 files changed, 907 insertions(+), 343 deletions(-)

I think this isn't achieving it's goal. Even if we take out the rockchip
enabling patch at the ent it's still like 200 lines more for something
that's supposed to unify and clean code up.

Plus it looks enormously complicated, something that I missed in my
previous quick glance. Hence proposal for all the things you're going to
add to drm core/helpers, and not a bit more :-)

int
drm_gem_fb_init_with_funcs(struct drm_framebuffer *fb,
			   struct drm_device *dev, struct drm_file *file,
			   const struct drm_mode_fb_cmd2 *mode_cmd,
			   const struct drm_framebuffer_funcs *funcs);

This is going to do _exactly_ what drm_gem_fb_create_with_funcs already
does, except it doesn't do the kzalloc (so that would need to be moved out
so we can share code). No other additional sub-parts exposed, I think
that's just not worth it in this case. So none of this size check stuff.

2nd piece, your drm_afbc_framebuffer as in patch 4, with the subclassing.

3rd piece, again in drm_gem_framebuffer_helper.c:

int drm_gem_afbc_init(struct drm_afbc_framebuffer *afbc_fb);

Drivers are supposed to call this after they've a) allocated their fb
structure, containing the drm_afbc_framebuffer somewhere and b) called
drm_gem_fb_init_with_funcs(). This function is going to fill out all the
additional fields, and this function is also going to do all the size
validation and everything else.

Nothing else, so no finer split up of helper check functions, or of afbc
computation functions, or of anything else. That mix of split-out stuff
and mix of computed values in drm_afbc_framebuffer but also functions that
compute afbc values from modifiers and fb sizes seems to just lead to a
huge confusion and not actually to a code reduction. So
- none of the functions exported in patch 1, just stuff them into
  drm_gem_framebuffer_helper.c.
- none of the helper subfunctions you export in patch 2, or adapt in patch
  3
- Also not this size check structure with the data pointer you add in
  patch 4.

The bikeshed I'm still seeing here is whether drm_afbc_framebuffer and the
drm_gem_afbc_init() function should be considered core stuff or not, I
guess you can make an argument for either.

This should also make conversion easier since as a first steps you'd do:

- Put the new drm_afbc_framebuffer in place and adjust all tha
  fb_to_komeda functions and upcasting (this should be mechanically)

- Add the call to drm_gem_afbc_init().

- Starting using the new values in drm_afbc_framebuffer and slowly delete
  code

- Optional, but would be nice to do: Convert driver over to
  drm_gem_fb_init_with_funcs().

Thoughts?

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply

* [Intel-gfx] [PATCH i-g-t] i915/gem_eio: Trim kms workload
From: Chris Wilson @ 2020-02-20 16:53 UTC (permalink / raw)
  To: intel-gfx; +Cc: igt-dev

We don't need to try reset-stress on every engine with the display, just
once is enough to stress any interlinkage.

This should reduce the runtime to 10s.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Martin Peres <martin.peres@linux.intel.com>
---
 tests/i915/gem_eio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/i915/gem_eio.c b/tests/i915/gem_eio.c
index 0fe51efeb..1ec609410 100644
--- a/tests/i915/gem_eio.c
+++ b/tests/i915/gem_eio.c
@@ -898,8 +898,14 @@ static void test_kms(int i915, igt_display_t *dpy)
 
 	test_inflight(i915, 0);
 	if (gem_has_contexts(i915)) {
-		test_reset_stress(i915, 0);
-		test_reset_stress(i915, TEST_WEDGE);
+		uint32_t ctx = context_create_safe(i915);
+
+		reset_stress(i915, ctx,
+			     "default", I915_EXEC_DEFAULT, 0);
+		reset_stress(i915, ctx,
+			     "default", I915_EXEC_DEFAULT, TEST_WEDGE);
+
+		gem_context_destroy(i915, ctx);
 	}
 
 	*shared = 1;
-- 
2.25.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related

* Re:
From: Wayne Li @ 2020-02-20 16:52 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers
In-Reply-To: <CAFEAcA_A5J-j2cZN98j_9G49PAMauHGF47QBoeMK8y_RbMp5FA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3111 bytes --]

Thanks for the KVM_RUN explanation, that makes a lot of sense.  Though I'm
pretty sure I'm getting the live program counter value because I use the
"info registers" command in QEMU monitor and that calls
kvm_cpu_synchronize_state().  And also to be more sure, I added a few
assembly instructions into the kernel code that stored a value that
constantly incremented and the program counter into two general purpose
registers.  I was able to see both of these in QEMU monitor so the program
counter must actually be stuck at 0xfffffffc.

On Thu, Feb 20, 2020 at 3:57 AM Peter Maydell <peter.maydell@linaro.org>
wrote:

> On Thu, 20 Feb 2020 at 05:41, Wayne Li <waynli329@gmail.com> wrote:
> > Also, I do have another side question.  When running with KVM enabled, I
> >see the kernel-level ioctl call KVM_RUN running and then returning over
> >and over again (by the way before the VM kinda grinds to a halt I only see
> >QEMU make the KVM_RUN call twice, but the kernel-level ioctl function
> >is being called over and over again for some reason).  And each time the
> >KVM_RUN call returns, the return-from-interrupt takes the VM to the
> >address 0xFFFFFFFC.  What is the KVM_RUN ioctl call used for?  Why
> >is it being called over and over again?  Maybe if I understood this better
> >I'd be able to figure out what's stopping my program counter from
> incrementing.
>
> KVM_RUN is the ioctl which QEMU uses to tell the kernel "hey, please
> start running the guest". The kernel will not return control to QEMU
> (ie the syscall will not return) until something happens that needs
> QEMU's intervention, of which the main one is "guest made a device
> access that QEMU must handle". (You can see this run loop in
> accel/kvm/kvm-all.c:kvm_cpu_exec()). So in normal operation,
> QEMU will do a bunch of setup, and then call KVM_RUN, which
> will cause the guest to run, probably for quite a long time. The
> ioctl will return when the guest does some IO a QEMU-provided
> device, and QEMU will then do that device emulation, set up the
> kvm_run struct fields to hold the results of the device access,
> and call KVM_RUN again. The cycle continues like this until the
> VM is shut down. If the guest goes into an infinite loop, you
> should expect that KVM_RUN will never return, as there's never
> any need for the kernel to ask QEMU to do anything.
>
> Also important to note: before we call KVM_RUN we push
> the CPU register state up to the kernel (using various arch-specific
> ioctls), and after that the 'live copy' of the data is in the kernel,
> and values in the CPU state struct in QEMU are stale. We only
> get the up to date data out of the kernel when we need it, by
> calling kvm_cpu_synchronize_state(). So if your 0xfffffffc value
> comes from the CPU state struct and you're reading it at a
> point before the state has been synchronized back from the
> kernel it's not going to be the actual PC value.
>
> The KVM kernel API is comparatively well documented at
> https://www.kernel.org/doc/html/latest/virt/kvm/api.html
> though the section on KVM_RUN is pretty weak.
>
> thanks
> -- PMM
>

[-- Attachment #2: Type: text/html, Size: 3811 bytes --]

^ permalink raw reply

* [PATCH 2/2] dnf, libdnf: Ignore if PACKAGE_CLASSES does not have rpm
From: Khem Raj @ 2020-02-20 16:53 UTC (permalink / raw)
  To: openembedded-core
In-Reply-To: <20200220165318.3677793-1-raj.khem@gmail.com>

dnf does not work with opkg or dpkg/apt anyway

Signed-off-by: Khem Raj <raj.khem@gmail.com>
---
 meta/recipes-devtools/dnf/dnf_4.2.2.bb        | 2 ++
 meta/recipes-devtools/libdnf/libdnf_0.28.1.bb | 1 +
 2 files changed, 3 insertions(+)

diff --git a/meta/recipes-devtools/dnf/dnf_4.2.2.bb b/meta/recipes-devtools/dnf/dnf_4.2.2.bb
index f38167f1ad..220f1aabbd 100644
--- a/meta/recipes-devtools/dnf/dnf_4.2.2.bb
+++ b/meta/recipes-devtools/dnf/dnf_4.2.2.bb
@@ -84,3 +84,5 @@ SYSTEMD_SERVICE_${PN} = "dnf-makecache.service dnf-makecache.timer \
                          dnf-automatic-notifyonly.service dnf-automatic-notifyonly.timer \
 "
 SYSTEMD_AUTO_ENABLE ?= "disable"
+
+PNBLACKLIST[dnf] ?= "${@bb.utils.contains('PACKAGE_CLASSES', 'package_rpm', '', 'does not build correctly without package_rpm in PACKAGE_CLASSES', d)}"
diff --git a/meta/recipes-devtools/libdnf/libdnf_0.28.1.bb b/meta/recipes-devtools/libdnf/libdnf_0.28.1.bb
index 882c435b32..49afa04812 100644
--- a/meta/recipes-devtools/libdnf/libdnf_0.28.1.bb
+++ b/meta/recipes-devtools/libdnf/libdnf_0.28.1.bb
@@ -26,4 +26,5 @@ EXTRA_OECMAKE_append_class-native = " -DWITH_GIR=OFF"
 EXTRA_OECMAKE_append_class-nativesdk = " -DWITH_GIR=OFF"
 
 BBCLASSEXTEND = "native nativesdk"
+PNBLACKLIST[libdnf] ?= "${@bb.utils.contains('PACKAGE_CLASSES', 'package_rpm', '', 'does not build correctly without package_rpm in PACKAGE_CLASSES', d)}"
 
-- 
2.25.1



^ permalink raw reply related

* [PATCH 1/2] libsolv: Enable rpm packageconfig by default only if rpm O_P_M is enabled
From: Khem Raj @ 2020-02-20 16:53 UTC (permalink / raw)
  To: openembedded-core

opkg also depends on libsolv and can needlessly pull in rpm even if
the O_P_M does not desire to use rpm

Signed-off-by: Khem Raj <raj.khem@gmail.com>
---
 meta/recipes-extended/libsolv/libsolv_0.7.10.bb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meta/recipes-extended/libsolv/libsolv_0.7.10.bb b/meta/recipes-extended/libsolv/libsolv_0.7.10.bb
index 502f4e0e85..265a27c00d 100644
--- a/meta/recipes-extended/libsolv/libsolv_0.7.10.bb
+++ b/meta/recipes-extended/libsolv/libsolv_0.7.10.bb
@@ -18,7 +18,7 @@ S = "${WORKDIR}/git"
 
 inherit cmake
 
-PACKAGECONFIG ??= "rpm"
+PACKAGECONFIG ??= "${@bb.utils.contains('PACKAGE_CLASSES','package_rpm','rpm','',d)}"
 PACKAGECONFIG[rpm] = "-DENABLE_RPMMD=ON -DENABLE_RPMDB=ON,,rpm"
 
 EXTRA_OECMAKE = "-DMULTI_SEMANTICS=ON -DENABLE_COMPLEX_DEPS=ON"
-- 
2.25.1



^ permalink raw reply related

* [igt-dev] [PATCH i-g-t] i915/gem_eio: Trim kms workload
From: Chris Wilson @ 2020-02-20 16:53 UTC (permalink / raw)
  To: intel-gfx; +Cc: igt-dev

We don't need to try reset-stress on every engine with the display, just
once is enough to stress any interlinkage.

This should reduce the runtime to 10s.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Martin Peres <martin.peres@linux.intel.com>
---
 tests/i915/gem_eio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/i915/gem_eio.c b/tests/i915/gem_eio.c
index 0fe51efeb..1ec609410 100644
--- a/tests/i915/gem_eio.c
+++ b/tests/i915/gem_eio.c
@@ -898,8 +898,14 @@ static void test_kms(int i915, igt_display_t *dpy)
 
 	test_inflight(i915, 0);
 	if (gem_has_contexts(i915)) {
-		test_reset_stress(i915, 0);
-		test_reset_stress(i915, TEST_WEDGE);
+		uint32_t ctx = context_create_safe(i915);
+
+		reset_stress(i915, ctx,
+			     "default", I915_EXEC_DEFAULT, 0);
+		reset_stress(i915, ctx,
+			     "default", I915_EXEC_DEFAULT, TEST_WEDGE);
+
+		gem_context_destroy(i915, ctx);
 	}
 
 	*shared = 1;
-- 
2.25.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related

* Re: crash on connect
From: Glauber Costa @ 2020-02-20 16:52 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Avi Kivity
In-Reply-To: <ec76784f-d9fa-d5e3-fcf1-87c2754e419b@kernel.dk>

On Thu, Feb 20, 2020 at 11:39 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 2/20/20 9:34 AM, Glauber Costa wrote:
> > On Thu, Feb 20, 2020 at 11:29 AM Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> On 2/20/20 9:17 AM, Jens Axboe wrote:
> >>> On 2/20/20 7:19 AM, Glauber Costa wrote:
> >>>> Hi there, me again
> >>>>
> >>>> Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347
> >>>>
> >>>> This test is easier to explain: it essentially issues a connect and a
> >>>> shutdown right away.
> >>>>
> >>>> It currently fails due to no fault of io_uring. But every now and then
> >>>> it crashes (you may have to run more than once to get it to crash)
> >>>>
> >>>> Instructions are similar to my last test.
> >>>> Except the test to build is now "tests/unit/connect_test"
> >>>> Code is at git@github.com:glommer/seastar.git  branch io-uring-connect-crash
> >>>>
> >>>> Run it with ./build/release/tests/unit/connect_test -- -c1
> >>>> --reactor-backend=uring
> >>>>
> >>>> Backtrace attached
> >>>
> >>> Perfect thanks, I'll take a look!
> >>
> >> Haven't managed to crash it yet, but every run complains:
> >>
> >> got to shutdown of 10 with refcnt: 2
> >> Refs being all dropped, calling forget for 10
> >> terminate called after throwing an instance of 'fmt::v6::format_error'
> >>   what():  argument index out of range
> >> unknown location(0): fatal error: in "unixdomain_server": signal: SIGABRT (application abort requested)
> >>
> >> Not sure if that's causing it not to fail here.
> >
> > Ok, that means it "passed". (I was in the process of figuring out
> > where I got this wrong when I started seeing the crashes)
>
> Can you do, in your kernel dir:
>
> $ gdb vmlinux
> [...]
> (gdb) l *__io_queue_sqe+0x4a
>
> and see what it says?

0xffffffff81375ada is in __io_queue_sqe (fs/io_uring.c:4814).
4809 struct io_kiocb *linked_timeout;
4810 struct io_kiocb *nxt = NULL;
4811 int ret;
4812
4813 again:
4814 linked_timeout = io_prep_linked_timeout(req);
4815
4816 ret = io_issue_sqe(req, sqe, &nxt, true);
4817
4818 /*

(I am not using timeouts, just async_cancel)
>
> --
> Jens Axboe
>

^ permalink raw reply

* [PATCH v3 3/3] loop: Charge i/o to mem and blk cg
From: Dan Schatzberg @ 2020-02-20 16:51 UTC (permalink / raw)
  Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
	Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi,
	Thomas Gleixner, open list:BLOCK LAYER, open list,
	open list:CONTROL GROUP (CGROUP),
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
In-Reply-To: <cover.1582216294.git.schatzberg.dan@gmail.com>

The current code only associates with the existing blkcg when aio is
used to access the backing file. This patch covers all types of i/o to
the backing file and also associates the memcg so if the backing file is
on tmpfs, memory is charged appropriately.

This patch also exports cgroup_get_e_css so it can be used by the loop
module.

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
---
 drivers/block/loop.c       | 59 ++++++++++++++++++++++++--------------
 drivers/block/loop.h       |  3 +-
 include/linux/memcontrol.h |  6 ++++
 kernel/cgroup/cgroup.c     |  1 +
 4 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 78e5005c6742..cc091a66b0d0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -77,6 +77,7 @@
 #include <linux/uio.h>
 #include <linux/ioprio.h>
 #include <linux/blk-cgroup.h>
+#include <linux/sched/mm.h>
 
 #include "loop.h"
 
@@ -134,7 +135,7 @@ static struct loop_func_table xor_funcs = {
 	.number = LO_CRYPT_XOR,
 	.transfer = transfer_xor,
 	.init = xor_init
-}; 
+};
 
 /* xfer_funcs[0] is special - its release function is never called */
 static struct loop_func_table *xfer_funcs[MAX_LO_CRYPT] = {
@@ -504,8 +505,6 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
 {
 	struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
 
-	if (cmd->css)
-		css_put(cmd->css);
 	cmd->ret = ret;
 	lo_rw_aio_do_completion(cmd);
 }
@@ -566,8 +565,6 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	cmd->iocb.ki_complete = lo_rw_aio_complete;
 	cmd->iocb.ki_flags = IOCB_DIRECT;
 	cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
-	if (cmd->css)
-		kthread_associate_blkcg(cmd->css);
 
 	if (rw == WRITE)
 		ret = call_write_iter(file, &cmd->iocb, &iter);
@@ -575,7 +572,6 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 		ret = call_read_iter(file, &cmd->iocb, &iter);
 
 	lo_rw_aio_do_completion(cmd);
-	kthread_associate_blkcg(NULL);
 
 	if (ret != -EIOCBQUEUED)
 		cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
@@ -913,7 +909,7 @@ struct loop_worker {
 	struct list_head cmd_list;
 	struct list_head idle_list;
 	struct loop_device *lo;
-	struct cgroup_subsys_state *css;
+	struct cgroup_subsys_state *blkcg_css;
 	unsigned long last_ran_at;
 };
 
@@ -930,7 +926,7 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
 
 	spin_lock_irq(&lo->lo_lock);
 
-	if (!cmd->css)
+	if (!cmd->blkcg_css)
 		goto queue_work;
 
 	node = &(lo->worker_tree.rb_node);
@@ -938,10 +934,10 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
 	while (*node) {
 		parent = *node;
 		cur_worker = container_of(*node, struct loop_worker, rb_node);
-		if ((long)cur_worker->css == (long)cmd->css) {
+		if ((long)cur_worker->blkcg_css == (long)cmd->blkcg_css) {
 			worker = cur_worker;
 			break;
-		} else if ((long)cur_worker->css < (long)cmd->css) {
+		} else if ((long)cur_worker->blkcg_css < (long)cmd->blkcg_css) {
 			node = &((*node)->rb_left);
 		} else {
 			node = &((*node)->rb_right);
@@ -954,13 +950,16 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
 			GFP_NOWAIT | __GFP_NOWARN);
 	/*
 	 * In the event we cannot allocate a worker, just queue on the
-	 * rootcg worker
+	 * rootcg worker and issue the I/O as the rootcg
 	 */
-	if (!worker)
+	if (!worker) {
+		cmd->blkcg_css = NULL;
+		cmd->memcg_css = NULL;
 		goto queue_work;
+	}
 
-	worker->css = cmd->css;
-	css_get(worker->css);
+	worker->blkcg_css = cmd->blkcg_css;
+	css_get(worker->blkcg_css);
 	INIT_WORK(&worker->work, loop_workfn);
 	INIT_LIST_HEAD(&worker->cmd_list);
 	INIT_LIST_HEAD(&worker->idle_list);
@@ -2007,13 +2006,18 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	}
 
 	/* always use the first bio's css */
+	cmd->blkcg_css = NULL;
+	cmd->memcg_css = NULL;
 #ifdef CONFIG_BLK_CGROUP
-	if (cmd->use_aio && rq->bio && rq->bio->bi_blkg) {
-		cmd->css = &bio_blkcg(rq->bio)->css;
-		css_get(cmd->css);
-	} else
+	if (rq->bio && rq->bio->bi_blkg) {
+		cmd->blkcg_css = &bio_blkcg(rq->bio)->css;
+#ifdef CONFIG_MEMCG
+		cmd->memcg_css =
+			cgroup_get_e_css(cmd->blkcg_css->cgroup,
+					&memory_cgrp_subsys);
+#endif
+	}
 #endif
-		cmd->css = NULL;
 	loop_queue_work(lo, cmd);
 
 	return BLK_STS_OK;
@@ -2031,8 +2035,21 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
 		goto failed;
 	}
 
+	if (cmd->blkcg_css)
+		kthread_associate_blkcg(cmd->blkcg_css);
+	if (cmd->memcg_css)
+		memalloc_use_memcg(mem_cgroup_from_css(cmd->memcg_css));
+
 	ret = do_req_filebacked(lo, rq);
- failed:
+
+	if (cmd->blkcg_css)
+		kthread_associate_blkcg(NULL);
+
+	if (cmd->memcg_css) {
+		memalloc_unuse_memcg();
+		css_put(cmd->memcg_css);
+	}
+failed:
 	/* complete non-aio request */
 	if (!cmd->use_aio || ret) {
 		cmd->ret = ret ? -EIO : 0;
@@ -2106,7 +2123,7 @@ static void loop_free_idle_workers(struct timer_list *timer)
 			break;
 		list_del(&worker->idle_list);
 		rb_erase(&worker->rb_node, &lo->worker_tree);
-		css_put(worker->css);
+		css_put(worker->blkcg_css);
 		kfree(worker);
 	}
 	if (!list_empty(&lo->idle_worker_list))
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 87fd0e372227..3e65acf7a0e9 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -74,7 +74,8 @@ struct loop_cmd {
 	long ret;
 	struct kiocb iocb;
 	struct bio_vec *bvec;
-	struct cgroup_subsys_state *css;
+	struct cgroup_subsys_state *blkcg_css;
+	struct cgroup_subsys_state *memcg_css;
 };
 
 /* Support for loadable transfer modules */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a7a0a1a5c8d5..aeb51f2ded46 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -922,6 +922,12 @@ static inline struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
 	return NULL;
 }
 
+static inline
+struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 }
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 75f687301bbf..c3896c2e0942 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -587,6 +587,7 @@ struct cgroup_subsys_state *cgroup_get_e_css(struct cgroup *cgrp,
 	rcu_read_unlock();
 	return css;
 }
+EXPORT_SYMBOL_GPL(cgroup_get_e_css);
 
 static void cgroup_get_live(struct cgroup *cgrp)
 {
-- 
2.17.1


^ permalink raw reply related

* [PATCH v3 3/3] loop: Charge i/o to mem and blk cg
From: Dan Schatzberg @ 2020-02-20 16:51 UTC (permalink / raw)
  Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
	Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi,
	Thomas Gleixner, open list:BLOCK LAYER, open list,
	open list:CONTROL GROUP (CGROUP),
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
In-Reply-To: <cover.1582216294.git.schatzberg.dan@gmail.com>

The current code only associates with the existing blkcg when aio is
used to access the backing file. This patch covers all types of i/o to
the backing file and also associates the memcg so if the backing file is
on tmpfs, memory is charged appropriately.

This patch also exports cgroup_get_e_css so it can be used by the loop
module.

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
---
 drivers/block/loop.c       | 59 ++++++++++++++++++++++++--------------
 drivers/block/loop.h       |  3 +-
 include/linux/memcontrol.h |  6 ++++
 kernel/cgroup/cgroup.c     |  1 +
 4 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 78e5005c6742..cc091a66b0d0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -77,6 +77,7 @@
 #include <linux/uio.h>
 #include <linux/ioprio.h>
 #include <linux/blk-cgroup.h>
+#include <linux/sched/mm.h>
 
 #include "loop.h"
 
@@ -134,7 +135,7 @@ static struct loop_func_table xor_funcs = {
 	.number = LO_CRYPT_XOR,
 	.transfer = transfer_xor,
 	.init = xor_init
-}; 
+};
 
 /* xfer_funcs[0] is special - its release function is never called */
 static struct loop_func_table *xfer_funcs[MAX_LO_CRYPT] = {
@@ -504,8 +505,6 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
 {
 	struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
 
-	if (cmd->css)
-		css_put(cmd->css);
 	cmd->ret = ret;
 	lo_rw_aio_do_completion(cmd);
 }
@@ -566,8 +565,6 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	cmd->iocb.ki_complete = lo_rw_aio_complete;
 	cmd->iocb.ki_flags = IOCB_DIRECT;
 	cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
-	if (cmd->css)
-		kthread_associate_blkcg(cmd->css);
 
 	if (rw == WRITE)
 		ret = call_write_iter(file, &cmd->iocb, &iter);
@@ -575,7 +572,6 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 		ret = call_read_iter(file, &cmd->iocb, &iter);
 
 	lo_rw_aio_do_completion(cmd);
-	kthread_associate_blkcg(NULL);
 
 	if (ret != -EIOCBQUEUED)
 		cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
@@ -913,7 +909,7 @@ struct loop_worker {
 	struct list_head cmd_list;
 	struct list_head idle_list;
 	struct loop_device *lo;
-	struct cgroup_subsys_state *css;
+	struct cgroup_subsys_state *blkcg_css;
 	unsigned long last_ran_at;
 };
 
@@ -930,7 +926,7 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
 
 	spin_lock_irq(&lo->lo_lock);
 
-	if (!cmd->css)
+	if (!cmd->blkcg_css)
 		goto queue_work;
 
 	node = &(lo->worker_tree.rb_node);
@@ -938,10 +934,10 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
 	while (*node) {
 		parent = *node;
 		cur_worker = container_of(*node, struct loop_worker, rb_node);
-		if ((long)cur_worker->css == (long)cmd->css) {
+		if ((long)cur_worker->blkcg_css == (long)cmd->blkcg_css) {
 			worker = cur_worker;
 			break;
-		} else if ((long)cur_worker->css < (long)cmd->css) {
+		} else if ((long)cur_worker->blkcg_css < (long)cmd->blkcg_css) {
 			node = &((*node)->rb_left);
 		} else {
 			node = &((*node)->rb_right);
@@ -954,13 +950,16 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
 			GFP_NOWAIT | __GFP_NOWARN);
 	/*
 	 * In the event we cannot allocate a worker, just queue on the
-	 * rootcg worker
+	 * rootcg worker and issue the I/O as the rootcg
 	 */
-	if (!worker)
+	if (!worker) {
+		cmd->blkcg_css = NULL;
+		cmd->memcg_css = NULL;
 		goto queue_work;
+	}
 
-	worker->css = cmd->css;
-	css_get(worker->css);
+	worker->blkcg_css = cmd->blkcg_css;
+	css_get(worker->blkcg_css);
 	INIT_WORK(&worker->work, loop_workfn);
 	INIT_LIST_HEAD(&worker->cmd_list);
 	INIT_LIST_HEAD(&worker->idle_list);
@@ -2007,13 +2006,18 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	}
 
 	/* always use the first bio's css */
+	cmd->blkcg_css = NULL;
+	cmd->memcg_css = NULL;
 #ifdef CONFIG_BLK_CGROUP
-	if (cmd->use_aio && rq->bio && rq->bio->bi_blkg) {
-		cmd->css = &bio_blkcg(rq->bio)->css;
-		css_get(cmd->css);
-	} else
+	if (rq->bio && rq->bio->bi_blkg) {
+		cmd->blkcg_css = &bio_blkcg(rq->bio)->css;
+#ifdef CONFIG_MEMCG
+		cmd->memcg_css =
+			cgroup_get_e_css(cmd->blkcg_css->cgroup,
+					&memory_cgrp_subsys);
+#endif
+	}
 #endif
-		cmd->css = NULL;
 	loop_queue_work(lo, cmd);
 
 	return BLK_STS_OK;
@@ -2031,8 +2035,21 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
 		goto failed;
 	}
 
+	if (cmd->blkcg_css)
+		kthread_associate_blkcg(cmd->blkcg_css);
+	if (cmd->memcg_css)
+		memalloc_use_memcg(mem_cgroup_from_css(cmd->memcg_css));
+
 	ret = do_req_filebacked(lo, rq);
- failed:
+
+	if (cmd->blkcg_css)
+		kthread_associate_blkcg(NULL);
+
+	if (cmd->memcg_css) {
+		memalloc_unuse_memcg();
+		css_put(cmd->memcg_css);
+	}
+failed:
 	/* complete non-aio request */
 	if (!cmd->use_aio || ret) {
 		cmd->ret = ret ? -EIO : 0;
@@ -2106,7 +2123,7 @@ static void loop_free_idle_workers(struct timer_list *timer)
 			break;
 		list_del(&worker->idle_list);
 		rb_erase(&worker->rb_node, &lo->worker_tree);
-		css_put(worker->css);
+		css_put(worker->blkcg_css);
 		kfree(worker);
 	}
 	if (!list_empty(&lo->idle_worker_list))
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 87fd0e372227..3e65acf7a0e9 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -74,7 +74,8 @@ struct loop_cmd {
 	long ret;
 	struct kiocb iocb;
 	struct bio_vec *bvec;
-	struct cgroup_subsys_state *css;
+	struct cgroup_subsys_state *blkcg_css;
+	struct cgroup_subsys_state *memcg_css;
 };
 
 /* Support for loadable transfer modules */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a7a0a1a5c8d5..aeb51f2ded46 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -922,6 +922,12 @@ static inline struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
 	return NULL;
 }
 
+static inline
+struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 }
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 75f687301bbf..c3896c2e0942 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -587,6 +587,7 @@ struct cgroup_subsys_state *cgroup_get_e_css(struct cgroup *cgrp,
 	rcu_read_unlock();
 	return css;
 }
+EXPORT_SYMBOL_GPL(cgroup_get_e_css);
 
 static void cgroup_get_live(struct cgroup *cgrp)
 {
-- 
2.17.1



^ permalink raw reply related

* [PATCH v3 2/3] mm: Charge active memcg when no mm is set
From: Dan Schatzberg @ 2020-02-20 16:51 UTC (permalink / raw)
  Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
	Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi,
	Thomas Gleixner, open list:BLOCK LAYER, open list,
	open list:CONTROL GROUP (CGROUP),
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
In-Reply-To: <cover.1582216294.git.schatzberg.dan@gmail.com>

memalloc_use_memcg() worked for kernel allocations but was silently
ignored for user pages.

This patch establishes a precedence order for who gets charged:

1. If there is a memcg associated with the page already, that memcg is
   charged. This happens during swapin.

2. If an explicit mm is passed, mm->memcg is charged. This happens
   during page faults, which can be triggered in remote VMs (eg gup).

3. Otherwise consult the current process context. If it has configured
   a current->active_memcg, use that. Otherwise, current->mm->memcg.

Previously, if a NULL mm was passed to mem_cgroup_try_charge (case 3) it
would always charge the root cgroup. Now it looks up the current
active_memcg first (falling back to charging the root cgroup if not
set).

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
---
 mm/memcontrol.c | 11 ++++++++---
 mm/shmem.c      |  2 +-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6f6dc8712e39..b174aff4f069 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6317,7 +6317,8 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
  * @compound: charge the page as compound or small page
  *
  * Try to charge @page to the memcg that @mm belongs to, reclaiming
- * pages according to @gfp_mask if necessary.
+ * pages according to @gfp_mask if necessary. If @mm is NULL, try to
+ * charge to the active memcg.
  *
  * Returns 0 on success, with *@memcgp pointing to the charged memcg.
  * Otherwise, an error code is returned.
@@ -6361,8 +6362,12 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 		}
 	}
 
-	if (!memcg)
-		memcg = get_mem_cgroup_from_mm(mm);
+	if (!memcg) {
+		if (!mm)
+			memcg = get_mem_cgroup_from_current();
+		else
+			memcg = get_mem_cgroup_from_mm(mm);
+	}
 
 	ret = try_charge(memcg, gfp_mask, nr_pages);
 
diff --git a/mm/shmem.c b/mm/shmem.c
index c8f7540ef048..7c7f5acf89d6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1766,7 +1766,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	}
 
 	sbinfo = SHMEM_SB(inode->i_sb);
-	charge_mm = vma ? vma->vm_mm : current->mm;
+	charge_mm = vma ? vma->vm_mm : NULL;
 
 	page = find_lock_entry(mapping, index);
 	if (xa_is_value(page)) {
-- 
2.17.1



^ permalink raw reply related

* [PATCH v3 2/3] mm: Charge active memcg when no mm is set
From: Dan Schatzberg @ 2020-02-20 16:51 UTC (permalink / raw)
  Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
	Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi,
	Thomas Gleixner, open list:BLOCK LAYER, open list,
	open list:CONTROL GROUP (CGROUP),
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
In-Reply-To: <cover.1582216294.git.schatzberg.dan@gmail.com>

memalloc_use_memcg() worked for kernel allocations but was silently
ignored for user pages.

This patch establishes a precedence order for who gets charged:

1. If there is a memcg associated with the page already, that memcg is
   charged. This happens during swapin.

2. If an explicit mm is passed, mm->memcg is charged. This happens
   during page faults, which can be triggered in remote VMs (eg gup).

3. Otherwise consult the current process context. If it has configured
   a current->active_memcg, use that. Otherwise, current->mm->memcg.

Previously, if a NULL mm was passed to mem_cgroup_try_charge (case 3) it
would always charge the root cgroup. Now it looks up the current
active_memcg first (falling back to charging the root cgroup if not
set).

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
---
 mm/memcontrol.c | 11 ++++++++---
 mm/shmem.c      |  2 +-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6f6dc8712e39..b174aff4f069 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6317,7 +6317,8 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
  * @compound: charge the page as compound or small page
  *
  * Try to charge @page to the memcg that @mm belongs to, reclaiming
- * pages according to @gfp_mask if necessary.
+ * pages according to @gfp_mask if necessary. If @mm is NULL, try to
+ * charge to the active memcg.
  *
  * Returns 0 on success, with *@memcgp pointing to the charged memcg.
  * Otherwise, an error code is returned.
@@ -6361,8 +6362,12 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 		}
 	}
 
-	if (!memcg)
-		memcg = get_mem_cgroup_from_mm(mm);
+	if (!memcg) {
+		if (!mm)
+			memcg = get_mem_cgroup_from_current();
+		else
+			memcg = get_mem_cgroup_from_mm(mm);
+	}
 
 	ret = try_charge(memcg, gfp_mask, nr_pages);
 
diff --git a/mm/shmem.c b/mm/shmem.c
index c8f7540ef048..7c7f5acf89d6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1766,7 +1766,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	}
 
 	sbinfo = SHMEM_SB(inode->i_sb);
-	charge_mm = vma ? vma->vm_mm : current->mm;
+	charge_mm = vma ? vma->vm_mm : NULL;
 
 	page = find_lock_entry(mapping, index);
 	if (xa_is_value(page)) {
-- 
2.17.1


^ permalink raw reply related

* [PATCH v3 1/3] loop: Use worker per cgroup instead of kworker
From: Dan Schatzberg @ 2020-02-20 16:51 UTC (permalink / raw)
  Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
	Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi,
	Thomas Gleixner, open list:BLOCK LAYER, open list,
	open list:CONTROL GROUP (CGROUP),
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
In-Reply-To: <cover.1582216294.git.schatzberg.dan@gmail.com>

Existing uses of loop device may have multiple cgroups reading/writing
to the same device. Simply charging resources for I/O to the backing
file could result in priority inversion where one cgroup gets
synchronously blocked, holding up all other I/O to the loop device.

In order to avoid this priority inversion, we use a single workqueue
where each work item is a "struct loop_worker" which contains a queue of
struct loop_cmds to issue. The loop device maintains a tree mapping blk
css_id -> loop_worker. This allows each cgroup to independently make
forward progress issuing I/O to the backing file.

There is also a single queue for I/O associated with the rootcg which
can be used in cases of extreme memory shortage where we cannot allocate
a loop_worker.

The locking for the tree and queues is fairly heavy handed - we acquire
the per-loop-device spinlock any time either is accessed. The existing
implementation serializes all I/O through a single thread anyways, so I
don't believe this is any worse.

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
---
 drivers/block/loop.c | 188 +++++++++++++++++++++++++++++++++++++------
 drivers/block/loop.h |  11 ++-
 2 files changed, 170 insertions(+), 29 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 739b372a5112..78e5005c6742 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -70,7 +70,6 @@
 #include <linux/writeback.h>
 #include <linux/completion.h>
 #include <linux/highmem.h>
-#include <linux/kthread.h>
 #include <linux/splice.h>
 #include <linux/sysfs.h>
 #include <linux/miscdevice.h>
@@ -83,6 +82,8 @@
 
 #include <linux/uaccess.h>
 
+#define LOOP_IDLE_WORKER_TIMEOUT (60 * HZ)
+
 static DEFINE_IDR(loop_index_idr);
 static DEFINE_MUTEX(loop_ctl_mutex);
 
@@ -891,27 +892,101 @@ static void loop_config_discard(struct loop_device *lo)
 
 static void loop_unprepare_queue(struct loop_device *lo)
 {
-	kthread_flush_worker(&lo->worker);
-	kthread_stop(lo->worker_task);
-}
-
-static int loop_kthread_worker_fn(void *worker_ptr)
-{
-	current->flags |= PF_LESS_THROTTLE | PF_MEMALLOC_NOIO;
-	return kthread_worker_fn(worker_ptr);
+	destroy_workqueue(lo->workqueue);
 }
 
 static int loop_prepare_queue(struct loop_device *lo)
 {
-	kthread_init_worker(&lo->worker);
-	lo->worker_task = kthread_run(loop_kthread_worker_fn,
-			&lo->worker, "loop%d", lo->lo_number);
-	if (IS_ERR(lo->worker_task))
+	lo->workqueue = alloc_workqueue("loop%d",
+					WQ_UNBOUND | WQ_FREEZABLE |
+					WQ_MEM_RECLAIM,
+					lo->lo_number);
+	if (IS_ERR(lo->workqueue))
 		return -ENOMEM;
-	set_user_nice(lo->worker_task, MIN_NICE);
+
 	return 0;
 }
 
+struct loop_worker {
+	struct rb_node rb_node;
+	struct work_struct work;
+	struct list_head cmd_list;
+	struct list_head idle_list;
+	struct loop_device *lo;
+	struct cgroup_subsys_state *css;
+	unsigned long last_ran_at;
+};
+
+static void loop_workfn(struct work_struct *work);
+static void loop_rootcg_workfn(struct work_struct *work);
+static void loop_free_idle_workers(struct timer_list *timer);
+
+static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
+{
+	struct rb_node **node = &(lo->worker_tree.rb_node), *parent = NULL;
+	struct loop_worker *cur_worker, *worker = NULL;
+	struct work_struct *work;
+	struct list_head *cmd_list;
+
+	spin_lock_irq(&lo->lo_lock);
+
+	if (!cmd->css)
+		goto queue_work;
+
+	node = &(lo->worker_tree.rb_node);
+
+	while (*node) {
+		parent = *node;
+		cur_worker = container_of(*node, struct loop_worker, rb_node);
+		if ((long)cur_worker->css == (long)cmd->css) {
+			worker = cur_worker;
+			break;
+		} else if ((long)cur_worker->css < (long)cmd->css) {
+			node = &((*node)->rb_left);
+		} else {
+			node = &((*node)->rb_right);
+		}
+	}
+	if (worker)
+		goto queue_work;
+
+	worker = kzalloc(sizeof(struct loop_worker),
+			GFP_NOWAIT | __GFP_NOWARN);
+	/*
+	 * In the event we cannot allocate a worker, just queue on the
+	 * rootcg worker
+	 */
+	if (!worker)
+		goto queue_work;
+
+	worker->css = cmd->css;
+	css_get(worker->css);
+	INIT_WORK(&worker->work, loop_workfn);
+	INIT_LIST_HEAD(&worker->cmd_list);
+	INIT_LIST_HEAD(&worker->idle_list);
+	worker->lo = lo;
+	rb_link_node(&worker->rb_node, parent, node);
+	rb_insert_color(&worker->rb_node, &lo->worker_tree);
+queue_work:
+	if (worker) {
+		/*
+		 * We need to remove from the idle list here while
+		 * holding the lock so that the idle timer doesn't
+		 * free the worker
+		 */
+		if (!list_empty(&worker->idle_list))
+			list_del_init(&worker->idle_list);
+		work = &worker->work;
+		cmd_list = &worker->cmd_list;
+	} else {
+		work = &lo->rootcg_work;
+		cmd_list = &lo->rootcg_cmd_list;
+	}
+	list_add_tail(&cmd->list_entry, cmd_list);
+	queue_work(lo->workqueue, work);
+	spin_unlock_irq(&lo->lo_lock);
+}
+
 static void loop_update_rotational(struct loop_device *lo)
 {
 	struct file *file = lo->lo_backing_file;
@@ -993,6 +1068,12 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 
 	set_device_ro(bdev, (lo_flags & LO_FLAGS_READ_ONLY) != 0);
 
+	INIT_WORK(&lo->rootcg_work, loop_rootcg_workfn);
+	INIT_LIST_HEAD(&lo->rootcg_cmd_list);
+	INIT_LIST_HEAD(&lo->idle_worker_list);
+	lo->worker_tree = RB_ROOT;
+	timer_setup(&lo->timer, loop_free_idle_workers,
+		TIMER_DEFERRABLE);
 	lo->use_dio = false;
 	lo->lo_device = bdev;
 	lo->lo_flags = lo_flags;
@@ -1155,6 +1236,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
 	partscan = lo->lo_flags & LO_FLAGS_PARTSCAN && bdev;
 	lo_number = lo->lo_number;
 	loop_unprepare_queue(lo);
+	del_timer_sync(&lo->timer);
 out_unlock:
 	mutex_unlock(&loop_ctl_mutex);
 	if (partscan) {
@@ -1932,7 +2014,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	} else
 #endif
 		cmd->css = NULL;
-	kthread_queue_work(&lo->worker, &cmd->work);
+	loop_queue_work(lo, cmd);
 
 	return BLK_STS_OK;
 }
@@ -1958,26 +2040,82 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
 	}
 }
 
-static void loop_queue_work(struct kthread_work *work)
+static void loop_set_timer(struct loop_device *lo)
 {
-	struct loop_cmd *cmd =
-		container_of(work, struct loop_cmd, work);
+	timer_reduce(&lo->timer, jiffies + LOOP_IDLE_WORKER_TIMEOUT);
+}
+
+static void loop_process_work(struct loop_worker *worker,
+			struct list_head *cmd_list, struct loop_device *lo)
+{
+	int orig_flags = current->flags;
+	struct loop_cmd *cmd;
+
+	current->flags |= PF_LESS_THROTTLE | PF_MEMALLOC_NOIO;
+	while (1) {
+		spin_lock_irq(&lo->lo_lock);
+		if (list_empty(cmd_list))
+			break;
+
+		cmd = container_of(
+			cmd_list->next, struct loop_cmd, list_entry);
+		list_del(cmd_list->next);
+		spin_unlock_irq(&lo->lo_lock);
+		loop_handle_cmd(cmd);
+		cond_resched();
+	}
 
-	loop_handle_cmd(cmd);
+	/*
+	 * We only add to the idle list if there are no pending cmds
+	 * *and* the worker will not run again which ensures that it
+	 * is safe to free any worker on the idle list
+	 */
+	if (worker && !work_pending(&worker->work)) {
+		worker->last_ran_at = jiffies;
+		list_add_tail(&worker->idle_list, &lo->idle_worker_list);
+		loop_set_timer(lo);
+	}
+	spin_unlock_irq(&lo->lo_lock);
+	current->flags = orig_flags;
 }
 
-static int loop_init_request(struct blk_mq_tag_set *set, struct request *rq,
-		unsigned int hctx_idx, unsigned int numa_node)
+static void loop_workfn(struct work_struct *work)
 {
-	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
+	struct loop_worker *worker =
+		container_of(work, struct loop_worker, work);
+	loop_process_work(worker, &worker->cmd_list, worker->lo);
+}
 
-	kthread_init_work(&cmd->work, loop_queue_work);
-	return 0;
+static void loop_rootcg_workfn(struct work_struct *work)
+{
+	struct loop_device *lo =
+		container_of(work, struct loop_device, rootcg_work);
+	loop_process_work(NULL, &lo->rootcg_cmd_list, lo);
+}
+
+static void loop_free_idle_workers(struct timer_list *timer)
+{
+	struct loop_device *lo = container_of(timer, struct loop_device, timer);
+	struct loop_worker *pos, *worker;
+
+	spin_lock_irq(&lo->lo_lock);
+	list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
+				idle_list) {
+		if (time_is_after_jiffies(worker->last_ran_at +
+						LOOP_IDLE_WORKER_TIMEOUT))
+			break;
+		list_del(&worker->idle_list);
+		rb_erase(&worker->rb_node, &lo->worker_tree);
+		css_put(worker->css);
+		kfree(worker);
+	}
+	if (!list_empty(&lo->idle_worker_list))
+		loop_set_timer(lo);
+	spin_unlock_irq(&lo->lo_lock);
 }
 
 static const struct blk_mq_ops loop_mq_ops = {
 	.queue_rq       = loop_queue_rq,
-	.init_request	= loop_init_request,
 	.complete	= lo_complete_rq,
 };
 
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index af75a5ee4094..87fd0e372227 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -14,7 +14,6 @@
 #include <linux/blk-mq.h>
 #include <linux/spinlock.h>
 #include <linux/mutex.h>
-#include <linux/kthread.h>
 #include <uapi/linux/loop.h>
 
 /* Possible states of device */
@@ -54,8 +53,12 @@ struct loop_device {
 
 	spinlock_t		lo_lock;
 	int			lo_state;
-	struct kthread_worker	worker;
-	struct task_struct	*worker_task;
+	struct workqueue_struct *workqueue;
+	struct work_struct      rootcg_work;
+	struct list_head        rootcg_cmd_list;
+	struct list_head        idle_worker_list;
+	struct rb_root          worker_tree;
+	struct timer_list       timer;
 	bool			use_dio;
 	bool			sysfs_inited;
 
@@ -65,7 +68,7 @@ struct loop_device {
 };
 
 struct loop_cmd {
-	struct kthread_work work;
+	struct list_head list_entry;
 	bool use_aio; /* use AIO interface to handle I/O */
 	atomic_t ref; /* only for aio */
 	long ret;
-- 
2.17.1


^ permalink raw reply related

* [PATCH v3 1/3] loop: Use worker per cgroup instead of kworker
From: Dan Schatzberg @ 2020-02-20 16:51 UTC (permalink / raw)
  Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
	Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi,
	Thomas Gleixner, open list:BLOCK LAYER, open list,
	open list:CONTROL GROUP (CGROUP),
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
In-Reply-To: <cover.1582216294.git.schatzberg.dan@gmail.com>

Existing uses of loop device may have multiple cgroups reading/writing
to the same device. Simply charging resources for I/O to the backing
file could result in priority inversion where one cgroup gets
synchronously blocked, holding up all other I/O to the loop device.

In order to avoid this priority inversion, we use a single workqueue
where each work item is a "struct loop_worker" which contains a queue of
struct loop_cmds to issue. The loop device maintains a tree mapping blk
css_id -> loop_worker. This allows each cgroup to independently make
forward progress issuing I/O to the backing file.

There is also a single queue for I/O associated with the rootcg which
can be used in cases of extreme memory shortage where we cannot allocate
a loop_worker.

The locking for the tree and queues is fairly heavy handed - we acquire
the per-loop-device spinlock any time either is accessed. The existing
implementation serializes all I/O through a single thread anyways, so I
don't believe this is any worse.

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
---
 drivers/block/loop.c | 188 +++++++++++++++++++++++++++++++++++++------
 drivers/block/loop.h |  11 ++-
 2 files changed, 170 insertions(+), 29 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 739b372a5112..78e5005c6742 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -70,7 +70,6 @@
 #include <linux/writeback.h>
 #include <linux/completion.h>
 #include <linux/highmem.h>
-#include <linux/kthread.h>
 #include <linux/splice.h>
 #include <linux/sysfs.h>
 #include <linux/miscdevice.h>
@@ -83,6 +82,8 @@
 
 #include <linux/uaccess.h>
 
+#define LOOP_IDLE_WORKER_TIMEOUT (60 * HZ)
+
 static DEFINE_IDR(loop_index_idr);
 static DEFINE_MUTEX(loop_ctl_mutex);
 
@@ -891,27 +892,101 @@ static void loop_config_discard(struct loop_device *lo)
 
 static void loop_unprepare_queue(struct loop_device *lo)
 {
-	kthread_flush_worker(&lo->worker);
-	kthread_stop(lo->worker_task);
-}
-
-static int loop_kthread_worker_fn(void *worker_ptr)
-{
-	current->flags |= PF_LESS_THROTTLE | PF_MEMALLOC_NOIO;
-	return kthread_worker_fn(worker_ptr);
+	destroy_workqueue(lo->workqueue);
 }
 
 static int loop_prepare_queue(struct loop_device *lo)
 {
-	kthread_init_worker(&lo->worker);
-	lo->worker_task = kthread_run(loop_kthread_worker_fn,
-			&lo->worker, "loop%d", lo->lo_number);
-	if (IS_ERR(lo->worker_task))
+	lo->workqueue = alloc_workqueue("loop%d",
+					WQ_UNBOUND | WQ_FREEZABLE |
+					WQ_MEM_RECLAIM,
+					lo->lo_number);
+	if (IS_ERR(lo->workqueue))
 		return -ENOMEM;
-	set_user_nice(lo->worker_task, MIN_NICE);
+
 	return 0;
 }
 
+struct loop_worker {
+	struct rb_node rb_node;
+	struct work_struct work;
+	struct list_head cmd_list;
+	struct list_head idle_list;
+	struct loop_device *lo;
+	struct cgroup_subsys_state *css;
+	unsigned long last_ran_at;
+};
+
+static void loop_workfn(struct work_struct *work);
+static void loop_rootcg_workfn(struct work_struct *work);
+static void loop_free_idle_workers(struct timer_list *timer);
+
+static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
+{
+	struct rb_node **node = &(lo->worker_tree.rb_node), *parent = NULL;
+	struct loop_worker *cur_worker, *worker = NULL;
+	struct work_struct *work;
+	struct list_head *cmd_list;
+
+	spin_lock_irq(&lo->lo_lock);
+
+	if (!cmd->css)
+		goto queue_work;
+
+	node = &(lo->worker_tree.rb_node);
+
+	while (*node) {
+		parent = *node;
+		cur_worker = container_of(*node, struct loop_worker, rb_node);
+		if ((long)cur_worker->css == (long)cmd->css) {
+			worker = cur_worker;
+			break;
+		} else if ((long)cur_worker->css < (long)cmd->css) {
+			node = &((*node)->rb_left);
+		} else {
+			node = &((*node)->rb_right);
+		}
+	}
+	if (worker)
+		goto queue_work;
+
+	worker = kzalloc(sizeof(struct loop_worker),
+			GFP_NOWAIT | __GFP_NOWARN);
+	/*
+	 * In the event we cannot allocate a worker, just queue on the
+	 * rootcg worker
+	 */
+	if (!worker)
+		goto queue_work;
+
+	worker->css = cmd->css;
+	css_get(worker->css);
+	INIT_WORK(&worker->work, loop_workfn);
+	INIT_LIST_HEAD(&worker->cmd_list);
+	INIT_LIST_HEAD(&worker->idle_list);
+	worker->lo = lo;
+	rb_link_node(&worker->rb_node, parent, node);
+	rb_insert_color(&worker->rb_node, &lo->worker_tree);
+queue_work:
+	if (worker) {
+		/*
+		 * We need to remove from the idle list here while
+		 * holding the lock so that the idle timer doesn't
+		 * free the worker
+		 */
+		if (!list_empty(&worker->idle_list))
+			list_del_init(&worker->idle_list);
+		work = &worker->work;
+		cmd_list = &worker->cmd_list;
+	} else {
+		work = &lo->rootcg_work;
+		cmd_list = &lo->rootcg_cmd_list;
+	}
+	list_add_tail(&cmd->list_entry, cmd_list);
+	queue_work(lo->workqueue, work);
+	spin_unlock_irq(&lo->lo_lock);
+}
+
 static void loop_update_rotational(struct loop_device *lo)
 {
 	struct file *file = lo->lo_backing_file;
@@ -993,6 +1068,12 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 
 	set_device_ro(bdev, (lo_flags & LO_FLAGS_READ_ONLY) != 0);
 
+	INIT_WORK(&lo->rootcg_work, loop_rootcg_workfn);
+	INIT_LIST_HEAD(&lo->rootcg_cmd_list);
+	INIT_LIST_HEAD(&lo->idle_worker_list);
+	lo->worker_tree = RB_ROOT;
+	timer_setup(&lo->timer, loop_free_idle_workers,
+		TIMER_DEFERRABLE);
 	lo->use_dio = false;
 	lo->lo_device = bdev;
 	lo->lo_flags = lo_flags;
@@ -1155,6 +1236,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
 	partscan = lo->lo_flags & LO_FLAGS_PARTSCAN && bdev;
 	lo_number = lo->lo_number;
 	loop_unprepare_queue(lo);
+	del_timer_sync(&lo->timer);
 out_unlock:
 	mutex_unlock(&loop_ctl_mutex);
 	if (partscan) {
@@ -1932,7 +2014,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	} else
 #endif
 		cmd->css = NULL;
-	kthread_queue_work(&lo->worker, &cmd->work);
+	loop_queue_work(lo, cmd);
 
 	return BLK_STS_OK;
 }
@@ -1958,26 +2040,82 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
 	}
 }
 
-static void loop_queue_work(struct kthread_work *work)
+static void loop_set_timer(struct loop_device *lo)
 {
-	struct loop_cmd *cmd =
-		container_of(work, struct loop_cmd, work);
+	timer_reduce(&lo->timer, jiffies + LOOP_IDLE_WORKER_TIMEOUT);
+}
+
+static void loop_process_work(struct loop_worker *worker,
+			struct list_head *cmd_list, struct loop_device *lo)
+{
+	int orig_flags = current->flags;
+	struct loop_cmd *cmd;
+
+	current->flags |= PF_LESS_THROTTLE | PF_MEMALLOC_NOIO;
+	while (1) {
+		spin_lock_irq(&lo->lo_lock);
+		if (list_empty(cmd_list))
+			break;
+
+		cmd = container_of(
+			cmd_list->next, struct loop_cmd, list_entry);
+		list_del(cmd_list->next);
+		spin_unlock_irq(&lo->lo_lock);
+		loop_handle_cmd(cmd);
+		cond_resched();
+	}
 
-	loop_handle_cmd(cmd);
+	/*
+	 * We only add to the idle list if there are no pending cmds
+	 * *and* the worker will not run again which ensures that it
+	 * is safe to free any worker on the idle list
+	 */
+	if (worker && !work_pending(&worker->work)) {
+		worker->last_ran_at = jiffies;
+		list_add_tail(&worker->idle_list, &lo->idle_worker_list);
+		loop_set_timer(lo);
+	}
+	spin_unlock_irq(&lo->lo_lock);
+	current->flags = orig_flags;
 }
 
-static int loop_init_request(struct blk_mq_tag_set *set, struct request *rq,
-		unsigned int hctx_idx, unsigned int numa_node)
+static void loop_workfn(struct work_struct *work)
 {
-	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
+	struct loop_worker *worker =
+		container_of(work, struct loop_worker, work);
+	loop_process_work(worker, &worker->cmd_list, worker->lo);
+}
 
-	kthread_init_work(&cmd->work, loop_queue_work);
-	return 0;
+static void loop_rootcg_workfn(struct work_struct *work)
+{
+	struct loop_device *lo =
+		container_of(work, struct loop_device, rootcg_work);
+	loop_process_work(NULL, &lo->rootcg_cmd_list, lo);
+}
+
+static void loop_free_idle_workers(struct timer_list *timer)
+{
+	struct loop_device *lo = container_of(timer, struct loop_device, timer);
+	struct loop_worker *pos, *worker;
+
+	spin_lock_irq(&lo->lo_lock);
+	list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
+				idle_list) {
+		if (time_is_after_jiffies(worker->last_ran_at +
+						LOOP_IDLE_WORKER_TIMEOUT))
+			break;
+		list_del(&worker->idle_list);
+		rb_erase(&worker->rb_node, &lo->worker_tree);
+		css_put(worker->css);
+		kfree(worker);
+	}
+	if (!list_empty(&lo->idle_worker_list))
+		loop_set_timer(lo);
+	spin_unlock_irq(&lo->lo_lock);
 }
 
 static const struct blk_mq_ops loop_mq_ops = {
 	.queue_rq       = loop_queue_rq,
-	.init_request	= loop_init_request,
 	.complete	= lo_complete_rq,
 };
 
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index af75a5ee4094..87fd0e372227 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -14,7 +14,6 @@
 #include <linux/blk-mq.h>
 #include <linux/spinlock.h>
 #include <linux/mutex.h>
-#include <linux/kthread.h>
 #include <uapi/linux/loop.h>
 
 /* Possible states of device */
@@ -54,8 +53,12 @@ struct loop_device {
 
 	spinlock_t		lo_lock;
 	int			lo_state;
-	struct kthread_worker	worker;
-	struct task_struct	*worker_task;
+	struct workqueue_struct *workqueue;
+	struct work_struct      rootcg_work;
+	struct list_head        rootcg_cmd_list;
+	struct list_head        idle_worker_list;
+	struct rb_root          worker_tree;
+	struct timer_list       timer;
 	bool			use_dio;
 	bool			sysfs_inited;
 
@@ -65,7 +68,7 @@ struct loop_device {
 };
 
 struct loop_cmd {
-	struct kthread_work work;
+	struct list_head list_entry;
 	bool use_aio; /* use AIO interface to handle I/O */
 	atomic_t ref; /* only for aio */
 	long ret;
-- 
2.17.1



^ permalink raw reply related

* [PATCH v3 0/3] Charge loop device i/o to issuing cgroup
From: Dan Schatzberg @ 2020-02-20 16:51 UTC (permalink / raw)
  Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
	Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi,
	Thomas Gleixner, open list:BLOCK LAYER, open list,
	open list:CONTROL GROUP (CGROUP),
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)

Changes since V2:

* Deferred destruction of workqueue items so in the common case there
  is no allocation needed

Changes since V1:

* Split out and reordered patches so cgroup charging changes are
  separate from kworker -> workqueue change

* Add mem_css to struct loop_cmd to simplify logic

The loop device runs all i/o to the backing file on a separate kworker
thread which results in all i/o being charged to the root cgroup. This
allows a loop device to be used to trivially bypass resource limits
and other policy. This patch series fixes this gap in accounting.

A simple script to demonstrate this behavior on cgroupv2 machine:

'''
#!/bin/bash
set -e

CGROUP=/sys/fs/cgroup/test.slice
LOOP_DEV=/dev/loop0

if [[ ! -d $CGROUP ]]
then
    sudo mkdir $CGROUP
fi

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit to tmpfs -> OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
dd if=/dev/zero of=/tmp/file bs=1M count=256" || true

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit through loopback
# device -> no OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
truncate -s 512m /tmp/backing_file
losetup $LOOP_DEV /tmp/backing_file
dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
losetup -D $LOOP_DEV" || true

grep oom_kill $CGROUP/memory.events
'''

Naively charging cgroups could result in priority inversions through
the single kworker thread in the case where multiple cgroups are
reading/writing to the same loop device. This patch series does some
minor modification to the loop driver so that each cgroup can make
forward progress independently to avoid this inversion.

With this patch series applied, the above script triggers OOM kills
when writing through the loop device as expected.

Dan Schatzberg (3):
  loop: Use worker per cgroup instead of kworker
  mm: Charge active memcg when no mm is set
  loop: Charge i/o to mem and blk cg

 drivers/block/loop.c       | 229 +++++++++++++++++++++++++++++++------
 drivers/block/loop.h       |  14 ++-
 include/linux/memcontrol.h |   6 +
 kernel/cgroup/cgroup.c     |   1 +
 mm/memcontrol.c            |  11 +-
 mm/shmem.c                 |   2 +-
 6 files changed, 217 insertions(+), 46 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCH v3 0/3] Charge loop device i/o to issuing cgroup
From: Dan Schatzberg @ 2020-02-20 16:51 UTC (permalink / raw)
  Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
	Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi,
	Thomas Gleixner, open list:BLOCK LAYER, open list,
	open list:CONTROL GROUP (CGROUP),
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)

Changes since V2:

* Deferred destruction of workqueue items so in the common case there
  is no allocation needed

Changes since V1:

* Split out and reordered patches so cgroup charging changes are
  separate from kworker -> workqueue change

* Add mem_css to struct loop_cmd to simplify logic

The loop device runs all i/o to the backing file on a separate kworker
thread which results in all i/o being charged to the root cgroup. This
allows a loop device to be used to trivially bypass resource limits
and other policy. This patch series fixes this gap in accounting.

A simple script to demonstrate this behavior on cgroupv2 machine:

'''
#!/bin/bash
set -e

CGROUP=/sys/fs/cgroup/test.slice
LOOP_DEV=/dev/loop0

if [[ ! -d $CGROUP ]]
then
    sudo mkdir $CGROUP
fi

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit to tmpfs -> OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
dd if=/dev/zero of=/tmp/file bs=1M count=256" || true

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit through loopback
# device -> no OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
truncate -s 512m /tmp/backing_file
losetup $LOOP_DEV /tmp/backing_file
dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
losetup -D $LOOP_DEV" || true

grep oom_kill $CGROUP/memory.events
'''

Naively charging cgroups could result in priority inversions through
the single kworker thread in the case where multiple cgroups are
reading/writing to the same loop device. This patch series does some
minor modification to the loop driver so that each cgroup can make
forward progress independently to avoid this inversion.

With this patch series applied, the above script triggers OOM kills
when writing through the loop device as expected.

Dan Schatzberg (3):
  loop: Use worker per cgroup instead of kworker
  mm: Charge active memcg when no mm is set
  loop: Charge i/o to mem and blk cg

 drivers/block/loop.c       | 229 +++++++++++++++++++++++++++++++------
 drivers/block/loop.h       |  14 ++-
 include/linux/memcontrol.h |   6 +
 kernel/cgroup/cgroup.c     |   1 +
 mm/memcontrol.c            |  11 +-
 mm/shmem.c                 |   2 +-
 6 files changed, 217 insertions(+), 46 deletions(-)

-- 
2.17.1



^ permalink raw reply

* Re: [PULL 0/9] target/hppa patch queue
From: Peter Maydell @ 2020-02-20 16:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers
In-Reply-To: <20200218193929.11404-1-richard.henderson@linaro.org>

On Tue, 18 Feb 2020 at 19:39, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The following changes since commit 6c599282f8ab382fe59f03a6cae755b89561a7b3:
>
>   Merge remote-tracking branch 'remotes/armbru/tags/pull-monitor-2020-02-15-v2' into staging (2020-02-17 13:32:25 +0000)
>
> are available in the Git repository at:
>
>   https://github.com/rth7680/qemu.git tags/pull-pa-20200218
>
> for you to fetch changes up to 90e94c0591687f7f788fc40ac86b5583f30d9513:
>
>   hw/hppa/dino: Do not accept accesses to registers 0x818 and 0x82c (2020-02-18 11:22:10 -0800)
>
> ----------------------------------------------------------------
> Fixes for Dino and Artist.
>



Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM


^ permalink raw reply

* Re: [PATCH v2 2/2] RDMA/rw: map P2P memory correctly for signature operations
From: Logan Gunthorpe @ 2020-02-20 16:51 UTC (permalink / raw)
  To: Max Gurtovoy, jgg, leon, linux-rdma; +Cc: israelr
In-Reply-To: <20200220100819.41860-2-maxg@mellanox.com>



On 2020-02-20 3:08 a.m., Max Gurtovoy wrote:
> Since RDMA rw API support operations with P2P memory sg list, make sure
> to map/unmap the scatter list for signature operation correctly.

Does anyone actually use P2P pages with rdma_rw_ctx_signature_init()?

Logan

> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> ---
>  drivers/infiniband/core/rw.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
> index 69513b484507..6eba8453f206 100644
> --- a/drivers/infiniband/core/rw.c
> +++ b/drivers/infiniband/core/rw.c
> @@ -392,13 +392,13 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
>  		return -EINVAL;
>  	}
>  
> -	ret = ib_dma_map_sg(dev, sg, sg_cnt, dir);
> +	ret = rdma_rw_map_sg(dev, sg, sg_cnt, dir);
>  	if (!ret)
>  		return -ENOMEM;
>  	sg_cnt = ret;
>  
>  	if (prot_sg_cnt) {
> -		ret = ib_dma_map_sg(dev, prot_sg, prot_sg_cnt, dir);
> +		ret = rdma_rw_map_sg(dev, prot_sg, prot_sg_cnt, dir);
>  		if (!ret) {
>  			ret = -ENOMEM;
>  			goto out_unmap_sg;
> @@ -467,9 +467,9 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
>  	kfree(ctx->reg);
>  out_unmap_prot_sg:
>  	if (prot_sg_cnt)
> -		ib_dma_unmap_sg(dev, prot_sg, prot_sg_cnt, dir);
> +		rdma_rw_unmap_sg(dev, prot_sg, prot_sg_cnt, dir);
>  out_unmap_sg:
> -	ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
> +	rdma_rw_unmap_sg(dev, sg, sg_cnt, dir);
>  	return ret;
>  }
>  EXPORT_SYMBOL(rdma_rw_ctx_signature_init);
> @@ -629,9 +629,9 @@ void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
>  	ib_mr_pool_put(qp, &qp->sig_mrs, ctx->reg->mr);
>  	kfree(ctx->reg);
>  
> -	ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
>  	if (prot_sg_cnt)
> -		ib_dma_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
> +		rdma_rw_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
> +	rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
>  }
>  EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);
>  
> 

^ permalink raw reply

* [PATCH v3 3/3] loop: Charge i/o to mem and blk cg
From: Dan Schatzberg @ 2020-02-20 16:51 UTC (permalink / raw)
  Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
	Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi,
	Thomas Gleixner, open list:BLOCK LAYER, open list,
	open list:CONTROL GROUP CGROUP,
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER MEMCG
In-Reply-To: <cover.1582216294.git.schatzberg.dan@gmail.com>

The current code only associates with the existing blkcg when aio is
used to access the backing file. This patch covers all types of i/o to
the backing file and also associates the memcg so if the backing file is
on tmpfs, memory is charged appropriately.

This patch also exports cgroup_get_e_css so it can be used by the loop
module.

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
---
 drivers/block/loop.c       | 59 ++++++++++++++++++++++++--------------
 drivers/block/loop.h       |  3 +-
 include/linux/memcontrol.h |  6 ++++
 kernel/cgroup/cgroup.c     |  1 +
 4 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 78e5005c6742..cc091a66b0d0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -77,6 +77,7 @@
 #include <linux/uio.h>
 #include <linux/ioprio.h>
 #include <linux/blk-cgroup.h>
+#include <linux/sched/mm.h>
 
 #include "loop.h"
 
@@ -134,7 +135,7 @@ static struct loop_func_table xor_funcs = {
 	.number = LO_CRYPT_XOR,
 	.transfer = transfer_xor,
 	.init = xor_init
-}; 
+};
 
 /* xfer_funcs[0] is special - its release function is never called */
 static struct loop_func_table *xfer_funcs[MAX_LO_CRYPT] = {
@@ -504,8 +505,6 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
 {
 	struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
 
-	if (cmd->css)
-		css_put(cmd->css);
 	cmd->ret = ret;
 	lo_rw_aio_do_completion(cmd);
 }
@@ -566,8 +565,6 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	cmd->iocb.ki_complete = lo_rw_aio_complete;
 	cmd->iocb.ki_flags = IOCB_DIRECT;
 	cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
-	if (cmd->css)
-		kthread_associate_blkcg(cmd->css);
 
 	if (rw == WRITE)
 		ret = call_write_iter(file, &cmd->iocb, &iter);
@@ -575,7 +572,6 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 		ret = call_read_iter(file, &cmd->iocb, &iter);
 
 	lo_rw_aio_do_completion(cmd);
-	kthread_associate_blkcg(NULL);
 
 	if (ret != -EIOCBQUEUED)
 		cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
@@ -913,7 +909,7 @@ struct loop_worker {
 	struct list_head cmd_list;
 	struct list_head idle_list;
 	struct loop_device *lo;
-	struct cgroup_subsys_state *css;
+	struct cgroup_subsys_state *blkcg_css;
 	unsigned long last_ran_at;
 };
 
@@ -930,7 +926,7 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
 
 	spin_lock_irq(&lo->lo_lock);
 
-	if (!cmd->css)
+	if (!cmd->blkcg_css)
 		goto queue_work;
 
 	node = &(lo->worker_tree.rb_node);
@@ -938,10 +934,10 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
 	while (*node) {
 		parent = *node;
 		cur_worker = container_of(*node, struct loop_worker, rb_node);
-		if ((long)cur_worker->css == (long)cmd->css) {
+		if ((long)cur_worker->blkcg_css == (long)cmd->blkcg_css) {
 			worker = cur_worker;
 			break;
-		} else if ((long)cur_worker->css < (long)cmd->css) {
+		} else if ((long)cur_worker->blkcg_css < (long)cmd->blkcg_css) {
 			node = &((*node)->rb_left);
 		} else {
 			node = &((*node)->rb_right);
@@ -954,13 +950,16 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
 			GFP_NOWAIT | __GFP_NOWARN);
 	/*
 	 * In the event we cannot allocate a worker, just queue on the
-	 * rootcg worker
+	 * rootcg worker and issue the I/O as the rootcg
 	 */
-	if (!worker)
+	if (!worker) {
+		cmd->blkcg_css = NULL;
+		cmd->memcg_css = NULL;
 		goto queue_work;
+	}
 
-	worker->css = cmd->css;
-	css_get(worker->css);
+	worker->blkcg_css = cmd->blkcg_css;
+	css_get(worker->blkcg_css);
 	INIT_WORK(&worker->work, loop_workfn);
 	INIT_LIST_HEAD(&worker->cmd_list);
 	INIT_LIST_HEAD(&worker->idle_list);
@@ -2007,13 +2006,18 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	}
 
 	/* always use the first bio's css */
+	cmd->blkcg_css = NULL;
+	cmd->memcg_css = NULL;
 #ifdef CONFIG_BLK_CGROUP
-	if (cmd->use_aio && rq->bio && rq->bio->bi_blkg) {
-		cmd->css = &bio_blkcg(rq->bio)->css;
-		css_get(cmd->css);
-	} else
+	if (rq->bio && rq->bio->bi_blkg) {
+		cmd->blkcg_css = &bio_blkcg(rq->bio)->css;
+#ifdef CONFIG_MEMCG
+		cmd->memcg_css =
+			cgroup_get_e_css(cmd->blkcg_css->cgroup,
+					&memory_cgrp_subsys);
+#endif
+	}
 #endif
-		cmd->css = NULL;
 	loop_queue_work(lo, cmd);
 
 	return BLK_STS_OK;
@@ -2031,8 +2035,21 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
 		goto failed;
 	}
 
+	if (cmd->blkcg_css)
+		kthread_associate_blkcg(cmd->blkcg_css);
+	if (cmd->memcg_css)
+		memalloc_use_memcg(mem_cgroup_from_css(cmd->memcg_css));
+
 	ret = do_req_filebacked(lo, rq);
- failed:
+
+	if (cmd->blkcg_css)
+		kthread_associate_blkcg(NULL);
+
+	if (cmd->memcg_css) {
+		memalloc_unuse_memcg();
+		css_put(cmd->memcg_css);
+	}
+failed:
 	/* complete non-aio request */
 	if (!cmd->use_aio || ret) {
 		cmd->ret = ret ? -EIO : 0;
@@ -2106,7 +2123,7 @@ static void loop_free_idle_workers(struct timer_list *timer)
 			break;
 		list_del(&worker->idle_list);
 		rb_erase(&worker->rb_node, &lo->worker_tree);
-		css_put(worker->css);
+		css_put(worker->blkcg_css);
 		kfree(worker);
 	}
 	if (!list_empty(&lo->idle_worker_list))
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 87fd0e372227..3e65acf7a0e9 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -74,7 +74,8 @@ struct loop_cmd {
 	long ret;
 	struct kiocb iocb;
 	struct bio_vec *bvec;
-	struct cgroup_subsys_state *css;
+	struct cgroup_subsys_state *blkcg_css;
+	struct cgroup_subsys_state *memcg_css;
 };
 
 /* Support for loadable transfer modules */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a7a0a1a5c8d5..aeb51f2ded46 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -922,6 +922,12 @@ static inline struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
 	return NULL;
 }
 
+static inline
+struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 }
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 75f687301bbf..c3896c2e0942 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -587,6 +587,7 @@ struct cgroup_subsys_state *cgroup_get_e_css(struct cgroup *cgrp,
 	rcu_read_unlock();
 	return css;
 }
+EXPORT_SYMBOL_GPL(cgroup_get_e_css);
 
 static void cgroup_get_live(struct cgroup *cgrp)
 {
-- 
2.17.1


^ permalink raw reply related

* [PATCH v3 2/3] mm: Charge active memcg when no mm is set
From: Dan Schatzberg @ 2020-02-20 16:51 UTC (permalink / raw)
  Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
	Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi,
	Thomas Gleixner, open list:BLOCK LAYER, open list,
	open list:CONTROL GROUP CGROUP,
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER MEMCG
In-Reply-To: <cover.1582216294.git.schatzberg.dan@gmail.com>

memalloc_use_memcg() worked for kernel allocations but was silently
ignored for user pages.

This patch establishes a precedence order for who gets charged:

1. If there is a memcg associated with the page already, that memcg is
   charged. This happens during swapin.

2. If an explicit mm is passed, mm->memcg is charged. This happens
   during page faults, which can be triggered in remote VMs (eg gup).

3. Otherwise consult the current process context. If it has configured
   a current->active_memcg, use that. Otherwise, current->mm->memcg.

Previously, if a NULL mm was passed to mem_cgroup_try_charge (case 3) it
would always charge the root cgroup. Now it looks up the current
active_memcg first (falling back to charging the root cgroup if not
set).

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
---
 mm/memcontrol.c | 11 ++++++++---
 mm/shmem.c      |  2 +-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6f6dc8712e39..b174aff4f069 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6317,7 +6317,8 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
  * @compound: charge the page as compound or small page
  *
  * Try to charge @page to the memcg that @mm belongs to, reclaiming
- * pages according to @gfp_mask if necessary.
+ * pages according to @gfp_mask if necessary. If @mm is NULL, try to
+ * charge to the active memcg.
  *
  * Returns 0 on success, with *@memcgp pointing to the charged memcg.
  * Otherwise, an error code is returned.
@@ -6361,8 +6362,12 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 		}
 	}
 
-	if (!memcg)
-		memcg = get_mem_cgroup_from_mm(mm);
+	if (!memcg) {
+		if (!mm)
+			memcg = get_mem_cgroup_from_current();
+		else
+			memcg = get_mem_cgroup_from_mm(mm);
+	}
 
 	ret = try_charge(memcg, gfp_mask, nr_pages);
 
diff --git a/mm/shmem.c b/mm/shmem.c
index c8f7540ef048..7c7f5acf89d6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1766,7 +1766,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	}
 
 	sbinfo = SHMEM_SB(inode->i_sb);
-	charge_mm = vma ? vma->vm_mm : current->mm;
+	charge_mm = vma ? vma->vm_mm : NULL;
 
 	page = find_lock_entry(mapping, index);
 	if (xa_is_value(page)) {
-- 
2.17.1


^ permalink raw reply related

* [PATCH v3 1/3] loop: Use worker per cgroup instead of kworker
From: Dan Schatzberg @ 2020-02-20 16:51 UTC (permalink / raw)
  Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
	Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi,
	Thomas Gleixner, open list:BLOCK LAYER, open list,
	open list:CONTROL GROUP CGROUP,
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER MEMCG
In-Reply-To: <cover.1582216294.git.schatzberg.dan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Existing uses of loop device may have multiple cgroups reading/writing
to the same device. Simply charging resources for I/O to the backing
file could result in priority inversion where one cgroup gets
synchronously blocked, holding up all other I/O to the loop device.

In order to avoid this priority inversion, we use a single workqueue
where each work item is a "struct loop_worker" which contains a queue of
struct loop_cmds to issue. The loop device maintains a tree mapping blk
css_id -> loop_worker. This allows each cgroup to independently make
forward progress issuing I/O to the backing file.

There is also a single queue for I/O associated with the rootcg which
can be used in cases of extreme memory shortage where we cannot allocate
a loop_worker.

The locking for the tree and queues is fairly heavy handed - we acquire
the per-loop-device spinlock any time either is accessed. The existing
implementation serializes all I/O through a single thread anyways, so I
don't believe this is any worse.

Signed-off-by: Dan Schatzberg <schatzberg.dan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 drivers/block/loop.c | 188 +++++++++++++++++++++++++++++++++++++------
 drivers/block/loop.h |  11 ++-
 2 files changed, 170 insertions(+), 29 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 739b372a5112..78e5005c6742 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -70,7 +70,6 @@
 #include <linux/writeback.h>
 #include <linux/completion.h>
 #include <linux/highmem.h>
-#include <linux/kthread.h>
 #include <linux/splice.h>
 #include <linux/sysfs.h>
 #include <linux/miscdevice.h>
@@ -83,6 +82,8 @@
 
 #include <linux/uaccess.h>
 
+#define LOOP_IDLE_WORKER_TIMEOUT (60 * HZ)
+
 static DEFINE_IDR(loop_index_idr);
 static DEFINE_MUTEX(loop_ctl_mutex);
 
@@ -891,27 +892,101 @@ static void loop_config_discard(struct loop_device *lo)
 
 static void loop_unprepare_queue(struct loop_device *lo)
 {
-	kthread_flush_worker(&lo->worker);
-	kthread_stop(lo->worker_task);
-}
-
-static int loop_kthread_worker_fn(void *worker_ptr)
-{
-	current->flags |= PF_LESS_THROTTLE | PF_MEMALLOC_NOIO;
-	return kthread_worker_fn(worker_ptr);
+	destroy_workqueue(lo->workqueue);
 }
 
 static int loop_prepare_queue(struct loop_device *lo)
 {
-	kthread_init_worker(&lo->worker);
-	lo->worker_task = kthread_run(loop_kthread_worker_fn,
-			&lo->worker, "loop%d", lo->lo_number);
-	if (IS_ERR(lo->worker_task))
+	lo->workqueue = alloc_workqueue("loop%d",
+					WQ_UNBOUND | WQ_FREEZABLE |
+					WQ_MEM_RECLAIM,
+					lo->lo_number);
+	if (IS_ERR(lo->workqueue))
 		return -ENOMEM;
-	set_user_nice(lo->worker_task, MIN_NICE);
+
 	return 0;
 }
 
+struct loop_worker {
+	struct rb_node rb_node;
+	struct work_struct work;
+	struct list_head cmd_list;
+	struct list_head idle_list;
+	struct loop_device *lo;
+	struct cgroup_subsys_state *css;
+	unsigned long last_ran_at;
+};
+
+static void loop_workfn(struct work_struct *work);
+static void loop_rootcg_workfn(struct work_struct *work);
+static void loop_free_idle_workers(struct timer_list *timer);
+
+static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
+{
+	struct rb_node **node = &(lo->worker_tree.rb_node), *parent = NULL;
+	struct loop_worker *cur_worker, *worker = NULL;
+	struct work_struct *work;
+	struct list_head *cmd_list;
+
+	spin_lock_irq(&lo->lo_lock);
+
+	if (!cmd->css)
+		goto queue_work;
+
+	node = &(lo->worker_tree.rb_node);
+
+	while (*node) {
+		parent = *node;
+		cur_worker = container_of(*node, struct loop_worker, rb_node);
+		if ((long)cur_worker->css == (long)cmd->css) {
+			worker = cur_worker;
+			break;
+		} else if ((long)cur_worker->css < (long)cmd->css) {
+			node = &((*node)->rb_left);
+		} else {
+			node = &((*node)->rb_right);
+		}
+	}
+	if (worker)
+		goto queue_work;
+
+	worker = kzalloc(sizeof(struct loop_worker),
+			GFP_NOWAIT | __GFP_NOWARN);
+	/*
+	 * In the event we cannot allocate a worker, just queue on the
+	 * rootcg worker
+	 */
+	if (!worker)
+		goto queue_work;
+
+	worker->css = cmd->css;
+	css_get(worker->css);
+	INIT_WORK(&worker->work, loop_workfn);
+	INIT_LIST_HEAD(&worker->cmd_list);
+	INIT_LIST_HEAD(&worker->idle_list);
+	worker->lo = lo;
+	rb_link_node(&worker->rb_node, parent, node);
+	rb_insert_color(&worker->rb_node, &lo->worker_tree);
+queue_work:
+	if (worker) {
+		/*
+		 * We need to remove from the idle list here while
+		 * holding the lock so that the idle timer doesn't
+		 * free the worker
+		 */
+		if (!list_empty(&worker->idle_list))
+			list_del_init(&worker->idle_list);
+		work = &worker->work;
+		cmd_list = &worker->cmd_list;
+	} else {
+		work = &lo->rootcg_work;
+		cmd_list = &lo->rootcg_cmd_list;
+	}
+	list_add_tail(&cmd->list_entry, cmd_list);
+	queue_work(lo->workqueue, work);
+	spin_unlock_irq(&lo->lo_lock);
+}
+
 static void loop_update_rotational(struct loop_device *lo)
 {
 	struct file *file = lo->lo_backing_file;
@@ -993,6 +1068,12 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 
 	set_device_ro(bdev, (lo_flags & LO_FLAGS_READ_ONLY) != 0);
 
+	INIT_WORK(&lo->rootcg_work, loop_rootcg_workfn);
+	INIT_LIST_HEAD(&lo->rootcg_cmd_list);
+	INIT_LIST_HEAD(&lo->idle_worker_list);
+	lo->worker_tree = RB_ROOT;
+	timer_setup(&lo->timer, loop_free_idle_workers,
+		TIMER_DEFERRABLE);
 	lo->use_dio = false;
 	lo->lo_device = bdev;
 	lo->lo_flags = lo_flags;
@@ -1155,6 +1236,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
 	partscan = lo->lo_flags & LO_FLAGS_PARTSCAN && bdev;
 	lo_number = lo->lo_number;
 	loop_unprepare_queue(lo);
+	del_timer_sync(&lo->timer);
 out_unlock:
 	mutex_unlock(&loop_ctl_mutex);
 	if (partscan) {
@@ -1932,7 +2014,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	} else
 #endif
 		cmd->css = NULL;
-	kthread_queue_work(&lo->worker, &cmd->work);
+	loop_queue_work(lo, cmd);
 
 	return BLK_STS_OK;
 }
@@ -1958,26 +2040,82 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
 	}
 }
 
-static void loop_queue_work(struct kthread_work *work)
+static void loop_set_timer(struct loop_device *lo)
 {
-	struct loop_cmd *cmd =
-		container_of(work, struct loop_cmd, work);
+	timer_reduce(&lo->timer, jiffies + LOOP_IDLE_WORKER_TIMEOUT);
+}
+
+static void loop_process_work(struct loop_worker *worker,
+			struct list_head *cmd_list, struct loop_device *lo)
+{
+	int orig_flags = current->flags;
+	struct loop_cmd *cmd;
+
+	current->flags |= PF_LESS_THROTTLE | PF_MEMALLOC_NOIO;
+	while (1) {
+		spin_lock_irq(&lo->lo_lock);
+		if (list_empty(cmd_list))
+			break;
+
+		cmd = container_of(
+			cmd_list->next, struct loop_cmd, list_entry);
+		list_del(cmd_list->next);
+		spin_unlock_irq(&lo->lo_lock);
+		loop_handle_cmd(cmd);
+		cond_resched();
+	}
 
-	loop_handle_cmd(cmd);
+	/*
+	 * We only add to the idle list if there are no pending cmds
+	 * *and* the worker will not run again which ensures that it
+	 * is safe to free any worker on the idle list
+	 */
+	if (worker && !work_pending(&worker->work)) {
+		worker->last_ran_at = jiffies;
+		list_add_tail(&worker->idle_list, &lo->idle_worker_list);
+		loop_set_timer(lo);
+	}
+	spin_unlock_irq(&lo->lo_lock);
+	current->flags = orig_flags;
 }
 
-static int loop_init_request(struct blk_mq_tag_set *set, struct request *rq,
-		unsigned int hctx_idx, unsigned int numa_node)
+static void loop_workfn(struct work_struct *work)
 {
-	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
+	struct loop_worker *worker =
+		container_of(work, struct loop_worker, work);
+	loop_process_work(worker, &worker->cmd_list, worker->lo);
+}
 
-	kthread_init_work(&cmd->work, loop_queue_work);
-	return 0;
+static void loop_rootcg_workfn(struct work_struct *work)
+{
+	struct loop_device *lo =
+		container_of(work, struct loop_device, rootcg_work);
+	loop_process_work(NULL, &lo->rootcg_cmd_list, lo);
+}
+
+static void loop_free_idle_workers(struct timer_list *timer)
+{
+	struct loop_device *lo = container_of(timer, struct loop_device, timer);
+	struct loop_worker *pos, *worker;
+
+	spin_lock_irq(&lo->lo_lock);
+	list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
+				idle_list) {
+		if (time_is_after_jiffies(worker->last_ran_at +
+						LOOP_IDLE_WORKER_TIMEOUT))
+			break;
+		list_del(&worker->idle_list);
+		rb_erase(&worker->rb_node, &lo->worker_tree);
+		css_put(worker->css);
+		kfree(worker);
+	}
+	if (!list_empty(&lo->idle_worker_list))
+		loop_set_timer(lo);
+	spin_unlock_irq(&lo->lo_lock);
 }
 
 static const struct blk_mq_ops loop_mq_ops = {
 	.queue_rq       = loop_queue_rq,
-	.init_request	= loop_init_request,
 	.complete	= lo_complete_rq,
 };
 
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index af75a5ee4094..87fd0e372227 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -14,7 +14,6 @@
 #include <linux/blk-mq.h>
 #include <linux/spinlock.h>
 #include <linux/mutex.h>
-#include <linux/kthread.h>
 #include <uapi/linux/loop.h>
 
 /* Possible states of device */
@@ -54,8 +53,12 @@ struct loop_device {
 
 	spinlock_t		lo_lock;
 	int			lo_state;
-	struct kthread_worker	worker;
-	struct task_struct	*worker_task;
+	struct workqueue_struct *workqueue;
+	struct work_struct      rootcg_work;
+	struct list_head        rootcg_cmd_list;
+	struct list_head        idle_worker_list;
+	struct rb_root          worker_tree;
+	struct timer_list       timer;
 	bool			use_dio;
 	bool			sysfs_inited;
 
@@ -65,7 +68,7 @@ struct loop_device {
 };
 
 struct loop_cmd {
-	struct kthread_work work;
+	struct list_head list_entry;
 	bool use_aio; /* use AIO interface to handle I/O */
 	atomic_t ref; /* only for aio */
 	long ret;
-- 
2.17.1


^ permalink raw reply related

* [PATCH v3 0/3] Charge loop device i/o to issuing cgroup
From: Dan Schatzberg @ 2020-02-20 16:51 UTC (permalink / raw)
  Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
	Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi,
	Thomas Gleixner, open list:BLOCK LAYER, open list,
	open list:CONTROL GROUP CGROUP,
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER MEMCG

Changes since V2:

* Deferred destruction of workqueue items so in the common case there
  is no allocation needed

Changes since V1:

* Split out and reordered patches so cgroup charging changes are
  separate from kworker -> workqueue change

* Add mem_css to struct loop_cmd to simplify logic

The loop device runs all i/o to the backing file on a separate kworker
thread which results in all i/o being charged to the root cgroup. This
allows a loop device to be used to trivially bypass resource limits
and other policy. This patch series fixes this gap in accounting.

A simple script to demonstrate this behavior on cgroupv2 machine:

'''
#!/bin/bash
set -e

CGROUP=/sys/fs/cgroup/test.slice
LOOP_DEV=/dev/loop0

if [[ ! -d $CGROUP ]]
then
    sudo mkdir $CGROUP
fi

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit to tmpfs -> OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
dd if=/dev/zero of=/tmp/file bs=1M count=256" || true

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit through loopback
# device -> no OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
truncate -s 512m /tmp/backing_file
losetup $LOOP_DEV /tmp/backing_file
dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
losetup -D $LOOP_DEV" || true

grep oom_kill $CGROUP/memory.events
'''

Naively charging cgroups could result in priority inversions through
the single kworker thread in the case where multiple cgroups are
reading/writing to the same loop device. This patch series does some
minor modification to the loop driver so that each cgroup can make
forward progress independently to avoid this inversion.

With this patch series applied, the above script triggers OOM kills
when writing through the loop device as expected.

Dan Schatzberg (3):
  loop: Use worker per cgroup instead of kworker
  mm: Charge active memcg when no mm is set
  loop: Charge i/o to mem and blk cg

 drivers/block/loop.c       | 229 +++++++++++++++++++++++++++++++------
 drivers/block/loop.h       |  14 ++-
 include/linux/memcontrol.h |   6 +
 kernel/cgroup/cgroup.c     |   1 +
 mm/memcontrol.c            |  11 +-
 mm/shmem.c                 |   2 +-
 6 files changed, 217 insertions(+), 46 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] drm/i915: Double check bumping after the spinlock
From: Patchwork @ 2020-02-20 16:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx
In-Reply-To: <20200220123608.1666271-1-chris@chris-wilson.co.uk>

== Series Details ==

Series: series starting with [1/2] drm/i915: Double check bumping after the spinlock
URL   : https://patchwork.freedesktop.org/series/73707/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
00c160893898 drm/i915: Double check bumping after the spinlock
-:10: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#10: 
References: a79ca656b648 ("drm/i915: Push the wakeref->count deferral to the backend")

-:10: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit a79ca656b648 ("drm/i915: Push the wakeref->count deferral to the backend")'
#10: 
References: a79ca656b648 ("drm/i915: Push the wakeref->count deferral to the backend")

total: 1 errors, 1 warnings, 0 checks, 9 lines checked
e5bc0f843da3 drm/i915: Protect i915_request_await_start from early waits

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply


This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.