From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
To: Qian Cai <cai-J5quhbR+WMc@public.gmane.org>
Cc: Dan Schatzberg
<schatzberg.dan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>,
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>,
Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
Vladimir Davydov
<vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Roman Gushchin <guro-b10kYP2dOMg@public.gmane.org>,
Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Chris Down <chris-6Bi1550iOqEnzZ6mRAm98g@public.gmane.org>,
Yang Shi
<yang.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>,
Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>,
"open list:BLOCK LAYER"
<linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
open list <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"open list:CONTROL GROUP (CGROUP)"
<cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
<linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>
Subject: Re: [PATCH v3 1/3] loop: Use worker per cgroup instead of kworker
Date: Thu, 27 Feb 2020 13:14:30 -0500
Message-ID: <20200227181430.GA44024@cmpxchg.org>
In-Reply-To: <1582736558.7365.131.camel-J5quhbR+WMc@public.gmane.org>

On Wed, Feb 26, 2020 at 12:02:38PM -0500, Qian Cai wrote:
> On Mon, 2020-02-24 at 17:17 -0500, Dan Schatzberg wrote:
> > Existing uses of loop device may have multiple cgroups reading/writing
> > to the same device. Simply charging resources for I/O to the backing
> > file could result in priority inversion where one cgroup gets
> > synchronously blocked, holding up all other I/O to the loop device.
> >
> > In order to avoid this priority inversion, we use a single workqueue
> > where each work item is a "struct loop_worker" which contains a queue of
> > struct loop_cmds to issue. The loop device maintains a tree mapping blk
> > css_id -> loop_worker. This allows each cgroup to independently make
> > forward progress issuing I/O to the backing file.
> >
> > There is also a single queue for I/O associated with the rootcg which
> > can be used in cases of extreme memory shortage where we cannot allocate
> > a loop_worker.
> >
> > The locking for the tree and queues is fairly heavy handed - we acquire
> > the per-loop-device spinlock any time either is accessed. The existing
> > implementation serializes all I/O through a single thread anyways, so I
> > don't believe this is any worse.
> >
> > Signed-off-by: Dan Schatzberg <schatzberg.dan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > Acked-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
>
> The locking in loop_free_idle_workers() will trigger this with sysfs reading,
>
> [ 7080.047167] LTP: starting read_all_sys (read_all -d /sys -q -r 10)
> [ 7239.842276] cpufreq transition table exceeds PAGE_SIZE. Disabling
>
> [ 7247.054961] =====================================================
> [ 7247.054971] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
> [ 7247.054983] 5.6.0-rc3-next-20200226 #2 Tainted: G O
> [ 7247.054992] -----------------------------------------------------
> [ 7247.055002] read_all/8513 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [ 7247.055014] c0000006844864c8 (&fs->seq){+.+.}, at: file_path+0x24/0x40
> [ 7247.055041]
> and this task is already holding:
> [ 7247.055061] c0002006bab8b928 (&(&lo->lo_lock)->rlock){..-.}, at:
> loop_attr_do_show_backing_file+0x3c/0x120 [loop]
> [ 7247.055078] which would create a new lock dependency:
> [ 7247.055105] (&(&lo->lo_lock)->rlock){..-.} -> (&fs->seq){+.+.}
> [ 7247.055125]
> but this new dependency connects a SOFTIRQ-irq-safe lock:
> [ 7247.055155] (&(&lo->lo_lock)->rlock){..-.}
> [ 7247.055156]
> ... which became SOFTIRQ-irq-safe at:
> [ 7247.055196] lock_acquire+0x130/0x360
> [ 7247.055221] _raw_spin_lock_irq+0x68/0x90
> [ 7247.055230] loop_free_idle_workers+0x44/0x3f0 [loop]
> [ 7247.055242] call_timer_fn+0x110/0x5f0
> [ 7247.055260] run_timer_softirq+0x8f8/0x9f0
> [ 7247.055278] __do_softirq+0x34c/0x8c8
> [ 7247.055288] irq_exit+0x16c/0x1d0
> [ 7247.055298] timer_interrupt+0x1f0/0x680
> [ 7247.055308] decrementer_common+0x124/0x130
> [ 7247.055328] arch_local_irq_restore.part.8+0x34/0x90
> [ 7247.055352] cpuidle_enter_state+0x11c/0x8f0
> [ 7247.055361] cpuidle_enter+0x50/0x70
> [ 7247.055389] call_cpuidle+0x4c/0x90
> [ 7247.055398] do_idle+0x378/0x470
> [ 7247.055414] cpu_startup_entry+0x3c/0x40
> [ 7247.055442] start_secondary+0x7a8/0xa80
> [ 7247.055461] start_secondary_prolog+0x10/0x14
That's kind of hilarious.

So even though it's a spin_lock_irq(), suggesting it's used from both process and irq context, Dan appears to be adding the first user that actually runs from irq context. It looks like it should have been a regular spinlock all along. Until now, anyway.

Fixing it should be straightforward: use get_file() under the lock to pin the file, drop the lock to do file_path(), then release the file with fput().