From: Michal Hocko <mhocko@kernel.org>
To: Ilya Dryomov <idryomov@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
stable@vger.kernel.org,
Sergey Jerusalimov <wintchester@gmail.com>,
Jeff Layton <jlayton@redhat.com>,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4.4 48/76] libceph: force GFP_NOIO for socket allocations
Date: Thu, 30 Mar 2017 13:21:26 +0200
Message-ID: <20170330112126.GE1972@dhcp22.suse.cz>
In-Reply-To: <CAOi1vP8z4hngZecp6MoOOhKsLadZ5eJbQ92MvAGBbqdN03CfPw@mail.gmail.com>
On Thu 30-03-17 12:02:03, Ilya Dryomov wrote:
> On Thu, Mar 30, 2017 at 8:25 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Wed 29-03-17 16:25:18, Ilya Dryomov wrote:
[...]
> >> We got rid of osdc->request_mutex in 4.7, so these workers are almost
> >> independent in newer kernels and should be able to free up memory for
> >> those blocked on GFP_NOIO retries with their respective con->mutex
> >> held. Using GFP_KERNEL and thus allowing the recursion is just asking
> >> for an AA deadlock on con->mutex OTOH, so it does make a difference.
> >
> > You keep saying this but so far I haven't heard how the AA deadlock is
> > possible. Both GFP_KERNEL and GFP_NOIO can stall for an unbounded amount
> > of time and that would cause you problems AFAIU.
>
> Suppose we have an I/O for OSD X, which means it's got to go through
> ceph_connection X:
>
> ceph_con_workfn
>   mutex_lock(&con->mutex)
>   try_write
>     ceph_tcp_connect
>       sock_create_kern
>         GFP_KERNEL allocation
>
> Suppose that generates another I/O for OSD X and blocks on it.

Yeah, I understand that, but I am asking _who_ is going to generate
that IO. We do not do writeback from the direct reclaim path. I am not
familiar with Ceph at all, but do any of its (slab) shrinkers generate
IO to recurse back?

> Well,
> it's got to go through the same ceph_connection:
>
> rbd_queue_workfn
>   ceph_osdc_start_request
>     ceph_con_send
>       mutex_lock(&con->mutex)  # deadlock, OSD X worker is knocked out
>
> Now if that was a GFP_NOIO allocation, we would simply block in the
> allocator. The placement algorithm distributes objects across the OSDs
> in a pseudo-random fashion, so even if we had a whole bunch of I/Os for
> that OSD, some other I/Os for other OSDs would complete in the meantime
> and free up memory. If we are under the kind of memory pressure that
> makes GFP_NOIO allocations block for an extended period of time, we are
> bound to have a lot of pre-open sockets, as we would have done at least
> some flushing by then.
How is this any different from xfs waiting for its IO to be done?
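
To make the quoted scenario concrete, here is a minimal userspace sketch of
the AA pattern being discussed. It is not libceph code: fake_con,
fake_con_send(), fake_alloc() and fake_con_workfn() are hypothetical
stand-ins, and the may_recurse flag simply assumes the very thing in dispute
above, namely that an allocation made under con->mutex can re-enter I/O on
the same connection. An error-checking pthread mutex is used so that the
second lock attempt by the owner returns EDEADLK instead of hanging.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct ceph_connection; only the mutex matters. */
struct fake_con {
	pthread_mutex_t mutex;
};

/* Use an error-checking mutex so a relock by the owner fails with EDEADLK. */
static void fake_con_init(struct fake_con *con)
{
	pthread_mutexattr_t attr;
	int rc;

	rc = pthread_mutexattr_init(&attr);
	assert(rc == 0);
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(rc == 0);
	rc = pthread_mutex_init(&con->mutex, &attr);
	assert(rc == 0);
	pthread_mutexattr_destroy(&attr);
	(void)rc;
}

/* Models ceph_con_send(): queuing a message needs con->mutex. */
static int fake_con_send(struct fake_con *con)
{
	int rc = pthread_mutex_lock(&con->mutex);

	if (rc != 0)
		return rc;	/* EDEADLK: this thread already holds the mutex */
	pthread_mutex_unlock(&con->mutex);
	return 0;
}

/*
 * Models an allocation made while con->mutex is held.
 * may_recurse != 0: ASSUMED GFP_KERNEL-like behaviour that issues I/O
 *                   on the same connection.
 * may_recurse == 0: GFP_NOIO-like behaviour, no I/O is issued.
 */
static int fake_alloc(struct fake_con *con, int may_recurse)
{
	return may_recurse ? fake_con_send(con) : 0;
}

/* Models ceph_con_workfn(): takes con->mutex, then allocates under it. */
static int fake_con_workfn(struct fake_con *con, int may_recurse)
{
	int rc;

	pthread_mutex_lock(&con->mutex);
	rc = fake_alloc(con, may_recurse);
	pthread_mutex_unlock(&con->mutex);
	return rc;
}
```

Built with -pthread, calling fake_con_workfn(&con, 1) on an initialized
fake_con reports the self-deadlock, while fake_con_workfn(&con, 0) completes
normally. The sketch says nothing about whether direct reclaim can actually
take the recursive path; it only shows what happens if it does.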
--
Michal Hocko
SUSE Labs