From: Mel Gorman <mgorman@suse.de>
To: Ilya Dryomov <idryomov@gmail.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>,
Mike Christie <michaelc@cs.wisc.edu>, Sage Weil <sage@redhat.com>
Subject: Re: [PATCH] libceph: don't set memalloc flags in loopback case
Date: Fri, 3 Apr 2015 11:34:23 +0100
Message-ID: <20150403103423.GW4701@suse.de>
In-Reply-To: <CAOi1vP_VYDQ1+CjzGG4kZx5C2K97Vhv63w2s9edLE4rnZ2pTMA@mail.gmail.com>

On Thu, Apr 02, 2015 at 11:35:35AM +0300, Ilya Dryomov wrote:
> On Thu, Apr 2, 2015 at 8:41 AM, Mel Gorman <mgorman@suse.de> wrote:
> > On Thu, Apr 02, 2015 at 02:40:19AM +0300, Ilya Dryomov wrote:
> >> On Thu, Apr 2, 2015 at 2:03 AM, Mel Gorman <mgorman@suse.de> wrote:
> >> > On Wed, Apr 01, 2015 at 08:19:20PM +0300, Ilya Dryomov wrote:
> >> >> Following nbd and iscsi, commit 89baaa570ab0 ("libceph: use memalloc
> >> >> flags for net IO") set SOCK_MEMALLOC and PF_MEMALLOC flags for rbd and
> >> >> cephfs. However, it turned out not to play nicely with the loopback scenario,
> >> >> leading to lockups with a full socket send-q and empty recv-q.
> >> >>
> >> >> While we always advised against colocating kernel client and ceph
> >> >> servers on the same box, a few people are doing it and it's also useful
> >> >> for light development testing, so rather than reverting make sure to
> >> >> not set those flags in the loopback case.
> >> >>
> >> >
> >> > This does not clarify why the non-loopback case needs access to pfmemalloc
> >> > reserves. Granted, I've spent zero time on this, but it's really unclear
> >> > what problem this was originally meant to solve and why dirty page limiting
> >> > was insufficient. Swap over NFS was always a very special case minimally
> >> > because it's immune to dirty page throttling.
> >>
> >> I don't think it was meant to solve any particular problem,
> >
> > Then please go back and look at why dirty page limiting is insufficient
> > for ceph.
> >
> >> certainly not one we hit and fixed with 89baaa570ab0. Mike is out this
> >> week, but I'm pretty sure he said he copied this for iscsi from nbd
> >> because you nudged him to (and you yourself did this for nbd as part of
> >> swap-over-NFS series).
> >
> > In http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/23708 I
> > stated that if ceph insisted on using nbd as justification for ceph
> > using __GFP_MEMALLOC that it was preferred that nbd be broken instead. In
> > commit 7f338fe4540b1d0600b02314c7d885fd358e9eca, the use case in mind was
> > the swap-over-nbd case and I regret I didn't have userspace explicitly
> > tell the kernel that NBD was being used as a swap device.
>
> OK, it all starts to make sense now. So ideally nbd would only use
> __GFP_MEMALLOC if nbd-client was invoked with -swap - you just didn't
> implement that.

Yes.

> I think ceph is fine with dirty page limiting in general,

Then I suggest removing ceph's usage of __GFP_MEMALLOC until there is a
genuine problem that dirty page limiting is unable to handle. Dirty page
limiting might stall in some cases, but the worst case for __GFP_MEMALLOC abuse
is a livelocked machine.

> so it's only
> if we wanted to support swap-over-rbd (cephfs is a bit of a weak link
> currently, so I'm not going there) that we would need to enable
> SOCK_MEMALLOC/PF_MEMALLOC and only for that ceph_client instance.

Yes.

> Sounds like that will require a "swap" libceph option, which will also
> implicitly enable "noshare" to make sure __GFP_MEMALLOC ceph_client is
> not shared with anything else - luckily we don't have a userspace
> process a la nbd-client we need to worry about.
>

I'm not familiar enough with the ins and outs of rbd to know what sort
of implementation hazards might be encountered.
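
For illustration only, the option semantics proposed in the quoted text
could be modelled in plain userspace C roughly as follows. This is a
sketch under stated assumptions, not libceph code: the option bits, struct
and function names (OPT_SWAP, OPT_NOSHARE, client_opts, parse_opts,
wants_memalloc, may_share) are all hypothetical. It only shows that
"swap" would imply "noshare", and that the memalloc flags would be
requested solely for a client created with "swap".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option bits; these are NOT the real libceph option flags. */
enum {
    OPT_NOSHARE = 1u << 0,
    OPT_SWAP    = 1u << 1,
};

/* Hypothetical per-client options, standing in for one ceph_client. */
struct client_opts {
    unsigned int flags;
};

/*
 * Parse a comma-separated option string such as "swap" or "noshare".
 * Empty tokens are ignored. Returns 0 on success, -1 on an unknown token.
 */
static int parse_opts(const char *s, struct client_opts *o)
{
    o->flags = 0;
    while (*s) {
        size_t len = strcspn(s, ",");

        if (len == 7 && strncmp(s, "noshare", 7) == 0)
            o->flags |= OPT_NOSHARE;
        else if (len == 4 && strncmp(s, "swap", 4) == 0)
            o->flags |= OPT_SWAP | OPT_NOSHARE; /* swap implies noshare */
        else if (len != 0)
            return -1;

        s += len;
        if (*s == ',')
            s++;
    }
    return 0;
}

/* Only a client explicitly created for swap asks for memalloc treatment. */
static bool wants_memalloc(const struct client_opts *o)
{
    return (o->flags & OPT_SWAP) != 0;
}

/* A noshare client (and therefore any swap client) is never shared. */
static bool may_share(const struct client_opts *o)
{
    return (o->flags & OPT_NOSHARE) == 0;
}
```

In this model a client parsed from "swap" reports wants_memalloc() as
true and may_share() as false, while a client with no options keeps
today's behaviour minus the memalloc flags. Where the real kernel code
would set SOCK_MEMALLOC and PF_MEMALLOC is deliberately left out.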
--
Mel Gorman
SUSE Labs