linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	darrick.wong@oracle.com, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org
Subject: Re: [PATCH RFC] xfs, memcg: Call xfs_fs_nr_cached_objects() only in case of global reclaim
Date: Thu, 22 Mar 2018 16:01:52 +1100	[thread overview]
Message-ID: <20180322050152.GA18129@dastard> (raw)
In-Reply-To: <5eb88165-c108-5989-3cb5-1f260cec5e53@virtuozzo.com>

On Wed, Mar 21, 2018 at 07:15:14PM +0300, Kirill Tkhai wrote:
> On 20.03.2018 17:34, Dave Chinner wrote:
> > On Tue, Mar 20, 2018 at 04:15:16PM +0300, Kirill Tkhai wrote:
> >> On 20.03.2018 03:18, Dave Chinner wrote:
> >>> On Mon, Mar 19, 2018 at 02:06:01PM +0300, Kirill Tkhai wrote:
> >>>> On 17.03.2018 00:39, Dave Chinner wrote:
> >>> Actually, it is fair, because:
> >>>
> >>>         /* proportion the scan between the caches */
> >>>         dentries = mult_frac(sc->nr_to_scan, dentries, total_objects);
> >>>         inodes = mult_frac(sc->nr_to_scan, inodes, total_objects);
> >>>         fs_objects = mult_frac(sc->nr_to_scan, fs_objects, total_objects);
> >>>
> >>> This means that if the number of objects in the memcg aware VFS
> >>> caches are tiny compared to the global XFS inode cache, then they
> >>> only get a tiny amount of the total scanning done by the shrinker.
> >>
> >> This is just wrong. If you have two memcgs, the first is charged
> >> by xfs dentries and inodes, while the second is not charged by xfs
> >> dentries and inodes, the second will response for xfs shrink as well
> >> as the first.
> > 
> > That makes no sense to me. Shrinkers are per-sb, so they only
> > shrink per-filesystem dentries and inodes. If your memcgs are on
> > different filesystems, then they'll behave according to the size of
> > that filesystem's caches, not the cache of some other filesystem.
> 
> But this break the isolation purpose of memcg. When someone places
> different services in different pairs of memcg and cpu cgroup, he
> wants do not do foreign work.

Too bad. Filesystems *break memcg isolation all the time*.

Think about it. The filesystem journal is owned by the filesystem,
not the memcg that is writing a transaction to it.  If one memcg
does lots of metadata modifications, it can use all the journal
space and starve iitself and all other memcgs of journal space while
the filesystem does writeback to clear it out.

Another example: if one memcg is manipulating free space in a
particular allocation group, other memcgs that need to manipulate
space in those allocation groups will block and be prevented from
operating until the other memcg finishes it's work.

Another example: inode IO in XFS are done in clusters of 32. read or
write any inode in the cluster, and we read/write all the other
inodes in that cluster, too. Hence if we need to read an inode and
the cluster is busy because the filesystem journal needed flushing
or another inode in the cluster is being unlinked (e.g. by a process
in a different memcg), then the read in the first memcg will block
until whatever is being done on the buffer is complete. IOWs, even
at the physical on-disk inode level we violate memcg resource 
isolation principles.

I can go on, but I think you get the picture: Filesystems are full
of global structures whose access arbitration mechanisms
fundamentally break the concept of operational memcg isolation.

With that in mind, we really don't care that the shrinker does
global work and violate memcg isolation principles because we
violately them everywhere. IOWs, if we try to enforce memcg
isolation in the shrinker, then we can't guarantee forwards progress
in memory reclaim because we end up with multi-dimensional memcg
dependencies at the physical layers of the filesystem structure....

I don't expect people who know nothing about XFS or filesystems to
understand the complexity of the interactions we are dealing with
here. Everything is a compromise when it comes to the memory reclaim
code as tehre are so many corner cases we have to handle. In this
situation, perfect is the enemy of good....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2018-03-22  5:02 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-15 15:01 [PATCH RFC] xfs, memcg: Call xfs_fs_nr_cached_objects() only in case of global reclaim Kirill Tkhai
2018-03-15 15:53 ` Darrick J. Wong
2018-03-15 16:06   ` Kirill Tkhai
2018-03-15 17:49 ` Michal Hocko
2018-03-15 19:28   ` Kirill Tkhai
2018-03-15 19:32     ` Michal Hocko
2018-03-15 19:42       ` Kirill Tkhai
2018-03-15 23:03     ` Dave Chinner
2018-03-16  8:55       ` Kirill Tkhai
2018-03-16 21:39         ` Dave Chinner
2018-03-19 11:06           ` Kirill Tkhai
2018-03-19 11:25             ` Kirill Tkhai
2018-03-20  0:18             ` Dave Chinner
2018-03-20 13:15               ` Kirill Tkhai
2018-03-20 14:34                 ` Dave Chinner
2018-03-21 16:15                   ` Kirill Tkhai
2018-03-22  5:01                     ` Dave Chinner [this message]
2018-03-22 16:52                       ` Kirill Tkhai
2018-03-22 23:46                         ` Dave Chinner
2018-03-23 12:39                           ` Kirill Tkhai
2018-03-26  2:37                             ` Dave Chinner
2018-03-26 11:16                               ` Kirill Tkhai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180322050152.GA18129@dastard \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=darrick.wong@oracle.com \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mhocko@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).