From: Jana Saout <jana@saout.de>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: xen-devel@lists.xensource.com
Subject: Re: Self-ballooning question / cache issue
Date: Wed, 02 May 2012 12:13:52 +0200
Message-ID: <1335953632.3599.16.camel@localhost>
In-Reply-To: <d9a09803-cfe6-4c69-b069-adb51e2d2848@default>

Hi Dan,

> > I have been testing autoballooning on a production Xen system today
> > (with cleancache + frontswap on Xen-provided tmem).  For most of the
> > idle or CPU-centric VMs it seems to work just fine.
> > 
> > However, on one of the web-serving VMs, there is also a cron job running
> > every few minutes which runs over a rather large directory (plus, this
> > directory is on OCFS2 so this is a rather time-consuming process).  Now,
> > if the dcache/inode cache is large enough (which it was before, since
> > the VM got allocated 4 GB and is only using 1-2 most of the time), this
> > was not a problem.
> > 
> > Now, with self-ballooning, the memory gets reduced to somewhat between 1
> > and 2 GB and after a few minutes the load is going through the ceiling.
> > Jobs reading through said directories are piling up (stuck in D state,
> > waiting for the FS).  And most of the time kswapd is spinning at 100%.
> > If I deactivate self-ballooning and assign the VM 3 GB, everything goes
> > back to normal after a few minutes. (and, "ls -l" on said directory is
> > served from the cache again).
> > 
> > Now, I am aware that said problem is a self-made one.  The directory was
> > not actually supposed to contain that many files and the next job not
> > waiting for the previous job to terminate is cause for trouble - but
> > still, I would consider this a possible regression since it seems
> > self-ballooning is constantly thrashing the VM's caches.  Not all caches
> > can be saved in cleancache.
> > 
> > What about an additional tunable: a user-specified amount of pages that
> > is added on top of the computed target number of pages?  This way, one
> > could manually reserve a bit more room for other types of caches. (in
> > fact, I might try this myself, since it shouldn't be too hard to do so)
> > 
> > Any opinions on this?
> 
> Thanks for doing this analysis.  While your workload is a bit
> unusual, I agree that you have exposed a problem that will need
> to be resolved.  It was observed three years ago that the next
> "frontend" for tmem could handle a cleancache-like mechanism
> for the dcache.  Until now, I had thought that this was purely
> optional and would yield only a small performance improvement.
> But with your workload, I think the combination of the facts that
> selfballooning is forcing out dcache entries and they aren't
> being saved in tmem is resulting in the problem you are seeing.

Yes.  In fact, I've been rolling out selfballooning across a development
system and most VMs were just fine with the defaults.  The overall memory
savings from going from static to dynamic memory allocation are quite
significant - and the VMs don't have to resort to actual paging to disk
when there is a sudden increase in memory usage.  Quite nice.

Just for information: the filesystem this machine is using is
OCFS2 (shared across 5 VMs) and the directory contains 45k files
(*cough* - I'm aware that's not optimal; I'm currently talking to the
developer of that application about not scanning the entire list of
files every minute).  Walking that directory takes a few minutes,
especially stat'ing every file.

I have been observing that kswapd seems rather busy at times on some
VMs, even when there is no actual swapping taking place (or could it
be frontswap, or just page reclaim?).  This can be mitigated by
increasing the memory reserve a bit using my trivial test patch (see below).
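
(A rough way to check whether this is plain page reclaim rather than
swapping - just a generic suggestion, I haven't correlated it in detail
here - is to watch the kswapd scan counters while vmstat shows no swap
I/O:

  grep pgscan /proc/vmstat
)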

> I think the best solution for this will be a "cleandcache"
> patch in the Linux guest... but given how long it has taken
> to get cleancache and frontswap into the kernel (and the fact
> that a working cleandcache patch doesn't even exist yet), I
> wouldn't hold my breath ;-)  I will put it on the "to do"
> list though.

That sounds nice!

> Your idea of the tunable is interesting (and patches are always
> welcome!) but I am skeptical that it will solve the problem
> since I would guess the Linux kernel is shrinking dcache
> proportional to the size of the page cache.  So adding more
> RAM with your "user-specified amount of pages that is
> added on top of the computed target number of pages",
> the RAM will still be shared across all caches and only
> some small portion of the added RAM will likely be used
> for dcache.

That's true.  In fact, I have to add about 1 GB of memory in order to
keep the relevant dcache / inode cache entries in the cache.  Even when
I do that, the largest portion of memory is still eaten up by the
regular page cache.  So this is more of a workaround than a solution,
but for now it works.
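
For a rough idea of how much room those caches actually need, the
dentry and inode slab usage can be checked with e.g.

  grep -E 'dentry|inode_cache' /proc/slabinfo

(or slabtop).  The vm.vfs_cache_pressure sysctl also biases reclaim
between the page cache and the dentry/inode caches, so lowering it
might be another knob worth experimenting with here.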

I've attached the simple patch I've whipped up below.
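
Once the patch is applied, the reserve can be adjusted at runtime via
sysfs; the selfballoon attributes are registered on the Xen balloon
device, so something along these lines should work (the exact path may
differ depending on how the balloon device shows up):

  echo 256 > /sys/devices/system/xen_memory/xen_memory0/selfballoon_reserved_mb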

> However, if you have a chance to try it, I would be interested
> in your findings.  Note that you already can set a
> permanent floor for selfballooning ("min_usable_mb") or,
> of course, just turn off selfballooning altogether.

Sure, that's always a possibility.  However, the VM had been given an
overly large amount of memory before, precisely to avoid this problem.
Now it runs with less memory (still a bit more than strictly required),
and when a load spike comes it can quickly balloon up, which is exactly
what I was looking for.

	Jana

----
Author: Jana Saout <jana@saout.de>
Date:   Sun Apr 29 22:09:29 2012 +0200

    Add selfballooning memory reservation tunable.

diff --git a/drivers/xen/xen-selfballoon.c b/drivers/xen/xen-selfballoon.c
index 146c948..7d041cb 100644
--- a/drivers/xen/xen-selfballoon.c
+++ b/drivers/xen/xen-selfballoon.c
@@ -105,6 +105,12 @@ static unsigned int selfballoon_interval __read_mostly = 5;
  */
 static unsigned int selfballoon_min_usable_mb;
 
+/*
+ * Amount of RAM in MB to add to the target number of pages.
+ * Can be used to reserve some more room for caches and the like.
+ */
+static unsigned int selfballoon_reserved_mb;
+
 static void selfballoon_process(struct work_struct *work);
 static DECLARE_DELAYED_WORK(selfballoon_worker, selfballoon_process);
 
@@ -217,7 +223,8 @@ static void selfballoon_process(struct work_struct *work)
 		cur_pages = totalram_pages;
 		tgt_pages = cur_pages; /* default is no change */
 		goal_pages = percpu_counter_read_positive(&vm_committed_as) +
-				totalreserve_pages;
+				totalreserve_pages +
+				MB2PAGES(selfballoon_reserved_mb);
 #ifdef CONFIG_FRONTSWAP
 		/* allow space for frontswap pages to be repatriated */
 		if (frontswap_selfshrinking && frontswap_enabled)
@@ -397,6 +404,30 @@ static DEVICE_ATTR(selfballoon_min_usable_mb, S_IRUGO | S_IWUSR,
 		   show_selfballoon_min_usable_mb,
 		   store_selfballoon_min_usable_mb);
 
+SELFBALLOON_SHOW(selfballoon_reserved_mb, "%d\n",
+				selfballoon_reserved_mb);
+
+static ssize_t store_selfballoon_reserved_mb(struct device *dev,
+					     struct device_attribute *attr,
+					     const char *buf,
+					     size_t count)
+{
+	unsigned long val;
+	int err;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+	err = strict_strtoul(buf, 10, &val);
+	if (err)
+		return -EINVAL;
+	selfballoon_reserved_mb = val;
+	return count;
+}
+
+static DEVICE_ATTR(selfballoon_reserved_mb, S_IRUGO | S_IWUSR,
+		   show_selfballoon_reserved_mb,
+		   store_selfballoon_reserved_mb);
+
 
 #ifdef CONFIG_FRONTSWAP
 SELFBALLOON_SHOW(frontswap_selfshrinking, "%d\n", frontswap_selfshrinking);
@@ -480,6 +511,7 @@ static struct attribute *selfballoon_attrs[] = {
 	&dev_attr_selfballoon_downhysteresis.attr,
 	&dev_attr_selfballoon_uphysteresis.attr,
 	&dev_attr_selfballoon_min_usable_mb.attr,
+	&dev_attr_selfballoon_reserved_mb.attr,
 #ifdef CONFIG_FRONTSWAP
 	&dev_attr_frontswap_selfshrinking.attr,
 	&dev_attr_frontswap_hysteresis.attr,
