From: Johannes Weiner <hannes@cmpxchg.org>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>,
Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] writeback: Record if the congestion was unnecessary
Date: Mon, 30 Aug 2010 15:19:29 +0200 [thread overview]
Message-ID: <20100830131929.GA28652@cmpxchg.org> (raw)
In-Reply-To: <20100827092415.GB19556@csn.ul.ie>
[-- Attachment #1: Type: text/plain, Size: 5340 bytes --]
On Fri, Aug 27, 2010 at 10:24:16AM +0100, Mel Gorman wrote:
> On Fri, Aug 27, 2010 at 10:16:48AM +0200, Johannes Weiner wrote:
> > On Thu, Aug 26, 2010 at 09:31:30PM +0100, Mel Gorman wrote:
> > > On Thu, Aug 26, 2010 at 08:29:04PM +0200, Johannes Weiner wrote:
> > > > On Thu, Aug 26, 2010 at 04:14:15PM +0100, Mel Gorman wrote:
> > > > > If congestion_wait() is called when there is no congestion, the caller
> > > > > will wait for the full timeout. This can cause unreasonable and
> > > > > unnecessary stalls. There are a number of potential modifications that
> > > > > could be made to wake sleepers but this patch measures how serious the
> > > > > problem is. It keeps count of how many congested BDIs there are. If
> > > > > congestion_wait() is called with no BDIs congested, the tracepoint will
> > > > > record that the wait was unnecessary.
> > > >
> > > > I am not convinced that unnecessary is the right word. On a workload
> > > > without any IO (i.e. no congestion_wait() necessary, ever), I noticed
> > > > the VM regressing both in time and in reclaiming the right pages when
> > > > simply removing congestion_wait() from the direct reclaim paths (the
> > > > one in __alloc_pages_slowpath and the other one in
> > > > do_try_to_free_pages).
> > > >
> > > > So just being stupid and waiting for the timeout in direct reclaim
> > > > while kswapd can make progress seemed to do a better job for that
> > > > load.
> > > >
> > > > I can not exactly pinpoint the reason for that behaviour, it would be
> > > > nice if somebody had an idea.
> > > >
> > >
> > > There is a possibility that the behaviour in that case was due to flusher
> > > threads doing the writes rather than direct reclaim queueing pages for IO
> > > in an inefficient manner. So the stall is stupid but happens to work out
> > > well because flusher threads get the chance to do work.
> >
> > The workload was accessing a large sparse-file through mmap, so there
> > wasn't much IO in the first place.
> >
>
> Then waiting on congestion was the totally wrong thing to do. We were
> effectively calling sleep(HZ/10) and magically this was helping in some
> undefined manner. Do you know *which* called of congestion_wait() was
> the most important to you?
Removing congestion_wait() in do_try_to_free_pages() definitely
worsens reclaim behaviour for this workload:
1. wallclock time of the testrun increases by 11%
2. the scanners do a worse job and go for the wrong zone:
-pgalloc_dma 79597
-pgalloc_dma32 134465902
+pgalloc_dma 297089
+pgalloc_dma32 134247237
-pgsteal_dma 77501
-pgsteal_dma32 133939446
+pgsteal_dma 294998
+pgsteal_dma32 133722312
-pgscan_kswapd_dma 145897
-pgscan_kswapd_dma32 266141381
+pgscan_kswapd_dma 287981
+pgscan_kswapd_dma32 186647637
-pgscan_direct_dma 9666
-pgscan_direct_dma32 1758655
+pgscan_direct_dma 302495
+pgscan_direct_dma32 80947179
-pageoutrun 1768531
-allocstall 614
+pageoutrun 1927451
+allocstall 8566
I attached the full vmstat contents below. Also the test program,
which I ran in this case as: ./mapped-file-stream 1 $((512 << 30))
> > > > So personally I think it's a good idea to get an insight on the use of
> > > > congestion_wait() [patch 1] but I don't agree with changing its
> > > > behaviour just yet, or judging its usefulness solely on whether it
> > > > correctly waits for bdi congestion.
> > > >
> > >
> > > Unfortunately, I strongly suspect that some of the desktop stalls seen during
> > > IO (one of which involved no writes) were due to calling congestion_wait
> > > and waiting the full timeout where no writes are going on.
> >
> > Oh, I am in full agreement here! Removing those congestion_wait() as
> > described above showed a reduction in peak latency. The dilemma is
> > only that it increased the overall walltime of the load.
> >
>
> Do you know why because leaving in random sleeps() hardly seems to be
> the right approach?
I am still trying to find out what's going wrong.
> > And the scanning behaviour deteriorated, as in having increased
> > scanning pressure on other zones than the unpatched kernel did.
> >
>
> Probably because it was scanning more but not finding what it needed.
> There is a condition other than congestion it is having trouble with. In
> some respects, I think if we change congestion_wait() as I propose,
> we may see a case where CPU usage is higher because it's now
> encountering the unspecified reclaim problem we have.
Exactly.
> > So I think very much that we need a fix. congestion_wait() causes
> > stalls and relying on random sleeps for the current reclaim behaviour
> > can not be the solution, at all.
> >
> > I just don't think we can remove it based on the argument that it
> > doesn't do what it is supposed to do, when it does other things right
> > that it is not supposed to do ;-)
> >
>
> We are not removing it, we are just stopping it going to sleep for
> stupid reasons. If we find that wall time is increasing as a result, we
> have a path to figuring out what the real underlying problem is instead
> of sweeping it under the rug.
Well, for that testcase it is in effect the same as a removal as
there's never congestion.
But again: I agree with your changes per-se, I just don't think they
should get merged as long as they knowingly catalyze a problem that
has yet to be identified.
[-- Attachment #2: mapped-file-stream.c --]
[-- Type: text/plain, Size: 1885 bytes --]
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <limits.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
static int start_process(unsigned long nr_bytes)
{
char filename[] = "/tmp/clog-XXXXXX";
unsigned long i;
char *map;
int fd;
fd = mkstemp(filename);
unlink(filename);
if (fd == -1) {
perror("mkstemp()");
return -1;
}
if (ftruncate(fd, nr_bytes)) {
perror("ftruncate()");
return -1;
}
map = mmap(NULL, nr_bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (map == MAP_FAILED) {
perror("mmap()");
return -1;
}
if (madvise(map, nr_bytes, MADV_RANDOM)) {
perror("madvise()");
return -1;
}
kill(getpid(), SIGSTOP);
for (i = 0; i < nr_bytes; i += 4096)
((volatile char *)map)[i];
close(fd);
return 0;
}
static int do_test(unsigned long nr_procs, unsigned long nr_bytes)
{
pid_t procs[nr_procs];
unsigned long i;
int dummy;
for (i = 0; i < nr_procs; i++) {
switch ((procs[i] = fork())) {
case -1:
kill(0, SIGKILL);
perror("fork()");
return -1;
case 0:
return start_process(nr_bytes);
default:
waitpid(procs[i], &dummy, WUNTRACED);
break;
}
}
kill(0, SIGCONT);
for (i = 0; i < nr_procs; i++)
waitpid(procs[i], &dummy, 0);
return 0;
}
static int xstrtoul(const char *str, unsigned long *valuep)
{
unsigned long value;
char *endp;
value = strtoul(str, &endp, 0);
if (*endp || (value == ULONG_MAX && errno == ERANGE))
return -1;
*valuep = value;
return 0;
}
int main(int ac, char **av)
{
unsigned long nr_procs, nr_bytes;
if (ac != 3)
goto usage;
if (xstrtoul(av[1], &nr_procs))
goto usage;
if (xstrtoul(av[2], &nr_bytes))
goto usage;
setbuf(stdout, NULL);
setbuf(stderr, NULL);
return !!do_test(nr_procs, nr_bytes);
usage:
fprintf(stderr, "usage: %s nr_procs nr_bytes\n", av[0]);
return 1;
}
[-- Attachment #3: vmstat.a.2 --]
[-- Type: application/x-troff-man, Size: 1794 bytes --]
[-- Attachment #4: vmstat.b.2 --]
[-- Type: application/x-troff-man, Size: 1803 bytes --]
next prev parent reply other threads:[~2010-08-30 13:19 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-08-26 15:14 [RFC PATCH 0/3] Do not wait the full timeout on congestion_wait when there is no congestion Mel Gorman
2010-08-26 15:14 ` Mel Gorman
2010-08-26 15:14 ` [PATCH 1/3] writeback: Account for time spent congestion_waited Mel Gorman
2010-08-26 15:14 ` Mel Gorman
2010-08-26 17:23 ` Minchan Kim
2010-08-26 17:23 ` Minchan Kim
2010-08-26 18:10 ` Johannes Weiner
2010-08-26 18:10 ` Johannes Weiner
2010-08-26 15:14 ` [PATCH 2/3] writeback: Record if the congestion was unnecessary Mel Gorman
2010-08-26 15:14 ` Mel Gorman
2010-08-26 17:35 ` Minchan Kim
2010-08-26 17:35 ` Minchan Kim
2010-08-26 17:41 ` Mel Gorman
2010-08-26 17:41 ` Mel Gorman
2010-08-26 18:29 ` Johannes Weiner
2010-08-26 18:29 ` Johannes Weiner
2010-08-26 20:31 ` Mel Gorman
2010-08-26 20:31 ` Mel Gorman
2010-08-27 2:12 ` Shaohua Li
2010-08-27 2:12 ` Shaohua Li
2010-08-27 9:20 ` Mel Gorman
2010-08-27 9:20 ` Mel Gorman
2010-08-27 8:16 ` Johannes Weiner
2010-08-27 8:16 ` Johannes Weiner
2010-08-27 9:24 ` Mel Gorman
2010-08-27 9:24 ` Mel Gorman
2010-08-30 13:19 ` Johannes Weiner [this message]
2010-08-31 15:02 ` Mel Gorman
2010-08-31 15:02 ` Mel Gorman
2010-09-02 15:49 ` Johannes Weiner
2010-09-02 15:49 ` Johannes Weiner
2010-09-02 18:28 ` Mel Gorman
2010-09-02 18:28 ` Mel Gorman
2010-08-29 16:03 ` Minchan Kim
2010-08-29 16:03 ` Minchan Kim
2010-08-26 15:14 ` [PATCH 3/3] writeback: Do not congestion sleep when there are no congested BDIs Mel Gorman
2010-08-26 15:14 ` Mel Gorman
2010-08-26 17:38 ` Minchan Kim
2010-08-26 17:38 ` Minchan Kim
2010-08-26 17:42 ` Mel Gorman
2010-08-26 17:42 ` Mel Gorman
2010-08-26 18:17 ` Johannes Weiner
2010-08-26 18:17 ` Johannes Weiner
2010-08-26 20:23 ` Mel Gorman
2010-08-26 20:23 ` Mel Gorman
2010-08-27 1:11 ` Wu Fengguang
2010-08-27 1:11 ` Wu Fengguang
2010-08-27 9:34 ` Mel Gorman
2010-08-27 9:34 ` Mel Gorman
2010-08-27 1:42 ` Wu Fengguang
2010-08-27 1:42 ` Wu Fengguang
2010-08-27 9:37 ` Mel Gorman
2010-08-27 9:37 ` Mel Gorman
2010-08-27 5:13 ` Dave Chinner
2010-08-27 5:13 ` Dave Chinner
2010-08-27 9:33 ` Mel Gorman
2010-08-27 9:33 ` Mel Gorman
2010-08-26 17:20 ` [RFC PATCH 0/3] Do not wait the full timeout on congestion_wait when there is no congestion Minchan Kim
2010-08-26 17:20 ` Minchan Kim
2010-08-26 17:31 ` Mel Gorman
2010-08-26 17:31 ` Mel Gorman
2010-08-26 17:50 ` Minchan Kim
2010-08-26 17:50 ` Minchan Kim
2010-08-27 1:21 ` Wu Fengguang
2010-08-27 1:21 ` Wu Fengguang
2010-08-27 1:41 ` Minchan Kim
2010-08-27 1:41 ` Minchan Kim
2010-08-27 1:50 ` Wu Fengguang
2010-08-27 1:50 ` Wu Fengguang
2010-08-27 2:02 ` Minchan Kim
2010-08-27 2:02 ` Minchan Kim
2010-08-27 4:34 ` Wu Fengguang
2010-08-27 4:34 ` Wu Fengguang
2010-08-27 9:38 ` Mel Gorman
2010-08-27 9:38 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100830131929.GA28652@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=ehrhardt@linux.vnet.ibm.com \
--cc=fengguang.wu@intel.com \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.