From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mel@linux.vnet.ibm.com>,
Dave Chinner <david@fromorbit.com>,
Rik van Riel <riel@redhat.com>, Mel Gorman <mel@csn.ul.ie>,
Itaru Kitayama <kitayama@cl.bb4u.ne.jp>,
Minchan Kim <minchan.kim@gmail.com>,
Linux Memory Management List <linux-mm@kvack.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/6] writeback: sync expired inodes first in background writeback
Date: Tue, 26 Apr 2011 21:51:30 +0800 [thread overview]
Message-ID: <20110426135130.GA5719@localhost> (raw)
In-Reply-To: <20110426121751.GB5114@quack.suse.cz>
On Tue, Apr 26, 2011 at 08:17:51PM +0800, Jan Kara wrote:
> On Sun 24-04-11 11:15:31, Wu Fengguang wrote:
> > > One of the many requirements for writeback is that if userspace is
> > > continually dirtying pages in a particular file, that shouldn't cause
> > > the kupdate function to concentrate on that file's newly-dirtied pages,
> > > neglecting pages from other files which were less-recently dirtied.
> > > (and dirty nodes, etc).
> >
> > Sadly I do find the old pages that the flusher never get a chance to
> > catch and write them out.
> What kind of load do you use?
Sorry I was just thinking about it and then got a _theoretic_ case.
> > In the below case, if the task dirties pages fast enough at the end of
> > file, writeback_index will never get a chance to wrap back. There may
> > be various variations of this case.
> >
> > file head
> > [ *** ==>***************]==>
> > old pages writeback_index fresh dirties
> >
> > Ironically the current kernel relies on pageout() to catch these
> > old pages, which is not only inefficient, but also not reliable.
> > If a full LRU walk takes an hour, the old pages may stay dirtied
> > for an hour.
> Well, the kupdate behavior has always been just a best-effort thing. We
> always tried to handle well common cases but didn't try to solve all of
> them. Unless we want to track dirty-age of every page (which we don't
> want because it's too expensive), there is really no way to make syncing
> of old pages 100% working for all the cases unless we do data-integrity
> type of writeback for the whole inode - but that could create new problems
> with stalling other files for too long I suspect.
Yeah, it's a hard problem in general. The flusher works naturally in
the coarse way..
> > We may have to do (conditional) tagged ->writepages to safeguard users
> > from losing data he'd expect to be written hours ago.
> Well, if the file is continuously written (and in your case it must be
> even continuosly grown) I'd be content if we handle well the common case of
> linear append (that happens for log files etc.). If we can do well for more
> cases, even better but I'd be cautious not to disrupt some other more
> common cases.
I scratched a patch (totally untested) which will guarantee any kind
of starvation inside an inode. Will this be too overweight?
Thanks,
Fengguang
---
Subject: writeback: livelock prevention inside actively dirtied files
Date: Tue Apr 26 21:35:47 CST 2011
- refresh dirtied_when on every full writeback_index cycle
(pages may be skipped on SYNC_NONE, but as long as they are retried in
next cycle..)
- do tagged sync when writeback_index not cycled for too long time
(the arbitrarily 60s may lead to more page tagging overheads in
"large dirty threshold but slow storage" system..)
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
fs/fs-writeback.c | 1 +
include/linux/fs.h | 1 +
include/linux/pagemap.h | 16 ++++++++++++++++
mm/page-writeback.c | 24 ++++++++++++++++++------
4 files changed, 36 insertions(+), 6 deletions(-)
--- linux-next.orig/fs/fs-writeback.c 2011-04-26 21:26:28.000000000 +0800
+++ linux-next/fs/fs-writeback.c 2011-04-26 21:26:39.000000000 +0800
@@ -1110,6 +1110,7 @@ void __mark_inode_dirty(struct inode *in
spin_unlock(&inode->i_lock);
spin_lock(&bdi->wb.list_lock);
inode->dirtied_when = jiffies;
+ inode->i_mapping->writeback_cycle_time = jiffies;
list_move(&inode->i_wb_list, &bdi->wb.b_dirty);
spin_unlock(&bdi->wb.list_lock);
--- linux-next.orig/include/linux/fs.h 2011-04-26 21:26:28.000000000 +0800
+++ linux-next/include/linux/fs.h 2011-04-26 21:26:39.000000000 +0800
@@ -639,6 +639,7 @@ struct address_space {
unsigned int truncate_count; /* Cover race condition with truncate */
unsigned long nrpages; /* number of total pages */
pgoff_t writeback_index;/* writeback starts here */
+ unsigned long writeback_cycle_time;
const struct address_space_operations *a_ops; /* methods */
unsigned long flags; /* error bits/gfp mask */
struct backing_dev_info *backing_dev_info; /* device readahead, etc */
--- linux-next.orig/mm/page-writeback.c 2011-04-26 21:26:28.000000000 +0800
+++ linux-next/mm/page-writeback.c 2011-04-26 21:33:47.000000000 +0800
@@ -835,6 +835,9 @@ void tag_pages_for_writeback(struct addr
cond_resched();
/* We check 'start' to handle wrapping when end == ~0UL */
} while (tagged >= WRITEBACK_TAG_BATCH && start);
+
+ mapping_set_tagged_sync(mapping);
+ mapping->writeback_cycle_time = jiffies;
}
EXPORT_SYMBOL(tag_pages_for_writeback);
@@ -872,7 +875,7 @@ int write_cache_pages(struct address_spa
pgoff_t end; /* Inclusive */
pgoff_t done_index;
int range_whole = 0;
- int tag;
+ int tag = PAGECACHE_TAG_DIRTY;
pagevec_init(&pvec, 0);
if (wbc->range_cyclic) {
@@ -884,13 +887,19 @@ int write_cache_pages(struct address_spa
if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
range_whole = 1;
}
- if (wbc->sync_mode == WB_SYNC_ALL)
- tag = PAGECACHE_TAG_TOWRITE;
- else
- tag = PAGECACHE_TAG_DIRTY;
+ if (!index)
+ mapping->writeback_cycle_time = jiffies;
- if (wbc->sync_mode == WB_SYNC_ALL)
+ if (wbc->sync_mode == WB_SYNC_ALL ||
+ (!mapping_tagged_sync(mapping) &&
+ jiffies - mapping->host->dirtied_when > 60 * HZ)) {
tag_pages_for_writeback(mapping, index, end);
+ tag = PAGECACHE_TAG_TOWRITE;
+ }
+
+ if (mapping_tagged_sync(mapping))
+ tag = PAGECACHE_TAG_TOWRITE;
+
done_index = index;
while (!done && (index <= end)) {
int i;
@@ -899,6 +908,9 @@ int write_cache_pages(struct address_spa
min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
if (nr_pages == 0) {
done_index = 0;
+ mapping->dirtied_when = mapping->writeback_cycle_time;
+ if (tag == PAGECACHE_TAG_TOWRITE)
+ mapping_clear_tagged_sync(mapping);
break;
}
--- linux-next.orig/include/linux/pagemap.h 2011-04-26 21:26:28.000000000 +0800
+++ linux-next/include/linux/pagemap.h 2011-04-26 21:46:38.000000000 +0800
@@ -24,6 +24,7 @@ enum mapping_flags {
AS_ENOSPC = __GFP_BITS_SHIFT + 1, /* ENOSPC on async write */
AS_MM_ALL_LOCKS = __GFP_BITS_SHIFT + 2, /* under mm_take_all_locks() */
AS_UNEVICTABLE = __GFP_BITS_SHIFT + 3, /* e.g., ramdisk, SHM_LOCK */
+ AS_TAGGED_SYNC = __GFP_BITS_SHIFT + 4, /* sync only tagged pages */
};
static inline void mapping_set_error(struct address_space *mapping, int error)
@@ -53,6 +54,21 @@ static inline int mapping_unevictable(st
return !!mapping;
}
+static inline void mapping_set_tagged_sync(struct address_space *mapping)
+{
+ set_bit(AS_TAGGED_SYNC, &mapping->flags);
+}
+
+static inline void mapping_clear_tagged_sync(struct address_space *mapping)
+{
+ clear_bit(AS_TAGGED_SYNC, &mapping->flags);
+}
+
+static inline int mapping_tagged_sync(struct address_space *mapping)
+{
+ return test_bit(AS_TAGGED_SYNC, &mapping->flags);
+}
+
static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
{
return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2011-04-26 13:51 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-04-20 8:03 [PATCH 0/6] writeback: moving expire targets for background/kupdate works v2 Wu Fengguang
2011-04-20 8:03 ` [PATCH 1/6] writeback: pass writeback_control down to move_expired_inodes() Wu Fengguang
2011-05-04 11:04 ` Christoph Hellwig
2011-05-04 11:13 ` Wu Fengguang
2011-04-20 8:03 ` [PATCH 2/6] writeback: introduce writeback_control.inodes_cleaned Wu Fengguang
2011-05-04 11:05 ` Christoph Hellwig
2011-05-04 11:11 ` Wu Fengguang
2011-05-04 11:16 ` Christoph Hellwig
2011-05-04 11:32 ` Wu Fengguang
2011-04-20 8:03 ` [PATCH 3/6] writeback: try more writeback as long as something was written Wu Fengguang
2011-04-20 8:03 ` [PATCH 4/6] writeback: the kupdate expire timestamp should be a moving target Wu Fengguang
2011-04-20 8:03 ` [PATCH 5/6] writeback: sync expired inodes first in background writeback Wu Fengguang
2011-04-20 23:40 ` Andrew Morton
2011-04-21 1:14 ` Wu Fengguang
2011-04-21 1:21 ` Wu Fengguang
2011-04-24 3:15 ` Wu Fengguang
2011-04-26 12:17 ` Jan Kara
2011-04-26 13:51 ` Wu Fengguang [this message]
2011-04-26 13:59 ` Wu Fengguang
2011-04-26 14:05 ` Wu Fengguang
2011-04-27 11:15 ` Wu Fengguang
2011-04-20 8:03 ` [PATCH 6/6] writeback: refill b_io iff empty Wu Fengguang
2011-05-04 7:39 ` Wu Fengguang
2011-05-05 16:37 ` Jan Kara
2011-05-05 16:47 ` Wu Fengguang
2011-05-06 5:29 ` Wu Fengguang
2011-05-06 8:42 ` [RFC][PATCH] writeback: limit number of moved inodes in queue_io() Wu Fengguang
2011-05-06 10:06 ` [RFC][PATCH v2] " Wu Fengguang
2011-05-06 23:06 ` Dave Chinner
2011-05-06 14:21 ` [PATCH 6/6] writeback: refill b_io iff empty Jan Kara
2011-05-10 4:31 ` Wu Fengguang
2011-05-10 4:53 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110426135130.GA5719@localhost \
--to=fengguang.wu@intel.com \
--cc=akpm@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=jack@suse.cz \
--cc=kitayama@cl.bb4u.ne.jp \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=mel@linux.vnet.ibm.com \
--cc=minchan.kim@gmail.com \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).