All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: "mel@csn.ul.ie" <mel@csn.ul.ie>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"npiggin@suse.de" <npiggin@suse.de>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: what is the point of nr_pages information for the flusher thread?
Date: Sat, 10 Jul 2010 22:58:06 +0800	[thread overview]
Message-ID: <20100710145806.GA6628@localhost> (raw)
In-Reply-To: <20100707231611.GA24281@infradead.org>

Hi Christoph,

Here are some of my findings.

On Thu, Jul 08, 2010 at 07:16:11AM +0800, Christoph Hellwig wrote:
> Currently there's three possible values we pass into the flusher thread
> for the nr_pages arguments:

The current wb_writeback_work.nr_pages parameter semantic is actually quite
different from the _min_pages argument in 2.6.30. Current semantic is
"max pages to write", the old one is "min pages to write (until all written)".

current wb_writeback():

        for (;;) {
                /*
                 * Stop writeback when nr_pages has been consumed
                 */
                if (work->nr_pages <= 0)
                        break;

2.6.30 background_writeout(_min_pages):

                if (global_page_state(NR_FILE_DIRTY) +
                        global_page_state(NR_UNSTABLE_NFS) < background_thresh
                                && min_pages <= 0)
                        break;

>  - in sync_inodes_sb and bdi_start_background_writeback:
> 
> 	LONG_MAX
> 
>  - in writeback_inodes_sb and wb_check_old_data_flush:
> 
> 	global_page_state(NR_FILE_DIRTY) +
> 	global_page_state(NR_UNSTABLE_NFS) +
> 	(inodes_stat.nr_inodes - inodes_stat.nr_unused)
> 
>  - in wakeup_flusher_threads and laptop_mode_timer_fn:
> 
> 	global_page_state(NR_FILE_DIRTY) +
> 	global_page_state(NR_UNSTABLE_NFS)
> 
> The LONG_MAX cases are triviall explained, as we ignore the nr_to_write
> value for data integrity writepage in the lowlevel writeback code, and
> the for_background in bdi_start_background_writeback has it's own check
> for the background threshold.  So far so good, and now it gets
> interesting.

Yeah.
 
> Why does writeback_inodes_sb add the number of used inodes into a value
> that is in units of pages?  And why don't the other callers do this?

The 2.6.30 sync_inodes_sb() has this comment:

 * We add in the number of potentially dirty inodes, because each inode write
 * can dirty pagecache in the underlying blockdev.
 
The periodic writeback also referenced it:

        nr_pages = global_page_state(NR_FILE_DIRTY) +
                        global_page_state(NR_UNSTABLE_NFS) +
                        (inodes_stat.nr_inodes - inodes_stat.nr_unused);

        if (nr_pages) {
                struct wb_writeback_work work = {
                        .nr_pages       = nr_pages,

Here it looks more sane to do

        if (wb_has_dirty_io(wb)) {
                struct wb_writeback_work work = {
                        .nr_pages       = LONG_MAX,


> But seriously, how is the _global_ number of dirty and unstable pages
> a good indicator for the amount of writeback per-bdi or superblock
> anyway?

Good point.
 
> Somehow I'd feel much better about doing this calculation all the way
> down in wb_writeback instead of the callers so we'll at least have
> one documented place for these insanities.

I guess the current "max pages to write" semantic serves as a poor
man's live-lock prevention guard. sync() want that semantic (in this
sense the old "min pages to write" has never worked as expected). When
proper live-lock preventions are ready, this guard will no longer be
necessary.

However the current semantic is not suitable for other users. "To write at most
nr_pages until hitting background dirty threshold" is basically a no-op,
because the callers may as well let the normal background writeback do the job
for them.

For example, laptop_mode_timer_fn() actually want to write the whole world, so
it wants nr_pages=LONG_MAX with the old "min pages to write" semantic.

There are other cases that try to write some pages
- free_more_memory()
- do_try_to_free_pages() 
- ubifs shrink_liability()
- ext4 ext4_nonda_switch()

They don't really know or care about the exact nr_pages to write.
The latter two functions even sync everything for simplicity..

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

      parent reply	other threads:[~2010-07-10 14:58 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-07 23:16 what is the point of nr_pages information for the flusher thread? Christoph Hellwig
2010-07-07 23:16 ` Christoph Hellwig
2010-07-07 23:37 ` Andrew Morton
2010-07-07 23:37   ` Andrew Morton
2010-07-07 23:43   ` Christoph Hellwig
2010-07-07 23:43     ` Christoph Hellwig
2010-07-07 23:55     ` Andrew Morton
2010-07-10 14:58 ` Wu Fengguang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100710145806.GA6628@localhost \
    --to=fengguang.wu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.