From: Jan Kara <jack@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel@kernel.org, npiggin@suse.de, david@fromorbit.com,
linux-mm@kvack.org
Subject: Re: [PATCH 2/2] mm: Implement writeback livelock avoidance using page tagging
Date: Thu, 10 Jun 2010 14:31:03 +0200 [thread overview]
Message-ID: <20100610123103.GD10827@quack.suse.cz> (raw)
In-Reply-To: <20100609164115.22723403.akpm@linux-foundation.org>
On Wed 09-06-10 16:41:15, Andrew Morton wrote:
> On Fri, 4 Jun 2010 20:40:54 +0200
> Jan Kara <jack@suse.cz> wrote:
> > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> > index efdfb07..f7ebff8 100644
> > --- a/include/linux/radix-tree.h
> > +++ b/include/linux/radix-tree.h
> > @@ -55,7 +55,7 @@ static inline int radix_tree_is_indirect_ptr(void *ptr)
> >
> > /*** radix-tree API starts here ***/
> >
> > -#define RADIX_TREE_MAX_TAGS 2
> > +#define RADIX_TREE_MAX_TAGS 3
> >
> > /* root tags are stored in gfp_mask, shifted by __GFP_BITS_SHIFT */
> > struct radix_tree_root {
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index b289310..f590a12 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -807,6 +807,30 @@ void __init page_writeback_init(void)
> > }
> >
> > /**
> > + * tag_pages_for_writeback - tag pages to be written by write_cache_pages
> > + * @mapping: address space structure to write
> > + * @start: starting page index
> > + * @end: ending page index (inclusive)
> > + *
> > + * This function scans the page range from @start to @end
>
> I'd add "inclusive" here as well. Add it everywhere, very carefully
> (or "exclusive"). it really is a minefield and we've had off-by-ones
> from this before
Done.
> > and tags all pages
> > + * that have DIRTY tag set with a special TOWRITE tag. The idea is that
> > + * write_cache_pages (or whoever calls this function) will then use TOWRITE tag
> > + * to identify pages eligible for writeback. This mechanism is used to avoid
> > + * livelocking of writeback by a process steadily creating new dirty pages in
> > + * the file (thus it is important for this function to be damn quick so that it
> > + * can tag pages faster than a dirtying process can create them).
> > + */
> > +void tag_pages_for_writeback(struct address_space *mapping,
> > + pgoff_t start, pgoff_t end)
> > +{
> > + spin_lock_irq(&mapping->tree_lock);
> > + radix_tree_gang_tag_if_tagged(&mapping->page_tree, start, end,
> > + PAGECACHE_TAG_DIRTY, PAGECACHE_TAG_TOWRITE);
> > + spin_unlock_irq(&mapping->tree_lock);
> > +}
> > +EXPORT_SYMBOL(tag_pages_for_writeback);
>
> Problem. For how long can this disable interrupts? Maybe 1TB of dirty
> pagecache before the NMI watchdog starts getting involved? Could be a
> problem in some situations.
>
> Easy enough to fix - just walk the range in 1000(?) page hunks,
> dropping the lock and doing cond_resched() each time.
> radix_tree_gang_tag_if_tagged() will need to return next_index to make
> that efficient with sparse files.
Yes, Nick had a similar objection. I've already fixed it in my tree the
way you suggest.
> > +/**
> > * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
> > * @mapping: address space structure to write
> > * @wbc: subtract the number of written pages from *@wbc->nr_to_write
> > @@ -820,6 +844,13 @@ void __init page_writeback_init(void)
> > * the call was made get new I/O started against them. If wbc->sync_mode is
> > * WB_SYNC_ALL then we were called for data integrity and we must wait for
> > * existing IO to complete.
> > + *
> > + * To avoid livelocks (when other process dirties new pages), we first tag
> > + * pages which should be written back with TOWRITE tag and only then start
> > + * writing them. For data-integrity sync we have to be careful so that we do
> > + * not miss some pages (e.g., because some other process has cleared TOWRITE
> > + * tag we set). The rule we follow is that TOWRITE tag can be cleared only
> > + * by the process clearing the DIRTY tag (and submitting the page for IO).
> > */
>
> Seems simple enough and I can't think of any bugs which the obvious
> races will cause.
Thanks for looking into the patches!
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-06-10 12:31 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-06-04 18:40 [PATCH 0/2 RFC v3] Livelock avoidance for data integrity writeback Jan Kara
2010-06-04 18:40 ` [PATCH 1/2] radix-tree: Implement function radix_tree_gang_tag_if_tagged Jan Kara
2010-06-09 23:30 ` Andrew Morton
2010-06-10 12:28 ` Jan Kara
2010-06-04 18:40 ` [PATCH 2/2] mm: Implement writeback livelock avoidance using page tagging Jan Kara
2010-06-09 23:41 ` Andrew Morton
2010-06-10 12:31 ` Jan Kara [this message]
2010-06-09 23:45 ` Andrew Morton
2010-06-10 12:42 ` Jan Kara
-- strict thread matches above, loose matches on Subject: below --
2010-06-04 18:47 [PATCH 0/2 RFC v3] Livelock avoidance for data integrity writeback Jan Kara
2010-06-04 18:47 ` [PATCH 2/2] mm: Implement writeback livelock avoidance using page tagging Jan Kara
2010-06-05 1:38 ` Nick Piggin
2010-06-07 16:09 ` Jan Kara
2010-06-08 5:29 ` Nick Piggin
2010-06-09 13:04 ` Jan Kara
2010-06-10 8:12 ` Jan Kara
2010-08-12 18:35 ` Christoph Hellwig
2010-08-12 22:28 ` Jan Kara
2010-08-13 7:50 ` Christoph Hellwig
2010-06-16 16:33 (unknown), Jan Kara
2010-06-16 16:33 ` [PATCH 2/2] mm: Implement writeback livelock avoidance using page tagging Jan Kara
2010-06-18 22:21 ` Andrew Morton
2010-06-21 12:42 ` Jan Kara
[not found] <1277387867-5525-1-git-send-email-jack@suse.cz>
2010-06-24 13:57 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100610123103.GD10827@quack.suse.cz \
--to=jack@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@kernel.org \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.