linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: jack@suse.cz (Jan Kara)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] bounce:fix bug, avoid to flush dcache on slab page from jbd2.
Date: Wed, 13 Mar 2013 22:02:16 +0100	[thread overview]
Message-ID: <20130313210216.GA7754@quack.suse.cz> (raw)
In-Reply-To: <20130313194429.GE5313@blackbox.djwong.org>

On Wed 13-03-13 12:44:29, Darrick J. Wong wrote:
> On Wed, Mar 13, 2013 at 09:50:21AM +0100, Jan Kara wrote:
> > On Tue 12-03-13 18:10:20, Darrick J. Wong wrote:
> > > On Tue, Mar 12, 2013 at 03:32:21PM -0700, Andrew Morton wrote:
> > > > On Fri, 08 Mar 2013 20:37:36 +0800 Shuge <shugelinux@gmail.com> wrote:
> > > > 
> > > > > The bounce accept slab pages from jbd2, and flush dcache on them.
> > > > > When enabling VM_DEBUG, it will tigger VM_BUG_ON in page_mapping().
> > > > > So, check PageSlab to avoid it in __blk_queue_bounce().
> > > > > 
> > > > > Bug URL: http://lkml.org/lkml/2013/3/7/56
> > > > > 
> > > > > ...
> > > > >
> > > > > --- a/mm/bounce.c
> > > > > +++ b/mm/bounce.c
> > > > > @@ -214,7 +214,8 @@ static void __blk_queue_bounce(struct request_queue 
> > > > > *q, struct bio **bio_orig,
> > > > >   		if (rw == WRITE) {
> > > > >   			char *vto, *vfrom;
> > > > >   -			flush_dcache_page(from->bv_page);
> > > > > +			if (unlikely(!PageSlab(from->bv_page)))
> > > > > +				flush_dcache_page(from->bv_page);
> > > > >   			vto = page_address(to->bv_page) + to->bv_offset;
> > > > >   			vfrom = kmap(from->bv_page) + from->bv_offset;
> > > > >   			memcpy(vto, vfrom, to->bv_len);
> > > > 
> > > > I guess this is triggered by Catalin's f1a0c4aa0937975b ("arm64: Cache
> > > > maintenance routines"), which added a page_mapping() call to arm64's
> > > > arch/arm64/mm/flush.c:flush_dcache_page().
> > > > 
> > > > What's happening is that jbd2 is using kmalloc() to allocate buffer_head
> > > > data.  That gets submitted down the BIO layer and __blk_queue_bounce()
> > > > calls flush_dcache_page() which in the arm64 case calls page_mapping()
> > > > and page_mapping() does VM_BUG_ON(PageSlab) and splat.
> > > > 
> > > > The unusual thing about all of this is that the payload for some disk
> > > > IO is coming from kmalloc, rather than being a user page.  It's oddball
> > > > but we've done this for ages and should continue to support it.
> > > > 
> > > > 
> > > > Now, the page from kmalloc() cannot be in highmem, so why did the
> > > > bounce code decide to bounce it?
> > > > 
> > > > __blk_queue_bounce() does
> > > > 
> > > > 		/*
> > > > 		 * is destination page below bounce pfn?
> > > > 		 */
> > > > 		if (page_to_pfn(page) <= queue_bounce_pfn(q) && !force)
> > > > 			continue;
> > > > 
> > > > and `force' comes from must_snapshot_stable_pages().  But
> > > > must_snapshot_stable_pages() must have returned false, because if it
> > > > had returned true then it would have been must_snapshot_stable_pages()
> > > > which went BUG, because must_snapshot_stable_pages() calls page_mapping().
> > > > 
> > > > So my tentative diagosis is that arm64 is fishy.  A page which was
> > > > allocated via jbd2_alloc(GFP_NOFS)->kmem_cache_alloc() ended up being
> > > > above arm64's queue_bounce_pfn().  Can you please do a bit of
> > > > investigation to work out if this is what is happening?  Find out why
> > > > __blk_queue_bounce() decided to bounce a page which shouldn't have been
> > > > bounced?
> > > 
> > > That sure is strange.  I didn't see any obvious reasons why we'd end up with a
> > > kmalloc above queue_bounce_pfn().  But then I don't have any arm64s either.
> > > 
> > > > This is all terribly fragile :( afaict if someone sets
> > > > bdi_cap_stable_pages_required() against that jbd2 queue, we're going to
> > > > hit that BUG_ON() again, via must_snapshot_stable_pages()'s
> > > > page_mapping() call.  (Darrick, this means you ;))
> > > 
> > > Wheeee.  You're right, we shouldn't be calling page_mapping on slab pages.
> > > We can keep walking the bio segments to find a non-slab page that can tell us
> > > MS_SNAP_STABLE is set, since we probably won't need the bounce buffer anyway.
> > > 
> > > How does something like this look?  (+ the patch above)
> >   Umm, this won't quite work. We can have a bio which has just PageSlab
> > page attached and so you won't be able to get to the superblock. Heh, isn't
> > the whole page_mapping() thing in must_snapshot_stable_pages() wrong? When we
> > do direct IO, these pages come directly from userspace and hell knows where
> > they come from. Definitely their page_mapping() doesn't give us anything
> > useful... Sorry for not realizing this earlier when reviewing the patch.
> > 
> > ... remembering why we need to get to sb and why ext3 needs this ... So
> > maybe a better solution would be to have a bio flag meaning that pages need
> > bouncing? And we would set it from filesystems that need it - in case of
> > ext3 only writeback of data from kjournald actually needs to bounce the
> > pages. Thoughts?
> 
> What about dirty pages that don't result in journal transactions?  I think
> ext3_sync_file() eventually calls ext3_ordered_writepage, which then calls
> __block_write_full_page, which in turn calls submit_bh().
  So here we have two options:
Either we let ext3 wait the same way as other filesystems when stable pages
are required. Then only data IO from kjournald needs to be bounced (all
other IO is properly protected by PageWriteback bit).

Or we won't let ext3 wait (as it is now), keep the superblock flag that fs
needs bouncing, and set the bio flag in __block_write_full_page() and
kjournald based on the sb flag.

I think the first option is slightly better but I don't feel strongly
about that.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  reply	other threads:[~2013-03-13 21:02 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <5139DB90.5090302@gmail.com>
2013-03-12 22:32 ` [PATCH] bounce:fix bug, avoid to flush dcache on slab page from jbd2 Andrew Morton
2013-03-13  1:10   ` Darrick J. Wong
2013-03-13  3:35     ` Shuge
2013-03-13  4:11       ` Andrew Morton
2013-03-13  9:42         ` Russell King - ARM Linux
2013-03-13  8:50     ` Jan Kara
2013-03-13 19:44       ` Darrick J. Wong
2013-03-13 21:02         ` Jan Kara [this message]
2013-03-14 22:42           ` Darrick J. Wong
2013-03-14 23:01             ` Andrew Morton
2013-03-15 10:01             ` Jan Kara
2013-03-15 17:54               ` Darrick J. Wong
2013-03-18 17:32                 ` Jan Kara
2013-03-15 23:28               ` [PATCH] mm: Make snapshotting pages for stable writes a per-bio operation Darrick J. Wong
2013-03-18 17:41                 ` Jan Kara
2013-03-18 23:01                   ` Darrick J. Wong
2013-03-18 23:02                   ` [PATCH v3] " Darrick J. Wong
2013-03-19  8:54                     ` Jan Kara
2013-04-02 17:01                     ` Darrick J. Wong
2013-04-03 14:20                       ` Mel Gorman
2013-04-03 14:42                         ` Jan Kara
2013-04-09 18:03                           ` Darrick J. Wong
2013-03-14 22:46           ` [PATCH] bounce:fix bug, avoid to flush dcache on slab page from jbd2 Andrew Morton
2013-03-14 23:27             ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130313210216.GA7754@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).