From: Ying Han <yinghan@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>, Jan Kara <jack@suse.cz>,
"Martin J. Bligh" <mbligh@mbligh.org>,
linux-ext4@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
guichaz@gmail.com, Alex Khesin <alexk@google.com>,
Mike Waychison <mikew@google.com>,
Rohit Seth <rohitseth@google.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: ftruncate-mmap: pages are lost after writing to mmaped file.
Date: Wed, 25 Mar 2009 17:03:58 -0700
Message-ID: <604427e00903251703s62a62e7fkc81719503228626a@mail.gmail.com>
In-Reply-To: <20090324033204.64f3da9d.akpm@linux-foundation.org>
On Tue, Mar 24, 2009 at 3:32 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 24 Mar 2009 18:44:21 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>> On Friday 20 March 2009 03:46:39 Jan Kara wrote:
>> > On Fri 20-03-09 02:48:21, Nick Piggin wrote:
>>
>> > > Holding mapping->private_lock over the __set_page_dirty should
>> > > fix it, although I guess you'd want to release it before calling
>> > > __mark_inode_dirty so as not to put inode_lock under there. I
>> > > have a patch for this if it sounds reasonable.
>> >
>> > Yes, that seems to be a bug - the function actually looked suspicious to
>> > me yesterday but I somehow convinced myself that it's fine. Probably
>> > because fsx-linux is single-threaded.
>>
>>
>> After a whole lot of chasing my own tail in the VM and buffer layers,
>> I think it is a problem in ext2 (and I haven't been able to reproduce
>> with ext3 yet, which might lend weight to that, although as we have
>> seen, it is very timing dependent).
>>
>> That would be slightly unfortunate because we still have Jan's ext3
>> problem, and also another reported problem of corruption on ext3 (on
>> brd driver).
>>
>> Anyway, when I have reproduced the problem with the test case, the
>> "lost" writes are all reported to be holes. Unfortunately, that doesn't
>> point straight to the filesystem, because ext2 allocates blocks in this
>> case at writeout time, so if dirty bits are getting lost, then it would
>> be normal to see holes.
>>
>> I then put in a whole lot of extra infrastructure to track metadata about
>> each struct page (when it was last written out, when it last had the number
>> of writable ptes reach 0, when the dirty bits were last cleared etc). And
>> none of the normal assertions were triggering: e.g. when any page is removed
>> from pagecache (except truncates), it has always had all its buffers
>> written out *after* all ptes were made readonly or unmapped. Lots of other
>> tests and crap like that.
>>
>> So I tried what I should have done to start with and did an e2fsck after
>> seeing corruption. Yes, it comes up with errors.
>
> Do you recall what the errors were?
I ran e2fsck on the partition after the failure happened, and here is
what I saw. I am not sure whether it is the same message Jan looked at:
e2fsck 1.41.3 (12-Oct-2008)
Warning! /dev/hda1 is mounted.
/dev/hda1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: +74915 -195111 -224680
Fix? no
Free blocks count wrong for group #6 (170, counted=169).
Fix? no
Free blocks count wrong (10120, counted=523).
Fix? no
Free inodes count wrong (95678, counted=95672).
Fix? no
/dev/hda1: ********** WARNING: Filesystem still has errors **********
/dev/hda1: 35938/131616 files (1.5% non-contiguous), 252936/263056 blocks
--Ying
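
For anyone comparing reports across runs, here is a rough sketch of how
the count mismatches in the output above could be tabulated. It is only
an illustration: the regular expressions are my own guess at the line
shapes seen in this one report, not a documented e2fsck format, and the
function names (parse_mismatches, free_from_summary) are made up.

```python
import re

# Excerpt of the e2fsck report quoted above (copied verbatim).
REPORT = """\
Free blocks count wrong for group #6 (170, counted=169).
Free blocks count wrong (10120, counted=523).
Free inodes count wrong (95678, counted=95672).
/dev/hda1: 35938/131616 files (1.5% non-contiguous), 252936/263056 blocks
"""

# Assumed shape: "Free <kind> count wrong [for group #N] (<recorded>, counted=<n>)."
MISMATCH = re.compile(
    r"Free (blocks|inodes) count wrong(?: for group #(\d+))? "
    r"\((\d+), counted=(\d+)\)"
)
# Assumed shape of the final summary line: "a/b files (...), c/d blocks".
SUMMARY = re.compile(r"(\d+)/(\d+) files .*?, (\d+)/(\d+) blocks")


def parse_mismatches(report):
    """Return (kind, group, recorded, counted); group is None for fs-wide totals."""
    rows = []
    for kind, group, recorded, counted in MISMATCH.findall(report):
        rows.append((kind, int(group) if group else None,
                     int(recorded), int(counted)))
    return rows


def free_from_summary(report):
    """Free blocks implied by the 'used/total blocks' summary line."""
    m = SUMMARY.search(report)
    used_blocks, total_blocks = int(m.group(3)), int(m.group(4))
    return total_blocks - used_blocks


if __name__ == "__main__":
    for kind, group, recorded, counted in parse_mismatches(REPORT):
        where = "whole fs" if group is None else "group #%d" % group
        print(kind, where, "recorded:", recorded, "counted:", counted,
              "delta:", recorded - counted)
    print("free blocks implied by summary line:", free_from_summary(REPORT))
```

Worked by hand, the summary line gives 263056 - 252936 = 10120 blocks,
which equals the recorded free-block count in the report rather than
the counted value of 523.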
>
>> Now that is unusual
>> because that should be largely insulated from the vm: if a dirty bit gets
>> lost, then the filesystem image should be quite happy and error-free with
>> a hole or unwritten data there.
>>
>> I don't know ext? locking very well, except that it looks pretty overly
>> complex and crufty.
>>
>> Usually, blocks are instantiated by write(2), under i_mutex, serialising
>> the allocator somewhat. mmap-write blocks are instantiated at writeout
>> time, unserialised. I moved truncate_mutex to cover the entire get_blocks
>> function, and can no longer trigger the problem. Might be a timing issue
>> though -- Ying, can you try this and see if you can still reproduce?
>>
>> I close my eyes and pick something out of a hat. a686cd89. Search for XXX.
>> Nice. Whether or not this caused the problem, can someone please tell me
>> why it got merged in that state?
>>
>> I'm leaving ext3 running for now. It looks like a nasty task to bisect
>> ext2 down to that commit :( and I would be more interested in trying to
>> reproduce Jan's ext3 problem, however, because I'm not too interested in
>> diving into ext2 locking to work out exactly what is racing and how to
>> fix it properly. I suspect it would be most productive to wire up some
>> ioctls right into the block allocator/lookup and code up a userspace
>> tester for it that could probably stress it a lot harder than kernel
>> writeout can.
>
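
To make the mmap-write path described in the quoted text concrete: the
sketch below extends a file with ftruncate so that it is all holes,
dirties every page through a shared mapping, and then reads the file
back looking for pages whose data is missing. This is NOT the test case
from this thread, only a minimal single-threaded illustration; the
helper names and the one-marker-byte-per-page scheme are invented for
the example. Reading back immediately is mostly served from the page
cache, so a real reproduction would presumably need memory pressure,
concurrency, or a remount before the check.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def marker(i):
    """Non-zero marker byte for page i (hypothetical scheme for this sketch)."""
    return (i % 255) + 1


def write_sparse_via_mmap(path, npages):
    """Extend a file with ftruncate, then dirty every page via a shared mapping."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, npages * PAGE)          # no data written yet: all holes
        with mmap.mmap(fd, npages * PAGE) as mm:  # shared mapping by default on Unix
            for i in range(npages):
                mm[i * PAGE] = marker(i)          # first byte of each page
            mm.flush()                            # msync the mapping
        os.fsync(fd)
    finally:
        os.close(fd)


def find_lost_pages(path, npages):
    """Return indexes of pages whose marker byte does not read back."""
    lost = []
    with open(path, "rb") as f:
        for i in range(npages):
            f.seek(i * PAGE)
            if f.read(1) != bytes([marker(i)]):
                lost.append(i)
    return lost


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "testfile")
        write_sparse_via_mmap(path, 64)
        print("lost pages:", find_lost_pages(path, 64))
```

A page that comes back as zeros here would correspond to the "lost"
writes being reported as holes.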