From: Jan Kara <jack@suse.cz>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
"Martin J. Bligh" <mbligh@mbligh.org>,
linux-ext4@vger.kernel.org, Ying Han <yinghan@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
guichaz@gmail.com, Alex Khesin <alexk@google.com>,
Mike Waychison <mikew@google.com>,
Rohit Seth <rohitseth@google.com>
Subject: Re: ftruncate-mmap: pages are lost after writing to mmaped file.
Date: Tue, 24 Mar 2009 16:29:59 +0100
Message-ID: <20090324152959.GG23439@duck.suse.cz>
In-Reply-To: <1237906563.24918.184.camel@twins>
On Tue 24-03-09 15:56:03, Peter Zijlstra wrote:
> On Tue, 2009-03-24 at 15:47 +0100, Jan Kara wrote:
> >
> > Or we could implement ext3_mkwrite() to allocate buffers already when we
> > make page writeable. But it costs some performance (we have to write page
> > full of zeros when allocating those buffers, where previously we didn't
> > have to do anything) and it's not trivial to make it work if pagesize >
> > blocksize (we should not allocate buffers outside of i_size so if i_size
> > = 1024, we create just one block in ext3_mkwrite() but then we need to
> > allocate more when we extend the file).
>
> I think this is the best option, failing with SIGBUS when we fail to
> allocate blocks seems consistent with other filesystems as well.
I agree this looks attractive at first sight. But there are drawbacks,
as I wrote: the problem with blocksize < pagesize, a slight performance
decrease due to the additional write, page faults doing allocation can
take a *long* time, and overall fragmentation is going to be higher
(previously writepage wrote pages for us in the right order, now we are
going to allocate in first-accessed order). So I'm not sure we really want
to go this way.
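To illustrate the blocksize < pagesize case from the quoted text, here is a
small userspace sketch of the arithmetic. It is not ext3 code; the function
name and parameters are made up for illustration. It counts how many blocks
of a page lie at least partly below i_size, which is the number an
allocate-at-fault mkwrite() would be allowed to create.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): number of blocks of page
 * 'index' that lie at least partly below i_size. Blocks beyond i_size
 * must not be allocated at fault time.
 */
static unsigned long long blocks_within_isize(unsigned long long index,
					      unsigned long long page_size,
					      unsigned long long block_size,
					      unsigned long long i_size)
{
	unsigned long long start = index * page_size;
	unsigned long long end;

	/* The model assumes a page holds a whole number of blocks. */
	assert(block_size > 0 && page_size % block_size == 0);

	if (i_size <= start)
		return 0;	/* page lies entirely beyond end of file */

	/* Clamp the covered range to the end of the file. */
	end = i_size < start + page_size ? i_size : start + page_size;

	/* Round up: a partially used block still needs allocating. */
	return (end - start + block_size - 1) / block_size;
}
```

With a 4096-byte page, 1024-byte blocks and i_size = 1024, this gives 1 for
page 0; once the file is extended to 4096 bytes it gives 4, so the remaining
three blocks have to be allocated at some later point.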
Hmm, maybe we could play a trick à la delayed allocation, i.e., reserve
some space in mkwrite() but don't actually allocate it. The allocation
would be done in writepage(). This would solve all the problems I described
above. We could use the PG_Checked flag to track that the page has a
reservation and behave accordingly in writepage() / invalidatepage(). ext3
in data=journal mode already uses the flag, but that use seems to be
compatible with what I want to do now... So it may actually work.
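The reservation idea can be sketched as a userspace toy model. This is not
kernel code and not a proposed patch: the toy_* names and both structs are
hypothetical, and the 'reserved' field merely stands in for a per-page flag
such as PG_Checked. The point is only the accounting: the write fault
reserves space and can fail early, writeback turns the reservation into an
allocation, and invalidation gives the reservation back.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy filesystem: free blocks, and how many of them are promised to pages. */
struct toy_fs {
	unsigned long free_blocks;	/* blocks not yet allocated */
	unsigned long reserved_blocks;	/* part of free_blocks already promised */
};

/* Toy page: 'reserved' stands in for a page flag such as PG_Checked. */
struct toy_page {
	bool reserved;		/* page holds a reservation */
	bool allocated;		/* page has backing blocks */
	unsigned int nr_blocks;	/* blocks needed to back this page */
};

/* Write fault: only reserve space. On -ENOSPC the caller would send SIGBUS. */
static int toy_mkwrite(struct toy_fs *fs, struct toy_page *pg)
{
	if (pg->allocated || pg->reserved)
		return 0;
	if (fs->free_blocks - fs->reserved_blocks < pg->nr_blocks)
		return -ENOSPC;
	fs->reserved_blocks += pg->nr_blocks;
	pg->reserved = true;
	return 0;
}

/* Writeback: convert the reservation into a real allocation. */
static int toy_writepage(struct toy_fs *fs, struct toy_page *pg)
{
	if (pg->allocated)
		return 0;
	if (!pg->reserved)
		return -EINVAL;	/* this model requires mkwrite to run first */
	assert(fs->reserved_blocks >= pg->nr_blocks);
	assert(fs->free_blocks >= pg->nr_blocks);
	fs->reserved_blocks -= pg->nr_blocks;
	fs->free_blocks -= pg->nr_blocks;
	pg->reserved = false;
	pg->allocated = true;
	return 0;
}

/* Truncate/invalidate: drop the reservation without allocating anything. */
static void toy_invalidatepage(struct toy_fs *fs, struct toy_page *pg)
{
	if (pg->reserved) {
		fs->reserved_blocks -= pg->nr_blocks;
		pg->reserved = false;
	}
}
```

In this model a fault on a full filesystem fails in toy_mkwrite(), while
the order in which blocks are actually handed out is still decided at
writeback time, which is what keeps the allocation order independent of the
first-accessed order.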
BTW: Note that there are plenty of filesystems that don't implement
mkwrite() (e.g. ext2, UDF, VFAT...) and thus have the same problem with
ENOSPC. So I'd not speak too much about consistency ;).
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR