linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>, linux-ext4@vger.kernel.org
Subject: Re: [PATCH 4/4] ext3: Implement delayed allocation on page_mkwrite time
Date: Tue, 3 May 2011 13:39:00 +0300	[thread overview]
Message-ID: <BANLkTi=a8Ht0CbXicdAfJBy+tesY1jKX5A@mail.gmail.com> (raw)
In-Reply-To: <20110502222020.GB14889@quack.suse.cz>

On Tue, May 3, 2011 at 1:20 AM, Jan Kara <jack@suse.cz> wrote:
> On Mon 02-05-11 14:12:30, Andrew Morton wrote:
>> On Mon,  2 May 2011 22:56:56 +0200
>> Jan Kara <jack@suse.cz> wrote:
>>
>> > So far, ext3 was allocating necessary blocks for mmapped writes when
>> > writepage() was called. There are several issues with this. The worst
>> > being that user is allowed to arbitrarily exceed disk quotas because
>> > writepage() is called from flusher thread context (which is root) and thus
>> > quota limits are ignored. Another bad consequence is that data is just lost
>> > if we find there's no space on the filesystem during ->writepage() time.
>> >
>> > We solve these issues by implementing block reservation in page_mkwrite()
>> > callback. We don't want to really allocate blocks on page_mkwrite() time
>> > because for random writes via mmap (as seen for example with applications using
>> > BerkeleyDB) it results in much more fragmented files and thus much worse
>> > performance. So we allocate indirect blocks and reserve space for data block in
>> > page_mkwrite() and do the allocation of data block from writepage().
>>
>> Yes, instantiating the metadata and accounting the data is a good
>> approach.  The file layout will be a bit suboptimal, but surely that
>> will be a minor thing.
>>
>> But boy, it's a complicated patch!  Are we really sure that we want to
>> make changes this extensive to our antiquated old fs?  Or do we just
>> say "yeah, it's broken with quotas - use ext4"?
>  The patch isn't trivial, I agree (although it's mostly straightforward).
> Regarding telling users to switch to ext4 - it seems a bit harsh to me
> to ask people to switch to ext4 as a response to a (possibly security)
> issue they uncover. Because for most admins switching to ext4 will require
> some non-trivial testing I presume. Of course, the counterweight is the
> possibility of new bugs introduced to the code by my patch. But after some
> considerations I've decided it's worth it and and fixed the bug...

Jan,

Maybe you can work out a simpler(=safer) patch, which uses much larger
reservation margins.

For example, by reserving on mmap call, the difference between i_size
and i_blocks
and storing that reservation in in-memory inode, protected by i_mutex.
If an in-memory inode RESERVATION flag is set, you can update i_reserved_blocks
when updating i_blocks (alloc and free) and i_size (truncate).
global in-memory reservation counters can be updated at the same time.

This approach will avoid the need to introduce delayed allocation to ext3,
since you got the reservation on mmap time, you can rest assure that
no dirty data will be dropped on the floor and no quota limits will be exceeded.

The trade off is that an application that want to mmap a very large sparse file
to write just some blocks will not be able to do that on a near full
file system.
Do you know of any such real application?

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

      parent reply	other threads:[~2011-05-03 10:39 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-02 20:56 [PATCH 0/4] Block reservation on page fault time for ext3 Jan Kara
2011-05-02 20:56 ` [PATCH 1/4] vfs: Unmap underlying metadata of new data buffers only when buffer is mapped Jan Kara
2011-05-02 20:56 ` [PATCH 3/4] ext3: Implement per-cpu counters for delayed allocation Jan Kara
2011-05-02 21:08   ` Andrew Morton
2011-05-02 20:56 ` [PATCH 4/4] ext3: Implement delayed allocation on page_mkwrite time Jan Kara
2011-05-02 21:12   ` Andrew Morton
2011-05-02 22:20     ` Jan Kara
2011-05-02 22:29       ` Andrew Morton
2011-05-03 17:09         ` Jan Kara
2011-05-11 15:38           ` Jan Kara
2011-05-11 19:52             ` Andrew Morton
2011-05-03 10:39       ` Amir Goldstein [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='BANLkTi=a8Ht0CbXicdAfJBy+tesY1jKX5A@mail.gmail.com' \
    --to=amir73il@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).