linux-ext4.vger.kernel.org archive mirror
From: Theodore Ts'o <tytso@mit.edu>
To: Prashant Shah <pshah.mumbai@gmail.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: Fwd: block level cow operation
Date: Tue, 9 Apr 2013 17:02:04 -0400
Message-ID: <20130409210204.GB430@thunk.org>
In-Reply-To: <CAD6i1f+NLF6Vj8D84FFWAbDttqbBvcg-kWswf7Hez2o0-cXpMw@mail.gmail.com>

On Tue, Apr 09, 2013 at 02:35:56PM +0530, Prashant Shah wrote:
> I am trying to implement copy on write operation by reading the
> original disk block and writing it to some other location....

Lukas asked the correct first question, which is: why are you trying to
do this?  If the goal is to make COW snapshots, then there's a lot of
accounting information that you'll need to keep track of, and it is
very doubtful that ext4 would be the right place to do it.

If the goal is to do efficient writes into cheap eMMC flash for random
write workloads (i.e., the same problem f2fs is trying to solve), it's
not totally insane to try to adapt ext4 to handle this problem.

#1 You'd need to add support into mballoc to understand how to align
its block writes on eMMC erase block boundaries, and to have a mode
where it gives you sequentially increasing physical blocks ignoring
the logical block numbers.

#2 You'd need to intercept the write requests at the writepages() and
writepage() calls, and that's where the decision would have to be made
to allocate a new set of block numbers, based on a flag that would be
set either per file system or per open file.  As part of the I/O
completion callback, where today we have code paths to convert
uninitialized extents to initialized extents, we could teach that code
path to update the logical block mapping.

#3 You'd have to come up with some approach to deal with direct I/O
(including potentially not supporting COW writes for DIO).  

#4 You'd probably only want to do this for indirect block mapped
files, since for a random write workload, the extent tree would
become very inefficient very quickly.
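To make #1, #2 and #4 a little more concrete, here is a user-space model
in C.  It is only a sketch under stated assumptions, not ext4 code:
struct cow_map, cow_init(), cow_write_begin(), cow_end_io(),
count_extents() and cow_destroy() are hypothetical names, and the
512-block erase block (2 MiB with 4 KiB blocks) is an assumed size.  The
map is indirect-style, one physical block number per logical block; the
allocator hands out sequentially increasing physical blocks starting on
an erase block boundary and ignores the logical block number; the new
mapping is only installed at "I/O completion"; and count_extents()
reports how many extents would be needed to describe the result.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed erase block size, in file system blocks (2 MiB / 4 KiB). */
#define ERASE_BLOCK_BLOCKS 512u
#define NO_BLOCK UINT32_MAX

/* Hypothetical indirect-style map: one physical block per logical block. */
struct cow_map {
	uint32_t *l2p;      /* logical -> physical, NO_BLOCK if unmapped */
	uint32_t nlogical;
	uint32_t next_phys; /* log head: next physical block to hand out */
	uint32_t phys_end;  /* exclusive end of the physical area */
};

int cow_init(struct cow_map *m, uint32_t nlogical, uint32_t phys_start,
	     uint32_t phys_end)
{
	m->l2p = malloc((size_t)nlogical * sizeof(*m->l2p));
	if (m->l2p == NULL)
		return -1;
	for (uint32_t i = 0; i < nlogical; i++)
		m->l2p[i] = NO_BLOCK;
	m->nlogical = nlogical;
	/* #1: start the log on an erase block boundary. */
	m->next_phys = (phys_start + ERASE_BLOCK_BLOCKS - 1) /
		       ERASE_BLOCK_BLOCKS * ERASE_BLOCK_BLOCKS;
	m->phys_end = phys_end;
	return 0;
}

void cow_destroy(struct cow_map *m)
{
	free(m->l2p);
	m->l2p = NULL;
}

/* #2, submit side: pick a fresh physical block for logical block lblk.
 * The choice ignores lblk; blocks come out in increasing order.
 * The mapping is NOT changed yet, so readers still see the old data.
 * No garbage collection: returns NO_BLOCK once the area is used up. */
uint32_t cow_write_begin(struct cow_map *m, uint32_t lblk)
{
	if (lblk >= m->nlogical || m->next_phys >= m->phys_end)
		return NO_BLOCK;
	return m->next_phys++;
}

/* #2, completion side: once the data is on media, switch the mapping.
 * Returns the old physical block, which is now stale (reclaiming it
 * is left out of this sketch). */
uint32_t cow_end_io(struct cow_map *m, uint32_t lblk, uint32_t pblk)
{
	assert(lblk < m->nlogical && pblk != NO_BLOCK);
	uint32_t old = m->l2p[lblk];
	m->l2p[lblk] = pblk;
	return old;
}

/* #4: count the extents needed to describe the current mapping, i.e.
 * runs of logical blocks that map to contiguous physical blocks. */
uint32_t count_extents(const struct cow_map *m)
{
	uint32_t n = 0;
	for (uint32_t i = 0; i < m->nlogical; i++) {
		uint32_t p = m->l2p[i];
		if (p == NO_BLOCK)
			continue;
		if (i == 0 || m->l2p[i - 1] == NO_BLOCK ||
		    m->l2p[i - 1] + 1 != p)
			n++;
	}
	return n;
}
```

For example, with the log starting at physical block 512, writing
logical blocks 0-7 in order leaves the file as one extent.  Rewriting
just logical blocks 5 and 2 moves them to the log head (blocks 520 and
521), and the same eight blocks then need five extents, while the
indirect-style map still costs one entry per block.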


So it's not _insane_ but it's a huge amount of work, and it would be
very tricky, and it's not something that I would recommend, say, if a
student was looking for a term project.  It would also not be faster
on SSDs or HDDs.  The only reason to do something like this would be
to deal with the extremely low-cost FTL of cheap eMMC flash devices
(where the BOM cost of eMMC is approximately two orders of magnitude
lower than that of SSDs).  So if you are benchmarking this on an HDD or
SSD, don't be surprised if it's much slower.  And if you are
benchmarking on eMMC, you have to make sure that your writes are
appropriately erase block aligned, or there is no hope of any
performance gain.
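As a rough illustration of what "erase block aligned" means, here is a
hypothetical pair of helpers (erase_aligned() and erase_blocks_touched()
are made-up names) for a write at byte offset off of length len.  The
erase block size is a parameter because it is device specific; the
4 MiB value mentioned below is only an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [off, off + len) starts and ends on erase block boundaries. */
bool erase_aligned(uint64_t off, uint64_t len, uint64_t erase_size)
{
	assert(erase_size != 0);
	return len != 0 && off % erase_size == 0 && len % erase_size == 0;
}

/* Number of erase blocks that [off, off + len) overlaps, fully or
 * partially.  Assumes off + len does not overflow. */
uint64_t erase_blocks_touched(uint64_t off, uint64_t len, uint64_t erase_size)
{
	assert(erase_size != 0);
	if (len == 0)
		return 0;
	return (off + len - 1) / erase_size - off / erase_size + 1;
}
```

With a 4 MiB erase block, a 4 MiB write at offset 0 touches exactly one
erase block, but the same write shifted by 4096 bytes straddles two, so
a simple FTL may end up having to rewrite both.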

Regards,

					- Ted
