* Two questions regarding ext4_fallocate()
From: Ji Wu @ 2013-05-04 15:31 UTC
To: linux-ext4
Hi,
I have two questions regarding ext4_fallocate().

(1) The first is about the FALLOC_FL_PUNCH_HOLE support: what is it
actually used for? The only use case that comes to my mind is when
ext4 is used to store virtual machine image files. When the VMM
learns of a file-delete operation in the guest OS, it can invoke the
host file system's fallocate() on the image file to punch a hole and
free host storage, thereby saving host space. But how can the VMM
learn of guest file deletion? By presenting a virtual SSD-like block
device to the guest OS and capturing the TRIM commands issued by the
guest file system? That seems too tricky. So basically, where and
how does one benefit from hole punching?

(2) At the beginning of ext4_ext_punch_hole(), the code is as
follows:

    /* write out all dirty pages to avoid race condition */
    filemap_write_and_wait_range(mapping, offset, offset + length - 1);
    mutex_lock(&inode->i_mutex);
    truncate_pagecache_range();

Why does it need to synchronously write back the dirty pages that
fall within the hole? The on-disk data corresponding to those pages
is about to be deleted, so why not simply release the pages, whether
they are dirty or not? Furthermore, this is done before the inode
lock is taken, so it seems the pages could be dirtied again after
they are written back but before the lock is acquired. So basically,
why does it need to call filemap_write_and_wait_range() before
releasing those pages?
Explanations are appreciated.
Cheers,
Ji Wu
* Re: Two questions regarding ext4_fallocate()
From: Theodore Ts'o @ 2013-05-04 17:33 UTC
To: Ji Wu; +Cc: linux-ext4, Andreas Dilger, Zheng Liu
On Sat, May 04, 2013 at 10:58:50PM +0800, Ji Wu wrote:
> Hi,
> I have two questions regarding ext4_fallocate().
>
> (1) The first is about the FALLOC_FL_PUNCH_HOLE support: what is it
> actually used for? The only use case that comes to my mind is when
> ext4 is used to store virtual machine image files. When the VMM
> learns of a file-delete operation in the guest OS, it can invoke the
> host file system's fallocate() on the image file to punch a hole and
> free host storage, thereby saving host space. But how can the VMM
> learn of guest file deletion? By presenting a virtual SSD-like block
> device to the guest OS and capturing the TRIM commands issued by the
> guest file system? That seems too tricky. So basically, where and
> how does one benefit from hole punching?
It's not too tricky; all of the hypervisors, whether it's KVM, or Xen,
or VMware, are already simulating a SATA device to the guest OS.
Implementing support for the TRIM request is not that hard, and most
of the hypervisors are doing this already. Implementing the punch
hole functionality was indeed primarily motivated by this use case.

The other historical use of this was for digital video recorders, but
that's a much more specialized use case.
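
For illustration, a minimal userspace sketch of roughly what a
hypervisor's block backend might do when the guest discards a range
of its virtual disk; the image path, offset, and length here are
made-up values:

    /*
     * Sketch: punch a hole in a VM image file in response to a guest
     * discard/TRIM of the corresponding range.
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/falloc.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("image.raw", O_RDWR);

        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Free 1 MiB of host storage at offset 4 MiB; the file size is
         * unchanged and later reads of the range return zeroes. */
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      (off_t)4 << 20, (off_t)1 << 20) < 0)
            perror("fallocate");

        close(fd);
        return 0;
    }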
> (2) At the beginning of ext4_ext_punch_hole(), the code is as
> follows:
>
>     /* write out all dirty pages to avoid race condition */
>     filemap_write_and_wait_range(mapping, offset, offset + length - 1);
>     mutex_lock(&inode->i_mutex);
>     truncate_pagecache_range();
>
> Why does it need to synchronously write back the dirty pages that
> fall within the hole? The on-disk data corresponding to those pages
> is about to be deleted, so why not simply release the pages, whether
> they are dirty or not? Furthermore, this is done before the inode
> lock is taken, so it seems the pages could be dirtied again after
> they are written back but before the lock is acquired. So basically,
> why does it need to call filemap_write_and_wait_range() before
> releasing those pages?
That's a good question. Looking at it, I'm not sure we do. I
suspect this was put in originally to avoid races with setting the
EOFBLOCKS_FL flag, but as you point out, there's no way we can
prevent writes from sneaking in before we grab the i_mutex. As a
result, we ended up dropping the need for EOFBLOCKS_FL entirely.

Maybe one of the ext4 developers will see something that I'm missing,
but I think we can drop this, which would indeed give a significant
performance improvement for systems that use the punch hole
functionality.
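
Just as a rough sketch (not a tested patch, and ignoring error
handling and the non-page-aligned edge cases), the start of
ext4_ext_punch_hole() could then simply drop the page cache over the
hole under i_mutex, along the lines of:

    mutex_lock(&inode->i_mutex);

    /* Release the pages covering the punched range; there is no need
     * to write the data back first, since the underlying blocks are
     * about to be freed anyway. */
    if (last_page_offset > first_page_offset)
        truncate_pagecache_range(inode, first_page_offset,
                                 last_page_offset - 1);
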
Cheers,
- Ted
* Re: Two questions regarding ext4_fallocate()
From: Ji Wu @ 2013-05-05 1:14 UTC
To: Theodore Ts'o; +Cc: linux-ext4, Andreas Dilger, linux-fsdevel, Zheng Liu
Hi Theodore,
Thanks for your explanation.

These questions were originally raised by a friend of mine; after
some discussion we could not work out an exact answer. Now I think I
can ask him to prepare a patch for it. Actually, we found that the
same apparently useless call appears in some other file systems as
well.
Cheers,
Ji Wu
On 05/05/2013 01:33 AM, Theodore Ts'o wrote:
> On Sat, May 04, 2013 at 10:58:50PM +0800, Ji Wu wrote:
>> Hi,
>> I have two questions regarding ext4_fallocate().
>>
>> (1) The first is about the FALLOC_FL_PUNCH_HOLE support: what is it
>> actually used for? The only use case that comes to my mind is when
>> ext4 is used to store virtual machine image files. When the VMM
>> learns of a file-delete operation in the guest OS, it can invoke the
>> host file system's fallocate() on the image file to punch a hole and
>> free host storage, thereby saving host space. But how can the VMM
>> learn of guest file deletion? By presenting a virtual SSD-like block
>> device to the guest OS and capturing the TRIM commands issued by the
>> guest file system? That seems too tricky. So basically, where and
>> how does one benefit from hole punching?
> It's not too tricky; all of the hypervisors, whether it's KVM, or Xen,
> or VMware, are already simulating a SATA device to the guest OS.
> Implementing support for the TRIM request is not that hard, and most
> of the hypervisors are doing this already. Implementing the punch
> hole functionality was indeed primarily motivated by this use case.
>
> The other historical use of this was for digital video recorders, but
> that's a much more specialized use case.
>
>> (2) At the beginning of ext4_ext_punch_hole(), the code is as
>> follows:
>>
>>     /* write out all dirty pages to avoid race condition */
>>     filemap_write_and_wait_range(mapping, offset, offset + length - 1);
>>     mutex_lock(&inode->i_mutex);
>>     truncate_pagecache_range();
>>
>> Why does it need to synchronously write back the dirty pages that
>> fall within the hole? The on-disk data corresponding to those pages
>> is about to be deleted, so why not simply release the pages, whether
>> they are dirty or not? Furthermore, this is done before the inode
>> lock is taken, so it seems the pages could be dirtied again after
>> they are written back but before the lock is acquired. So basically,
>> why does it need to call filemap_write_and_wait_range() before
>> releasing those pages?
> That's a good question. Looking at it, I'm not sure we do. I
> suspect this was put in originally to avoid races with setting the
> EOFBLOCKS_FL flag, but as you point out, there's no way we can
> prevent writes from sneaking in before we grab the i_mutex. As a
> result, we ended up dropping the need for EOFBLOCKS_FL entirely.
>
> Maybe one of the ext4 developers will see something that I'm missing,
> but I think we can drop this, which would indeed give a significant
> performance improvement for systems that use the punch hole
> functionality.
>
> Cheers,
>
> - Ted
>
* Re: Two questions regarding ext4_fallocate()
From: Dmitry Monakhov @ 2013-05-05 7:18 UTC
To: Theodore Ts'o, Ji Wu; +Cc: linux-ext4, Andreas Dilger, Zheng Liu
On Sat, 4 May 2013 13:33:26 -0400, Theodore Ts'o <tytso@mit.edu> wrote:
> On Sat, May 04, 2013 at 10:58:50PM +0800, Ji Wu wrote:
> > Hi,
> > I have two questions regarding ext4_fallocate().
> >
> > (1) The first is about the FALLOC_FL_PUNCH_HOLE support: what is it
> > actually used for? The only use case that comes to my mind is when
> > ext4 is used to store virtual machine image files. When the VMM
> > learns of a file-delete operation in the guest OS, it can invoke the
> > host file system's fallocate() on the image file to punch a hole and
> > free host storage, thereby saving host space. But how can the VMM
> > learn of guest file deletion? By presenting a virtual SSD-like block
> > device to the guest OS and capturing the TRIM commands issued by the
> > guest file system? That seems too tricky. So basically, where and
> > how does one benefit from hole punching?
>
> It's not too tricky; all of the hypervisors, whether it's KVM, or Xen,
> or VMware, are already simulating a SATA device to the guest OS.
> Implementing support for the TRIM request is not that hard, and most
> of the hypervisors are doing this already. Implementing the punch
> hole functionality was indeed primarily motivated by this use case.
>
> The other historical use of this was for digital video recorders, but
> that's a much more specialized use case.
>
> > (2) At the beginning of ext4_ext_punch_hole(), the code is as
> > follows:
> >
> >     /* write out all dirty pages to avoid race condition */
> >     filemap_write_and_wait_range(mapping, offset, offset + length - 1);
> >     mutex_lock(&inode->i_mutex);
> >     truncate_pagecache_range();
> >
> > Why does it need to synchronously write back the dirty pages that
> > fall within the hole? The on-disk data corresponding to those pages
> > is about to be deleted, so why not simply release the pages, whether
> > they are dirty or not? Furthermore, this is done before the inode
> > lock is taken, so it seems the pages could be dirtied again after
> > they are written back but before the lock is acquired. So basically,
> > why does it need to call filemap_write_and_wait_range() before
> > releasing those pages?
>
> That's a good question. Looking at it, I'm not sure we do. I
> suspect this was put in originally to avoid races with setting the
> EOFBLOCKS_FL flag, but as you point out, there's no way we can
> prevent writes from sneaking in before we grab the i_mutex. As a
> result, we ended up dropping the need for EOFBLOCKS_FL entirely.
>
> Maybe one of the ext4 developers will see something that I'm missing,
> but I think we can drop this, which would indeed give a significant
> performance improvement for systems that use the punch hole
> functionality.
Yes, there is room for optimization here, but the data=ordered case
is special, and we would have to call an analog of
ext4_begin_ordered_truncate() with two arguments.
>
> Cheers,
>
> - Ted