* Two questions regarding ext4_fallocate()
@ 2013-05-04 15:31 Ji Wu
  0 siblings, 0 replies; 4+ messages in thread
From: Ji Wu @ 2013-05-04 15:31 UTC (permalink / raw)
  To: linux-ext4

Hi,
    I have two questions regarding ext4_fallocate(),

    (1) The first is the FALLOC_FL_PUNCH_HOLE support: what is it
actually used for? The only use case that comes to my mind is when
ext4 is used to store virtual machine image files. When the VMM
learns that a file has been deleted in the guest OS, it can invoke
the host file system's fallocate() on the virtual machine image file
to punch a hole and free the host storage, saving host space. But
how can the VMM become aware of file deletions in the guest?
Simulate a virtual SSD-like block device for the guest OS, then
capture the TRIM commands issued by the guest file system? That
seems too tricky. So basically, where and how does one benefit from
hole punching?
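
    For what it's worth, the host-side call I am imagining would be
something along these lines (an untested sketch; the helper name,
descriptor and range are just made up for illustration):

        #define _GNU_SOURCE
        #include <fcntl.h>      /* fallocate() and FALLOC_FL_* flags */

        /* Free the blocks backing [offset, offset + len) of the VM
         * image file.  FALLOC_FL_PUNCH_HOLE has to be combined with
         * FALLOC_FL_KEEP_SIZE, so the file size stays the same and
         * only the underlying host storage is released. */
        static int punch_image_range(int img_fd, off_t offset, off_t len)
        {
                return fallocate(img_fd,
                                 FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                                 offset, len);
        }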

    (2) At the beginning of the function ext4_ext_punch_hole(), the
code is as follows:

         /* write out all dirty pages to avoid race condition */
         filemap_write_and_wait_range(mapping, offset, offset + length - 1);
         mutex_lock(&inode->i_mutex);
         truncate_pagecache_range();

     Why does it need to synchronously write back the dirty pages
that fall inside the hole? The on-disk data backing those pages is
about to be deleted, so why not release the pages directly, whether
they are dirty or not? Furthermore, this is done before the inode
lock is held, so it seems possible that after the pages are written
back, and before the lock is taken, those pages are dirtied again.
So basically, why does it need to call filemap_write_and_wait_range()
before releasing those pages?

Explanations are appreciated.

Cheers,
Ji Wu



* Re: Two questions regarding ext4_fallocate()
       [not found] <5185222A.20801@163.com>
@ 2013-05-04 17:33 ` Theodore Ts'o
  2013-05-05  1:14   ` Ji Wu
  2013-05-05  7:18   ` Dmitry Monakhov
  0 siblings, 2 replies; 4+ messages in thread
From: Theodore Ts'o @ 2013-05-04 17:33 UTC (permalink / raw)
  To: Ji Wu; +Cc: linux-ext4, Andreas Dilger, Zheng Liu

On Sat, May 04, 2013 at 10:58:50PM +0800, Ji Wu wrote:
> Hi,
>    I have two questions regarding ext4_fallocate(),
> 
>    (1) The first is the FALLOC_FL_PUNCH_HOLE support: what is it
> actually used for? The only use case that comes to my mind is when
> ext4 is used to store virtual machine image files. When the VMM
> learns that a file has been deleted in the guest OS, it can invoke
> the host file system's fallocate() on the virtual machine image
> file to punch a hole and free the host storage, saving host
> space. But how can the VMM become aware of file deletions in the
> guest? Simulate a virtual SSD-like block device for the guest OS,
> then capture the TRIM commands issued by the guest file system?
> That seems too tricky.  So basically, where and how does one
> benefit from hole punching?

It's not too tricky; all of the hypervisors, whether it's KVM, or Xen,
or VMware, are already simulating a SATA device to the guest OS.
Implementing support for the TRIM request is not that hard, and most
of the hypervisors are doing this already.  Implementing the punch
hole functionality was indeed primarily motivated by this use case.

The other historical use of this was for digital video recorders, but
that's a much more specialized use case.
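
To make it concrete: for a raw image file the translation is little
more than a multiply plus an fallocate() call, along these lines (an
illustrative sketch only, not any particular hypervisor's code; the
helper and parameter names are made up):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdint.h>

        /* The guest sent a TRIM/discard for `nsectors` 512-byte
         * sectors starting at `lba`.  For a raw image, guest block
         * addresses map 1:1 onto file offsets, so the backend can
         * simply punch the corresponding byte range out of the
         * backing file. */
        static int discard_to_punch(int img_fd, uint64_t lba,
                                    uint32_t nsectors)
        {
                off_t off = (off_t)lba * 512;
                off_t len = (off_t)nsectors * 512;

                return fallocate(img_fd,
                                 FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                                 off, len);
        }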

>    (2) At the beginning of the function ext4_ext_punch_hole(), the
> code is as follows:
> 
>         /* write out all dirty pages to avoid race condition */
>         filemap_write_and_wait_range(mapping, offset, offset + length - 1);
>         mutex_lock(&inode->i_mutex);
>         truncate_pagecache_range();
> 
>     Why does it need to synchronously write back the dirty pages
> that fall inside the hole? The on-disk data backing those pages is
> about to be deleted, so why not release the pages directly, whether
> they are dirty or not?  Furthermore, this is done before the inode
> lock is held, so it seems possible that after the pages are written
> back, and before the lock is taken, those pages are dirtied again.
> So basically, why does it need to call filemap_write_and_wait_range()
> before releasing those pages?

That's a good question.  Looking at it, I'm not sure we do.  I
suspect this was put in originally to avoid races with setting the
EOFBLOCKS_FL flag, but as you point out, there's no way we can prevent
writes from sneaking in before we grab the i_mutex.  As a result, we
ended up dropping the need for EOFBLOCKS_FL entirely.

Maybe one of the ext4 developers will see something that I'm missing,
but I think we can drop this, which would indeed be a significant
performance improvement for systems that use the punch hole
functionality.
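
In other words, the start of the punch hole path could presumably be
reduced to taking the lock and dropping the pages, something like
this (completely untested, reusing the page-aligned offsets that the
function already computes):

        mutex_lock(&inode->i_mutex);

        /* No filemap_write_and_wait_range() first: pages that fall
         * inside the hole are about to be thrown away, dirty or not,
         * and the page cache is now truncated under i_mutex. */
        truncate_pagecache_range(inode, first_page_offset,
                                 last_page_offset - 1);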

Cheers,

						- Ted


* Re: Two questions regarding ext4_fallocate()
  2013-05-04 17:33 ` Two questions regarding ext4_fallocate() Theodore Ts'o
@ 2013-05-05  1:14   ` Ji Wu
  2013-05-05  7:18   ` Dmitry Monakhov
  1 sibling, 0 replies; 4+ messages in thread
From: Ji Wu @ 2013-05-05  1:14 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, Andreas Dilger, linux-fsdevel, Zheng Liu

Hi Theodore,
      Thanks for your explanation.
      These questions were originally raised by a friend of mine;
after some discussion we could not figure out an exact answer. Now
I think I can ask him to prepare a patch for it. Actually, we found
that the same apparently unnecessary call appears in some other file
systems as well.

Cheers,
Ji Wu

On 05/05/2013 01:33 AM, Theodore Ts'o wrote:
> On Sat, May 04, 2013 at 10:58:50PM +0800, Ji Wu wrote:
>> Hi,
>>     I have two questions regarding ext4_fallocate(),
>>
>>     (1) The first is the FALLOC_FL_PUNCH_HOLE support: what is it
>> actually used for? The only use case that comes to my mind is when
>> ext4 is used to store virtual machine image files. When the VMM
>> learns that a file has been deleted in the guest OS, it can invoke
>> the host file system's fallocate() on the virtual machine image
>> file to punch a hole and free the host storage, saving host
>> space. But how can the VMM become aware of file deletions in the
>> guest? Simulate a virtual SSD-like block device for the guest OS,
>> then capture the TRIM commands issued by the guest file system?
>> That seems too tricky.  So basically, where and how does one
>> benefit from hole punching?
> It's not too tricky; all of the hypervisors, whether it's KVM, or Xen,
> or VMware, are already simulating a SATA device to the guest OS.
> Implementing support for the TRIM request is not that hard, and most
> of the hypervisors are doing this already.  Implementing the punch
> hole functionality was indeed primarily motivated by this use case.
>
> The other historical use of this was for digital video recorders, but
> that's a much more specialized use case.
>
>>     (2) At the beginning of the function ext4_ext_punch_hole(), the
>> code is as follows:
>>
>>          /* write out all dirty pages to avoid race condition */
>>          filemap_write_and_wait_range(mapping, offset, offset + length - 1);
>>          mutex_lock(&inode->i_mutex);
>>          truncate_pagecache_range();
>>
>>      Why does it need to synchronously write back the dirty pages
>> that fall inside the hole? The on-disk data backing those pages is
>> about to be deleted, so why not release the pages directly, whether
>> they are dirty or not?  Furthermore, this is done before the inode
>> lock is held, so it seems possible that after the pages are written
>> back, and before the lock is taken, those pages are dirtied again.
>> So basically, why does it need to call filemap_write_and_wait_range()
>> before releasing those pages?
> That's a good question.  Looking at it, I'm not sure we do.  I
> suspect this was put in originally to avoid races with setting the
> EOFBLOCKS_FL flag, but as you point out, there's no way we can prevent
> writes from sneaking in before we grab the i_mutex.  As a result, we
> ended up dropping the need for EOFBLOCKS_FL entirely.
>
> Maybe one of the ext4 developers will see something that I'm missing,
> but I think we can drop this, which would indeed be a significant
> performance improvement for systems that use the punch hole
> functionality.
>
> Cheers,
>
> 						- Ted
>




* Re: Two questions regarding ext4_fallocate()
  2013-05-04 17:33 ` Two questions regarding ext4_fallocate() Theodore Ts'o
  2013-05-05  1:14   ` Ji Wu
@ 2013-05-05  7:18   ` Dmitry Monakhov
  1 sibling, 0 replies; 4+ messages in thread
From: Dmitry Monakhov @ 2013-05-05  7:18 UTC (permalink / raw)
  To: Theodore Ts'o, Ji Wu; +Cc: linux-ext4, Andreas Dilger, Zheng Liu

On Sat, 4 May 2013 13:33:26 -0400, Theodore Ts'o <tytso@mit.edu> wrote:
> On Sat, May 04, 2013 at 10:58:50PM +0800, Ji Wu wrote:
> > Hi,
> >    I have two questions regarding ext4_fallocate(),
> > 
> >    (1) The first is the FALLOC_FL_PUNCH_HOLE support: what is it
> > actually used for? The only use case that comes to my mind is when
> > ext4 is used to store virtual machine image files. When the VMM
> > learns that a file has been deleted in the guest OS, it can invoke
> > the host file system's fallocate() on the virtual machine image
> > file to punch a hole and free the host storage, saving host
> > space. But how can the VMM become aware of file deletions in the
> > guest? Simulate a virtual SSD-like block device for the guest OS,
> > then capture the TRIM commands issued by the guest file system?
> > That seems too tricky.  So basically, where and how does one
> > benefit from hole punching?
> 
> It's not too tricky; all of the hypervisors, whether it's KVM, or Xen,
> or VMware, are already simulating a SATA device to the guest OS.
> Implementing support for the TRIM request is not that hard, and most
> of the hypervisors are doing this already.  Implementing the punch
> hole functionality was indeed primarily motivated by this use case.
> 
> The other historical use of this was for digital video recorders, but
> that's a much more specialized use case.
> 
> >    (2) At the beginning of the function ext4_ext_punch_hole(), the
> > code is as follows:
> > 
> >         /* write out all dirty pages to avoid race condition */
> >         filemap_write_and_wait_range(mapping, offset, offset + length - 1);
> >         mutex_lock(&inode->i_mutex);
> >         truncate_pagecache_range();
> > 
> >     Why does it need to synchronously write back the dirty pages
> > that fall inside the hole? The on-disk data backing those pages is
> > about to be deleted, so why not release the pages directly, whether
> > they are dirty or not?  Furthermore, this is done before the inode
> > lock is held, so it seems possible that after the pages are written
> > back, and before the lock is taken, those pages are dirtied again.
> > So basically, why does it need to call filemap_write_and_wait_range()
> > before releasing those pages?
> 
> That's a good question.  Looking at it, I'm not sure we do.  I
> suspect this was put in originally to avoid races with setting the
> EOFBLOCKS_FL flag, but as you point out, there's no way we can prevent
> writes from sneaking in before we grab the i_mutex.  As a result, we
> ended up dropping the need for EOFBLOCKS_FL entirely.
> 
> Maybe one of the ext4 developers will see something that I'm missing,
> but I think we can drop this, which would indeed be a significant
> performance improvement for systems that use the punch hole
> functionality.
Yes, there is room for optimization here, but the ordered case is
special, and we would have to call an analog of
ext4_begin_ordered_truncate() with two arguments.
> 
> Cheers,
> 
> 						- Ted
