public inbox for linux-fsdevel@vger.kernel.org
* xfs over pmem - cp performance
@ 2016-01-08 21:07 Elliott, Robert (Persistent Memory)
  2016-01-08 22:03 ` Dave Chinner
  0 siblings, 1 reply; 3+ messages in thread
From: Elliott, Robert (Persistent Memory) @ 2016-01-08 21:07 UTC (permalink / raw)
  To: david@fromorbit.com
  Cc: linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org

I tried using cp to copy the linux git tree between
pmem devices like this:
                cp -r /mnt/xfs-pmem1/linux /mnt/xfs-pmem2

The time taken by various filesystems varies (4.4-rc5):
* xfs    w/dax: 42 s
* xfs   no dax: 14 s
* ext4   w/dax:  7 s
* ext4  no dax: 15 s
* btrfs no dax: 18 s

mount options:
* /dev/pmem1 on /mnt/xfs-pmem1 type xfs (rw,relatime,seclabel,attr2,dax,inode64,noquota)
* /dev/pmem1 on /mnt/ext4-pmem1 type ext4 (rw,relatime,seclabel,dax,data=ordered)
* /dev/pmem1 on /mnt/btrfs-pmem1 type btrfs (rw,relatime,seclabel,ssd,space_cache,subvolid=5,subvol=/)
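
The timings above can be reproduced with something like the sketch
below (the pmem paths are the mounts listed above; the drop_caches
step needs root and is skipped otherwise):

```shell
# Time a recursive copy between two mounts.  Works on any pair of
# directories; the commented call uses the pmem mounts from above.
timed_copy() {
    src=$1 dst=$2
    sync
    # Drop the page cache (root only) so reads come from the device
    # rather than DRAM; skipped silently when unprivileged.
    [ -w /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches
    start=$(date +%s)
    cp -r "$src" "$dst"
    end=$(date +%s)
    echo "copy took $((end - start)) s"
}
# timed_copy /mnt/xfs-pmem1/linux /mnt/xfs-pmem2
```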

xfs with dax spends most of its time in clear_page_c_e and
dax_clear_blocks (from "perf top"):
  30.06%  [kernel]            [k] clear_page_c_e        
  12.24%  [kernel]            [k] dax_clear_blocks      
   5.36%  [kernel]            [k] copy_user_enhanced_fast_string
   4.33%  [kernel]            [k] __copy_user_nocache   
   2.55%  [xfs]               [k] xfs_perag_put         
   1.77%  [kernel]            [k] security_compute_sid.part.12  
   1.19%  [kernel]            [k] __percpu_counter_sum  
   1.14%  [kernel]            [k] acpi_os_write_port    
   1.03%  [kernel]            [k] dax_do_io             
   1.00%  [kernel]            [k] _raw_spin_lock        

The others spend most of their time in the 
copy_user_enhanced_fast_string and __copy_user_nocache 
functions that actually copy data.

xfs without dax:
  28.82%  [kernel]            [k] copy_user_enhanced_fast_string
   7.48%  [kernel]            [k] __copy_user_nocache
   3.63%  [kernel]            [k] __block_commit_write.isra.22
   1.86%  [kernel]            [k] acpi_os_write_port
   1.72%  [kernel]            [k] filenametr_cmp
   1.48%  [kernel]            [k] hashtab_search
   1.28%  [kernel]            [k] security_compute_sid.part.12
   0.96%  [kernel]            [k] _raw_spin_lock

ext4 with dax:
  22.85%  [kernel]             [k] __copy_user_nocache
  22.51%  [kernel]             [k] copy_user_enhanced_fast_string
   4.15%  [kernel]             [k] mb_find_order_for_block
   3.03%  [kernel]             [k] dax_do_io
   2.08%  [kernel]             [k] __d_lookup_rcu
   1.85%  [kernel]             [k] mb_find_extent
   1.75%  [kernel]             [k] ext4_mark_iloc_dirty
   1.54%  [kernel]             [k] acpi_os_write_port
   1.15%  [kernel]             [k] _find_next_bit.part.0
   0.99%  [kernel]             [k] ext4_mb_good_group

ext4 without dax:
  29.89%  [kernel]            [k] copy_user_enhanced_fast_string
  15.81%  [kernel]            [k] __copy_user_nocache
   4.45%  [kernel]            [k] __block_commit_write.isra.22
   1.39%  [kernel]            [k] ext4_mark_iloc_dirty
   1.37%  [kernel]            [k] ext4_bio_write_page
   1.12%  [kernel]            [k] filenametr_cmp
   1.09%  [kernel]            [k] security_compute_sid.part.12
   0.98%  [kernel]            [k] hashtab_search

btrfs (without dax):
  14.25%  [kernel]            [k] copy_user_enhanced_fast_string
  14.12%  [kernel]            [k] queued_spin_lock_slowpath
   9.70%  [kernel]            [k] __copy_user_nocache
   3.48%  [kernel]            [k] acpi_os_write_port
   1.52%  [kernel]            [k] _raw_spin_lock
   1.38%  [kernel]            [k] queued_write_lock_slowpath
   1.36%  [kernel]            [k] _raw_spin_lock_irqsave

---
Robert Elliott, HPE Persistent Memory




* Re: xfs over pmem - cp performance
  2016-01-08 21:07 xfs over pmem - cp performance Elliott, Robert (Persistent Memory)
@ 2016-01-08 22:03 ` Dave Chinner
  2016-01-12 17:31   ` Ross Zwisler
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2016-01-08 22:03 UTC (permalink / raw)
  To: Elliott, Robert (Persistent Memory)
  Cc: linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org

On Fri, Jan 08, 2016 at 09:07:27PM +0000, Elliott, Robert (Persistent Memory) wrote:
> I tried using cp to copy the linux git tree between
> pmem devices like this:
>                 cp -r /mnt/xfs-pmem1/linux /mnt/xfs-pmem2
> 
> The time taken by various filesystems varies (4.4-rc5):
> * xfs    w/dax: 42 s
> * xfs   no dax: 14 s
> * ext4   w/dax:  7 s
> * ext4  no dax: 15 s
> * btrfs no dax: 18 s

Yes, we know.

> mount options:
> * /dev/pmem1 on /mnt/xfs-pmem1 type xfs (rw,relatime,seclabel,attr2,dax,inode64,noquota)
> * /dev/pmem1 on /mnt/ext4-pmem1 type ext4 (rw,relatime,seclabel,dax,data=ordered)
> * /dev/pmem1 on /mnt/btrfs-pmem1 type btrfs (rw,relatime,seclabel,ssd,space_cache,subvolid=5,subvol=/)
> 
> xfs with dax spends most of its time in clear_page_c_e and
> dax_clear_blocks (from "perf top"):
>   30.06%  [kernel]            [k] clear_page_c_e        
>   12.24%  [kernel]            [k] dax_clear_blocks      

That's where the difference is - XFS is zeroing the blocks during
allocation so that we know a failed write or crash during a write
will not expose stale data to the user. I've commented on this
previously here:

http://oss.sgi.com/archives/xfs/2015-11/msg00021.html

and it's a result of the current "everything is synchronous" DAX cpu
cache control behaviour.

I think it's worth noting that ext4 is not spending any time
zeroing the blocks during allocation, which I think means that it
can expose stale data as a result of a crash or partial write....

We're working on fixing this, but it needs all the fsync patches
from Ross to enable us to turn off the synchronous cache flushes
in the DAX IO code.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs over pmem - cp performance
  2016-01-08 22:03 ` Dave Chinner
@ 2016-01-12 17:31   ` Ross Zwisler
  0 siblings, 0 replies; 3+ messages in thread
From: Ross Zwisler @ 2016-01-12 17:31 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Elliott, Robert (Persistent Memory),
	linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org,
	Jan Kara

On Sat, Jan 09, 2016 at 09:03:28AM +1100, Dave Chinner wrote:
> On Fri, Jan 08, 2016 at 09:07:27PM +0000, Elliott, Robert (Persistent Memory) wrote:
> > I tried using cp to copy the linux git tree between
> > pmem devices like this:
> >                 cp -r /mnt/xfs-pmem1/linux /mnt/xfs-pmem2
> > 
> > The time taken by various filesystems varies (4.4-rc5):
> > * xfs    w/dax: 42 s
> > * xfs   no dax: 14 s
> > * ext4   w/dax:  7 s
> > * ext4  no dax: 15 s
> > * btrfs no dax: 18 s
> 
> Yes, we know.
> 
> > mount options:
> > * /dev/pmem1 on /mnt/xfs-pmem1 type xfs (rw,relatime,seclabel,attr2,dax,inode64,noquota)
> > * /dev/pmem1 on /mnt/ext4-pmem1 type ext4 (rw,relatime,seclabel,dax,data=ordered)
> > * /dev/pmem1 on /mnt/btrfs-pmem1 type btrfs (rw,relatime,seclabel,ssd,space_cache,subvolid=5,subvol=/)
> > 
> > xfs with dax spends most of its time in clear_page_c_e and
> > dax_clear_blocks (from "perf top"):
> >   30.06%  [kernel]            [k] clear_page_c_e        
> >   12.24%  [kernel]            [k] dax_clear_blocks      
> 
> That's where the difference is - XFS is zeroing the blocks during
> allocation so that we know a failed write or crash during a write
> will not expose stale data to the user. I've commented on this
> previously here:
> 
> http://oss.sgi.com/archives/xfs/2015-11/msg00021.html
> 
> and it's a result of the current "everything is synchronous" DAX cpu
> cache control behaviour.
> 
> I think it's worth noting that ext4 is not spending any time
> zeroing the blocks during allocation, which I think means that it
> can expose stale data as a result of a crash or partial write....

Jan's patch series that does the zeroing for newly allocated blocks in ext4
hasn't been merged yet, and is queued for v4.5 inclusion:

https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/log/

My guess is that once this set is included the ext4 overhead for block zeroing
will go up.  If you're testing v4.4 code, the zeroing for newly allocated
blocks with ext4 is still happening inside of DAX.

> We're working on fixing this, but it needs all the fsync patches
> from Ross to enable us to turn off the synchronous cache flushes
> in the DAX IO code.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

