* FileStore should not use syncfs(2)
@ 2015-08-05 21:26 Sage Weil
  2015-08-05 21:38 ` Somnath Roy
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Sage Weil @ 2015-08-05 21:26 UTC
  To: Somnath.Roy; +Cc: ceph-devel, sjust

Today I learned that syncfs(2) does an O(n) scan of the superblock's 
entire inode list, looking for dirty items.  I'd always assumed that it 
only traversed dirty inodes (e.g., via a dedicated list of dirty inodes), 
but that appears not to be the case, even on the latest kernels.
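To make the cost difference concrete, here is a toy model (not kernel code) contrasting a sync that visits every cached inode with one that only walks a list of dirty inodes. The `Inode` class and both function names are hypothetical; the real kernel walk has locking and writeback details this ignores.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    ino: int
    dirty: bool = False


def syncfs_full_scan(inode_cache):
    """Model of the behaviour described above: every cached inode is
    visited, dirty or not. Returns (inodes visited, inodes written)."""
    visited = written = 0
    for inode in inode_cache:
        visited += 1
        if inode.dirty:
            inode.dirty = False  # pretend the inode was written back
            written += 1
    return visited, written


def sync_dirty_list(dirty_list):
    """Model of the assumed behaviour: only a list of dirty inodes is
    walked. Returns (inodes visited, inodes written)."""
    visited = written = 0
    while dirty_list:
        inode = dirty_list.pop()
        visited += 1
        inode.dirty = False
        written += 1
    return visited, written


if __name__ == "__main__":
    cache = [Inode(i) for i in range(100_000)]
    cache[42].dirty = True
    print(syncfs_full_scan(cache))       # (100000, 1)
    cache[42].dirty = True
    print(sync_dirty_list([cache[42]]))  # (1, 1)
```

Both calls write out the same single inode, but the full scan's work grows with the size of the inode cache rather than with the amount of dirty data.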

That means that the more RAM in the box, the larger (generally) the inode 
cache, and so the longer syncfs(2) takes and the more CPU it wastes.  The 
box I was looking at had 256GB of RAM, 36 OSDs, and a load average of ~40 
while servicing a very light workload, and each syncfs(2) call was taking 
~7 seconds (usually to write out a single inode).
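One rough way to see whether a box is affected is to time syncfs(2) directly on the file system holding an OSD's data. A minimal sketch, assuming Linux and a C library that exports `syncfs` (glibc does since 2.14); `time_syncfs` is a hypothetical helper name, and Python's `os` module has no `syncfs` wrapper, so it goes through `ctypes`.

```python
import ctypes
import os
import time

# Assumption: Linux, with syncfs(2) resolvable from the already-loaded libc.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.syncfs.argtypes = [ctypes.c_int]
_libc.syncfs.restype = ctypes.c_int


def time_syncfs(path):
    """Open `path` (a file or directory), call syncfs(2) on the file
    system that contains it, and return the elapsed time in seconds."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.monotonic()
        if _libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), path)
        return time.monotonic() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(f"syncfs took {time_syncfs('.'):.3f} s")
```

Run it a few times on an idle and a busy node; if the time tracks the size of the inode cache rather than the amount of recently written data, that matches the behaviour described above.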

A possible workaround for such boxes is to turn 
/proc/sys/vm/vfs_cache_pressure way up (so that the kernel favors caching 
pages instead of inodes/dentries)...
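The workaround is a single sysctl (equivalently `sysctl -w vm.vfs_cache_pressure=N`). A minimal sketch, assuming Linux and root privileges to write the file; the helper names are hypothetical, and the value 1000 in the usage line is only an illustration of "way up", since the email gives no number. The kernel default is 100, and the kernel documentation warns that values far above 100 can hurt performance, so this is worth measuring rather than setting blindly.

```python
from pathlib import Path

# Standard location of the tunable on Linux; writing it requires root.
VFS_CACHE_PRESSURE = Path("/proc/sys/vm/vfs_cache_pressure")


def get_vfs_cache_pressure(path=VFS_CACHE_PRESSURE):
    """Return the current vfs_cache_pressure value."""
    return int(Path(path).read_text().strip())


def set_vfs_cache_pressure(value, path=VFS_CACHE_PRESSURE):
    """Set vfs_cache_pressure. Larger values make the kernel reclaim
    dentries and inodes more eagerly relative to page cache."""
    if value < 0:
        raise ValueError("vfs_cache_pressure must be non-negative")
    Path(path).write_text(f"{value}\n")


if __name__ == "__main__":
    print("current:", get_vfs_cache_pressure())
    # set_vfs_cache_pressure(1000)  # hypothetical value; needs root
```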

I think the take-away, though, is that we do need to bite the bullet and 
make FileStore f[data]sync exactly the files (and directories) it has 
modified, so that the syncfs call can be avoided.  This is the path you 
were originally headed down, Somnath, and I think it's the right one.
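A minimal sketch of the idea, assuming a hypothetical `SyncTracker` (not FileStore's actual code): remember every file written since the last commit point and fdatasync just those, so the commit cost scales with the number of files touched rather than with the inode cache. fdatasync(2) flushes data plus the metadata needed to read it back (such as file size) but not, e.g., timestamps; whether that is sufficient here is an assumption, and fsync(2) is the conservative choice. New directory entries are not covered.

```python
import os


class SyncTracker:
    """Hypothetical helper: track files written since the last commit
    and fdatasync only those, instead of calling syncfs(2).
    Callers must keep the tracked fds open until commit() returns."""

    def __init__(self):
        self._dirty_fds = set()

    def write(self, fd, data, offset):
        """Write `data` at `offset` and remember that `fd` is dirty."""
        os.pwrite(fd, data, offset)
        self._dirty_fds.add(fd)

    def commit(self):
        """Flush every dirty file; return how many were synced."""
        synced = len(self._dirty_fds)
        for fd in self._dirty_fds:
            os.fdatasync(fd)  # Linux; use os.fsync for full metadata
        self._dirty_fds.clear()
        return synced
```

If an fdatasync fails, the set is left intact so the caller can retry the commit.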

The main thing to watch out for is that according to POSIX you really need 
to fsync directories too (e.g., after creating, renaming, or unlinking a 
file in them).  With XFS that isn't necessary, since all metadata 
operations go into the journal and that is fully ordered, but we don't 
want to allow data loss on e.g. ext4 (we need to check what the metadata 
ordering behavior is there) or other file systems.
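A minimal sketch of the portable pattern for a newly created file, assuming Linux (where a directory can be opened read-only and fsync'd); `durable_create` is a hypothetical name. It fsyncs the file and then its parent directory unconditionally, even though on XFS the second step may be redundant for the reason given above.

```python
import os


def durable_create(path, data):
    """Create `path` containing `data` so that both the contents and the
    new directory entry survive a crash: fsync the file, then fsync
    its parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write less than requested
            n = os.write(fd, view)
            view = view[n:]
        os.fsync(fd)  # file data and inode
    finally:
        os.close(fd)

    dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)  # the directory entry naming the new file
    finally:
        os.close(dirfd)
```

A rename needs the same treatment for the target directory (and the source directory, if different).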

:(

sage



Thread overview: 11+ messages
2015-08-05 21:26 FileStore should not use syncfs(2) Sage Weil
2015-08-05 21:38 ` Somnath Roy
2015-08-06  2:17   ` Haomai Wang
2015-08-06 12:47     ` Sage Weil
2015-08-05 21:55 ` Mark Nelson
2015-08-07  6:50   ` Chen, Xiaoxi
2015-08-06  9:44 ` Yan, Zheng
2015-08-06 12:57   ` Sage Weil
2015-08-06 11:27 ` Christoph Hellwig
2015-08-06 13:00   ` Sage Weil
2015-08-06 13:06     ` Christoph Hellwig
