On 04/13/2015 10:27 AM, Sage Weil wrote:
> [adding ceph-devel]
>
> On Mon, 13 Apr 2015, Chen, Xiaoxi wrote:
>> Hi,
>>
>> Actually I have done the tuning survey on RocksDB when I was
>> updating the RocksDB to newer version and exposed the tuning in
>> ceph.conf.
>>
>> What we need to ensure is the WAL never hit the disk. The rocksdb
>
> We'll always have to pay that 1x write to the log; we just want to make
> sure it doesn't turn into 2x. I take it you're assuming the log is on an
> SSD (not disk)?
>
>> write ahead log is already introduce 1X write, if the data flushed to
>> SST in level 0, that will be 2X, not to mention any further compaction.
>>
>> The tuning that makes the differences are :
>> write_buffer_size
>> max_write_buffer_number
>> min_write_buffer_number_to_merge
>>
>> Say if we have
>> write_buffer_size =512M
>> max_write_buffer_number = 6
>> min_write_buffer_number_to_merge =2

Attached are tests for a single PCIe SSD in four configurations:
filestore, newstore + fsync + default tunables, newstore + fsync +
Xiaoxi's tunables, and newstore + fdatasync + Xiaoxi's tunables.

Basically Xiaoxi's tunables help, and fdatasync helps a little more
(mostly at small IO sizes), but that is still not enough for newstore
to beat filestore overall. Newstore *does* do consistently better than
filestore with 4MB writes now, though.

Mark
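
For reference, a rough back-of-envelope sketch of what the quoted example values imply. It is not taken from the attached tests. The helper names (memtable_budget, write_amplification) are made up for illustration, and the model only counts the WAL write and the first flush to a level-0 SST, ignoring compaction and any deduplication of overwritten keys:

```python
# Hypothetical helpers (not part of Ceph or RocksDB): simple arithmetic on
# the three RocksDB tunables quoted in the thread.
MiB = 1024 * 1024


def memtable_budget(write_buffer_size, max_write_buffer_number,
                    min_write_buffer_number_to_merge):
    """Return (peak memtable bytes, memtable bytes merged per flush)."""
    # Up to max_write_buffer_number memtables can sit in memory at once.
    peak = write_buffer_size * max_write_buffer_number
    # At least this many full memtables are merged before a flush to L0.
    per_flush = write_buffer_size * min_write_buffer_number_to_merge
    return peak, per_flush


def write_amplification(wal=True, flushed_to_l0=True):
    """1x for the WAL write, plus 1x if the data is flushed to an L0 SST.

    Assumption: compaction beyond level 0 is ignored here.
    """
    return int(wal) + int(flushed_to_l0)


peak, per_flush = memtable_budget(512 * MiB, 6, 2)
print(peak // MiB, per_flush // MiB)       # 3072 1024
print(write_amplification())               # 2
print(write_amplification(flushed_to_l0=False))  # 1
```

So with those values, up to about 3 GiB of memtables can accumulate in memory, and data that is overwritten or deleted before its memtables are flushed only ever costs the 1x WAL write.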