* FileStore performance: coalescing operations
From: Andreas Bluemle @ 2015-02-26 14:28 UTC
  To: Ceph Development

[-- Attachment #1: Type: text/plain, Size: 817 bytes --]

Hi,

During the weekly performance meeting, I mentioned
my experiences with the transaction structure
for write requests at the level of the FileStore.
Such a transaction contains not only the OP_WRITE
operation on the object in the file system, but also
a series of OP_OMAP_SETKEYS and OP_SETATTR operations.

Find attached a README and a source code patch, which
describe a prototype for coalescing the OP_OMAP_SETKEYS
operations and the performance impact of this change.

Regards

Andreas Bluemle

-- 
Andreas Bluemle                     mailto:Andreas.Bluemle@itxperts.de
ITXperts GmbH                       http://www.itxperts.de
Balanstrasse 73, Geb. 08            Phone: (+49) 89 89044917
D-81541 Muenchen (Germany)          Fax:   (+49) 89 89044910

Company details: http://www.itxperts.de/imprint.htm

[-- Attachment #2: ceph-master.file-store-omap_setkeys-colaescing.patch --]
[-- Type: text/x-patch, Size: 1864 bytes --]

diff --git a/src/os/FileStore.cc b/src/os/FileStore.cc
index f6c3bb8..29382b2 100644
--- a/src/os/FileStore.cc
+++ b/src/os/FileStore.cc
@@ -2260,10 +2260,24 @@ int FileStore::_check_replay_guard(int fd, const SequencerPosition& spos)
   }
 }
 
+void FileStore::_coalesce(map<string, bufferlist> &target, map<string, bufferlist> &source)
+{
+  for (map<string, bufferlist>::iterator p = source.begin();
+       p != source.end();
+       p++) {
+    target[p->first] = p->second;
+  }
+  return;
+}
+
 unsigned FileStore::_do_transaction(
   Transaction& t, uint64_t op_seq, int trans_num,
   ThreadPool::TPHandle *handle)
 {
+  map<string, bufferlist> collected_aset;
+  coll_t collected_cid;
+  ghobject_t collected_oid;
+
   dout(10) << "_do_transaction on " << &t << dendl;
 
 #ifdef WITH_LTTNG
@@ -2282,6 +2296,22 @@ unsigned FileStore::_do_transaction(
 
     _inject_failure();
 
+    if (op->op == Transaction::OP_OMAP_SETKEYS) {
+	collected_cid = i.get_cid(op->cid);
+	collected_oid = i.get_oid(op->oid);
+	map<string, bufferlist> aset;
+	i.decode_attrset(aset);
+	_coalesce(collected_aset, aset);
+	continue;
+    } else {
+	if (collected_aset.empty() == false) {
+	  tracepoint(objectstore, omap_setkeys_enter, osr_name);
+	  r = _omap_setkeys(collected_cid, collected_oid, collected_aset, spos);
+	  tracepoint(objectstore, omap_setkeys_exit, r);
+	  collected_aset.clear();
+	}
+    }
+
     switch (op->op) {
     case Transaction::OP_NOP:
       break;
diff --git a/src/os/FileStore.h b/src/os/FileStore.h
index af1fb8d..a039731 100644
--- a/src/os/FileStore.h
+++ b/src/os/FileStore.h
@@ -449,6 +449,8 @@ public:
 
   int statfs(struct statfs *buf);
 
+  void _coalesce( map<string, bufferlist> &target, map<string, bufferlist> &source);
+
   int _do_transactions(
     list<Transaction*> &tls, uint64_t op_seq,
     ThreadPool::TPHandle *handle);

[-- Attachment #3: README.file-store-coalescing --]
[-- Type: text/plain, Size: 2554 bytes --]

Coalescing OMAP_SETKEYS operations in a write transaction
---------------------------------------------------------

Description
-----------

At the level of FileStore, every write request is embedded in a transaction
which consists of:
  6 key-value pair settings in 3 OMAP_SETKEYS operations
  the actual OP_WRITE
  2 settings in the extended file system attributes

The modification to FileStore::_do_transaction() coalesces the
6 key-value pairs into a single operation. As a side effect, the
number of key-value pairs drops to 5: one key appears twice,
and only its last value is set.

Performance improvement
-----------------------

Cluster with 3 storage nodes, 4 OSDs (SAS disk, SSD journal) per node,
separate client node with rbd using the kernel client,
test load generated by fio: random write, 4K block size, iodepth 16.

client improvement: approx. 4 % (12890 iops vs. 13369 iops)
storage node improvement: CPU consumption of the ceph-osd daemon reduced
by about 9 %; see the following table (derived from /proc/<pid>/schedstat):


ceph-osd process and             CPU usage         | CPU usage
thread classes                   v0.91 unmodified  | v0.91 with coalescing
---------------------------------------------------+----------------------
total cpu usage:                 43.17 CPU-seconds | 39.33 CPU-seconds
                                                   |
ThreadPool::WorkThread::entry(): 15.56   36.04%    | 12.45   31.66%
ShardedThreadPool::workers:       8.07   18.70%    |  7.94   20.18%
Pipe::Reader::                    5.81   13.45%    |  5.92   15.04%
Pipe::Writer::entry():            4.59   10.63%    |  4.73   12.02%
FileJournal::Writer::             2.41    5.57%    |  2.45    6.22%
Finisher::finisher_thread:        2.86    6.63%    |  1.03    2.61%
                                                   |
WBThrottle::entry:                n/a     n/a      |  0.81    2.06%

Interestingly, with coalescing active, the WBThrottle thread shows up in
the CPU usage. In the unmodified case, it was almost invisible.


Source/Patch
------------
https://www.github.com/andreas-bluemle/ceph
   commit f33c48358f762cbeb5d30724efacf78ff5438e9e

patches:
   relative to pull request at https://www.github.com/andreas-bluemle/ceph
     ceph-andreas-bluemle.file-store-omap_setkeys-colaescing.patch

   relative to ceph master at https://www.github.com
     (commit a7a70cabe25fdfe3322c784f6797231d14e112c2)
     ceph-master.file-store-omap_setkeys-colaescing.patch

