* xio messenger prelim benchmark
From: Vu Pham @ 2015-02-04 20:08 UTC
  To: ceph-devel
  Cc: "Matt W. Benjamin" <matt@cohortfs.com>, Oren Duer

[-- Attachment #1: Type: text/plain, Size: 3290 bytes --]

Hi,

I would like to share some benchmarking numbers for the xio messenger and
the simple messenger.

HW/SW configuration:
---------------------
. 1 32-core Xeon E5-2697V3 2.6G (Haswell) node, 64GB of memory.
. Hyperthreading is enabled, giving 64 logical cores
. Mellanox ConnectX3-EN 40Gb/s HCAs, fw- 2.33.5000
. Mellanox SX1012 40Gb/s switch EN.
. Ubuntu 14.04 LTS stock kernel
. MLNX_OFED_LINUX-2.4-1.0.0 sw package
. Accelio master branch (tag v-1.3)
. Ceph master (Jan-29) + pr #3544 (xio spread portals).
. Use ramdisks and filestore backend
. Use fio_rbd on user-mode rbd as client
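
For readers who want to reproduce a run of this kind, here is a minimal sketch that builds a job file for fio's rbd ioengine from the numjobs/iodepth values quoted in this mail. The pool, image and client names and the 60-second runtime are placeholders (assumptions), not the values used for the numbers in this thread.

```python
# Sketch only: generate a fio job file for fio's rbd ioengine (user-mode librbd).
# pool, rbdname, clientname and runtime are hypothetical placeholders.

def make_fio_rbd_job(rw, bs, numjobs, iodepth,
                     pool="rbd", rbdname="testimg", clientname="admin",
                     runtime=60):
    """Return the text of a fio job file that drives one rbd image."""
    lines = [
        "[global]",
        "ioengine=rbd",              # fio's userspace rbd engine
        f"clientname={clientname}",  # ceph client name, without the 'client.' prefix
        f"pool={pool}",
        f"rbdname={rbdname}",
        "time_based=1",
        f"runtime={runtime}",
        "group_reporting=1",         # one aggregated result for all jobs
        "",
        f"[{rw}-{bs}]",
        f"rw={rw}",
        f"bs={bs}",
        f"numjobs={numjobs}",
        f"iodepth={iodepth}",
    ]
    return "\n".join(lines) + "\n"

# 4K random read with numjobs=8, iodepth=32
job_text = make_fio_rbd_job("randread", "4k", numjobs=8, iodepth=32)
print(job_text)
```

Writing job_text to a file and passing that file to fio would start the run; the image has to exist in the pool beforehand.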

1 OSD, 1 client node
-------------------------
a. 1 rbd image
xio messenger:
. ~9100 iops (4K random write, 6 cores used on osd node, numjobs=1, 
iodepth=64)
. ~21k iops (4K random read, 4 cores used, numjobs=1, iodepth=32)
. ~121k iops (4K random read, 15 cores used, numjobs=8, iodepth=32)
. ~520MB/s (256K random write, 3 cores used, numjobs=1, iodepth=64)
. ~3140MB/s (256K random read, 4 cores used, numjobs=1, iodepth=32)
. ~4330MB/s (256K random read, 6 cores used, numjobs=8)
simple messenger:
. ~8500 iops (4K random write, 7 cores used)
. ~20k iops (4K random read, 5 cores used)
. ~105k iops (4K random read, 20 cores used, numjobs=8, iodepth=32)
. ~450MB/s (256K random write, 3 cores used)
. ~1140MB/s (256K random read, 3 cores used)
. ~4330MB/s (256K random read, 8 cores used, numjobs=8)
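
One way to read the 4K random read numbers above is as iops per core used. The sketch below does that arithmetic. It assumes the simple messenger lines without explicit parameters used the same numjobs/iodepth as the matching xio lines, and that "cores used" always refers to the osd node.

```python
# (iops, cores used) for 4K random read, 1 OSD, 1 rbd image, taken from the list above.
# Assumption: the simple messenger runs used the same numjobs as the matching xio runs.
results = {
    "xio":    {"numjobs=1": (21_000, 4), "numjobs=8": (121_000, 15)},
    "simple": {"numjobs=1": (20_000, 5), "numjobs=8": (105_000, 20)},
}

def iops_per_core(iops, cores):
    """Throughput normalised by the number of cores used."""
    return iops / cores

for messenger, runs in results.items():
    for label, (iops, cores) in runs.items():
        print(f"{messenger:6s} {label}: {iops_per_core(iops, cores):.0f} iops/core")
```

With these inputs the numjobs=8 case works out to roughly 8.1k iops/core for xio against 5.25k iops/core for simple.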


b. 2 rbd images on two separate pools, 2 fio_rbd instances
xio messenger:
. ~9100 iops (4K random write, 6 cores used on osd node, each fio_rbd 
instance has numjobs=1, iodepth=64)
. ~155k iops (4K random read, 19 cores used, each fio_rbd instance has 
numjobs=8, iodepth=32)
. ~4225MB/s (256K random read, 6 cores used, each fio_rbd instance has 
numjobs=1, iodepth=32)
. ~4330MB/s (256K random read, 8 cores used, each fio_rbd instance has 
numjobs=8, iodepth=32)

simple messenger:
. ~7800 iops (4K random write, 7 cores used on osd node, each fio_rbd 
instance has numjobs=1, iodepth=64)
. ~125k iops (4K random read, 25 cores used, each fio_rbd instance has 
numjobs=8, iodepth=32)
. ~2068MB/s (256K random read, 4 cores used, each fio_rbd instance has 
numjobs=1, iodepth=32)
. ~4330MB/s (256K random read, 11 cores used, each fio_rbd instance has 
numjobs=8, iodepth=32)
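
Both messengers top out at the same ~4330MB/s for 256K random reads with numjobs=8, which suggests the 40Gb/s link, rather than the messenger, may be the limit there. The sketch below converts the figure to Gb/s; it assumes MB means 10^6 bytes, which the mail does not state.

```python
LINK_GBPS = 40.0  # nominal rate of the 40Gb/s HCAs and switch listed above

def mbps_to_gbps(mb_per_s):
    """Convert MB/s to Gb/s, assuming 1 MB = 10**6 bytes."""
    return mb_per_s * 8 / 1000

def link_utilization(mb_per_s, link_gbps=LINK_GBPS):
    """Fraction of the nominal link rate consumed by payload."""
    return mbps_to_gbps(mb_per_s) / link_gbps

for mb_per_s in (3140, 4225, 4330):
    print(f"{mb_per_s} MB/s = {mbps_to_gbps(mb_per_s):.2f} Gb/s, "
          f"{link_utilization(mb_per_s):.1%} of {LINK_GBPS:.0f} Gb/s")
```

Under that assumption ~4330MB/s is about 34.6Gb/s of payload, or roughly 87% of the nominal link rate before protocol overhead.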

2 OSDs, 1 client node, 4 rbd images on 4 separate pools
--------------------------------------------------------------------
4K random read: xio messenger max at ~272k iops, simple messenger max at 
~170k iops


4 OSDs, 1 client node, 4 rbd images on 4 separate pools
---------------------------------------------------------------------
4K random read: xio messenger max at ~355k iops, simple messenger max at 
~204k iops


8 OSDs, 1 client node, 4 rbd images on 4 separate pools
---------------------------------------------------------------------
4K random read: xio messenger max at ~355k iops, simple messenger max at 
~225k iops
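
To make the multi-OSD comparison easier to read, the following sketch computes the xio/simple ratio at each OSD count from the peak 4K random read figures quoted above. Nothing here is measured beyond those figures.

```python
# Peak 4K random read throughput in thousands of iops, by OSD count (from the text above).
peak_kiops = {
    2: {"xio": 272, "simple": 170},
    4: {"xio": 355, "simple": 204},
    8: {"xio": 355, "simple": 225},
}

def xio_speedup(row):
    """Ratio of xio messenger iops to simple messenger iops."""
    return row["xio"] / row["simple"]

for osds, row in sorted(peak_kiops.items()):
    print(f"{osds} OSDs: xio/simple = {xio_speedup(row):.2f}x")
```

The ratio comes out at about 1.60x, 1.74x and 1.58x for 2, 4 and 8 OSDs. The xio figure does not change between 4 and 8 OSDs, so something other than the OSD count (possibly the single client node) may be the limit at that point.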


I have attached the ceph configuration files that I used.
Please note that I enabled flow control and turned off header_crc and
data_crc for both xio and simple.

Do the simple messenger numbers look reasonable and in the right ballpark?
Please share your numbers and configuration if you have higher numbers.

thanks,
-vu

[-- Attachment #2: ceph_perf_simple_1osd.conf --]
[-- Type: application/octet-stream, Size: 2465 bytes --]

[global]

	admin_socket = /ceph-test/var/run/ceph/ceph-$name.$id.asok

	; no secure authentication
	auth_supported = none
	auth_service_required = none
	auth_client_required = none
	auth_cluster_required = none

	; allow to open a lot of files
	max_open_files = 131072

	; setup logging
	log_file = /ceph-test/var/log/ceph/$name.log
	pid_file = /ceph-test/var/run/ceph/$name.pid

	filestore_xattr_use_omap = 1

	osd_pool_default_size = 1
	osd_pool_default_min_size = 1

	; turn off debugs
	; a
	debug_auth = 0/0
	debug_asok = 0/0

	; b
	debug_buffer = 0/0

	; c
	debug_client = 0/0
	debug_context = 0/0
	debug_crush = 0/0
	debug_crypto = 0/0

	; f
	debug_filer = 0/0
	debug_filestore = 0/0
	debug_finisher = 0/0

	; h
	debug_heartbeatmap = 0/0

	; j
	debug_journal = 0/0
	debug_journaler = 0/0

	; l
	debug_lockdep = 0/0

	; m
	debug_monclient = 0/0
	debug_mon = 0/0
	debug_monc = 0/0
	debug_ms = 0/0

	; o
	debug_objclass = 0/0
	debug_objecter = 0/0
	debug_objectcacher = 0/0
	debug_optracker = 0/0
	debug_osd = 0/0

	; p
	debug_paxos = 0/0
	debug_perfcounter = 0/0

	; r
	debug_rados = 0/0
	debug_rbd = 0/0
	debug_rgw = 0/0

	; t
	debug_timer = 0/0
	debug_tp = 0/0
	debug_throttle = 0/0

	; messenger type
	ms_type = simple

	ms_tcp_nodelay = true
	ms_crc_header = false
	ms_crc_data = false

	osd_op_threads = 2
	filestore_op_threads = 3
	filestore_fd_cache_size = 64
	filestore_fd_cache_shards = 16
	osd_op_num_threads_per_shard = 1
	osd_op_num_shards = 25

	throttle_perf_counter = false
	ms_dispatch_throttle_bytes = 0

	rbd_cache = false

[mon]
	mon_data = /ceph-test/ceph-data/mon.$id

[mon.0]
	host = vlab-017
	mon_addr = 12.20.1.117:16789
	;user = root

[mds]
	; where the mds keeps its secret encryption keys
	keyring = /ceph-test/ceph-data/keyring.$name

	cluster_addr = 12.20.1.117:26789
	public_addr = 12.20.1.117:36789

	objecter_timeout = 10
	mds_reconnect_timeout = 5
	mds_beacon_interval = 2

[mds.0]
	host = vlab-017

[osd]
	osd_client_message_size_cap = 0
	osd_client_message_cap = 0
	osd_enable_op_tracker = false

	osd_data = /ceph-test/ceph-data/osd.$id
	osd_journal = /ceph-test/ceph-data/osd.$id/journal
	osd_journal_size = 256
	osd_scrub_load_threshold = 2.5
	;osd_objectstore = memstore
	osd_mkfs_type = xfs
	osd_mount_options_xfs = rw,noatime

	osd_class_dir = /opt/ceph/lib/rados-classes
	ms_bind_port_min = 3000

[osd.0]
	host = vlab-017
	;user = root

	devs = /dev/ram0
	ms_bind_port_min = 7100
	ms_bind_port_max = 7200


[-- Attachment #3: ceph_perf_xio_1osd.conf --]
[-- Type: application/octet-stream, Size: 2765 bytes --]

[global]

	admin_socket = /ceph-test/var/run/ceph/ceph-$name.$id.asok

	; no secure authentication
	auth_supported = none
	auth_service_required = none
	auth_client_required = none
	auth_cluster_required = none

	; allow to open a lot of files
	max_open_files = 131072

	; setup logging
	log_file = /ceph-test/var/log/ceph/$name.log
	pid_file = /ceph-test/var/run/ceph/$name.pid

	; default datacrc & headercrc is true
	ms_crc_header = false
	ms_crc_data = false

	filestore_xattr_use_omap = 1

	osd_pool_default_size = 1
	osd_pool_default_min_size = 1

	; turn off debugs
	; a
	debug_auth = 0/0
	debug_asok = 0/0

	; b
	debug_buffer = 0/0

	; c
	debug_client = 0/0
	debug_context = 0/0
	debug_crush = 0/0
	debug_crypto = 0/0

	; f
	debug_filer = 0/0
	debug_filestore = 0/0
	debug_finisher = 0/0

	; h
	debug_heartbeatmap = 0/0

	; j
	debug_journal = 0/0
	debug_journaler = 0/0

	; l
	debug_lockdep = 0/0

	; m
	debug_monclient = 0/0
	debug_mon = 0/0
	debug_monc = 0/0
	debug_ms = 0/0

	; o
	debug_objclass = 0/0
	debug_objecter = 0/0
	debug_objectcacher = 0/0
	debug_optracker = 0/0
	debug_osd = 0/0

	; p
	debug_paxos = 0/0
	debug_perfcounter = 0/0

	; r
	debug_rados = 0/0
	debug_rbd = 0/0
	debug_rgw = 0/0

	; t
	debug_timer = 0/0
	debug_tp = 0/0
	debug_throttle = 0/0

	; x
	debug_xio = 2

	; rdma setting
	rdma_local = 12.20.1.117
	enable experimental unrecoverable data corrupting features = ms-type-xio
	ms_type = xio
	;xio_queue_depth = 128
	xio_mp_max_64 = 262144
	xio_mp_max_256 = 262144
	xio_mp_max_1k = 262144
	xio_mp_max_page = 131072
	xio_portal_threads = 8

	osd_op_threads = 2
	filestore_op_threads = 3
	filestore_fd_cache_size = 64
	filestore_fd_cache_shards = 16

	osd_op_num_threads_per_shard = 1
	osd_op_num_shards = 25

	throttle_perf_counter = false
	ms_dispatch_throttle_bytes = 0

	rbd_cache = false

[osd]
	osd_client_message_size_cap = 0
	osd_client_message_cap = 0
	osd_enable_op_tracker = false

	osd_data = /ceph-test/ceph-data/osd.$id
	osd_journal = /ceph-test/ceph-data/osd.$id/journal
	osd_journal_size = 256
	osd_scrub_load_threshold = 2.5
	;osd_objectstore = memstore
	osd_mkfs_type = xfs
	osd_mount_options_xfs = rw,noatime

	osd_class_dir = /opt/ceph/lib/rados-classes
	ms_bind_port_min = 3000

[osd.0]
	host = vlab-017
	;user = root

	devs = /dev/ram0
	ms_bind_port_min = 7100
	ms_bind_port_max = 7400

[mon]
	mon_data = /ceph-test/ceph-data/mon.$id

[mon.0]
	host = vlab-017
	mon_addr = 12.20.1.117:16789
	;user = root

[mds]
	; where the mds keeps its secret encryption keys
	keyring = /ceph-test/ceph-data/keyring.$name

	cluster_addr = 12.20.1.117:26789
	public_addr = 12.20.1.117:36789

	objecter_timeout = 10
	mds_reconnect_timeout = 5
	mds_beacon_interval = 2

[mds.0]
	host = vlab-017
	;user = root

* Re: Re[2]: xio messenger prelim benchmark
From: Alexandre DERUMIER @ 2015-02-06  7:24 UTC
  To: Vu Pham; +Cc: ceph-devel

>>Yes, xio messenger, which is implemented over Accelio, can run over RDMA
>>transports (InfiniBand, RoCE) and TCP. Please note that we have not
>>enabled xio messenger / Accelio-tcp yet.

Oh, OK, great!


>>xio messenger currently works with user-mode clients. We have only
>>validated/tested the user-mode rbd client.
>>SanDisk is working on a krbd over kAccelio implementation. As of last
>>week, SanDisk has basic I/Os working.
>>Hopefully krbd/kAccelio will be available soon.

Great, so we can expect even better results :)


BTW, do you have any client-side CPU usage benchmarks?



----- Original message -----
From: "Vu Pham" <vuhuong@mellanox.com>
To: "aderumier" <aderumier@odiso.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Sent: Thursday, 5 February 2015 19:15:21
Subject: Re[2]: xio messenger prelim benchmark

>Hi,
>
>I'm going to use the same Mellanox switches in production next month.
>
>Aren't they Ethernet switches?
>How does xio messenger work if they are not InfiniBand?
>
>Maybe with RoCE?
> 

Yes, xio messenger, which is implemented over Accelio, can run over RDMA
transports (InfiniBand, RoCE) and TCP. Please note that we have not
enabled xio messenger / Accelio-tcp yet.

> 
>I'll try to benchmark xio messenger too. 
> 
Great. Let us know. 

> 
> 
>BTW, have you tried to bench with fio + krbd instead of fio-librbd?
>I think you should get better results with numjobs=1.
> 
> 
xio messenger currently works with user-mode clients. We have only
validated/tested the user-mode rbd client.
SanDisk is working on a krbd over kAccelio implementation. As of last
week, SanDisk has basic I/Os working.
Hopefully krbd/kAccelio will be available soon.

@Mark, I'll try out disabling auth, in-memory debugging & RHEL. 

-vu 

