From mboxrd@z Thu Jan 1 00:00:00 1970
Content-Type: multipart/mixed; boundary="===============8159719912823126149=="
MIME-Version: 1.0
From: Walker, Benjamin
Subject: Re: [SPDK] Ceph/Bluestore SPDK based backend?
Date: Tue, 07 Feb 2017 18:54:53 +0000
Message-ID: <1486493691.22338.1.camel@intel.com>
In-Reply-To: 9ee92422-bd3f-b9dc-924d-7576abb4e052@gmail.com
List-ID:
To: spdk@lists.01.org

--===============8159719912823126149==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

On Tue, 2017-02-07 at 19:20 +0100, Tobias Oberstein wrote:
> Hi Nate,
>
> On 07.02.2017 at 14:03, Marushak, Nathan wrote:
> > Hi Tobias,
> >
> > There has been some work done in Bluestore for this. If you search
> > "SPDK Bluestore" or something similar you'll see some links.
>
> I was trying to find conclusive info on the net before, with no
> definite result though, e.g., after reading (by a colleague of yours):
>
> Accelerate Ceph via SPDK
>
> http://7xweck.com1.z0.glb.clouddn.com/cephdaybeijing201608/04-SPDK%E5%8A%A0%E9%80%9FCeph-XSKY%20Bluestore%E6%A1%88%E4%BE%8B%E5%88%86%E4%BA%AB-%E6%89%AC%E5%AD%90%E5%A4%9C-%E7%8E%8B%E8%B1%AA%E8%BF%88.pdf
>
> My understanding is:
>
> Bluestore seems to introduce a proper block device abstraction within
> the Ceph OSD implementation.
>
> And this new OSD-internal block device abstraction is implemented, for
> one, over regular Linux block devices (already a step forward from
> being forced to shuffle everything through a filesystem).

Correct - Bluestore is a highly simplified user space filesystem.

>
> But what I couldn't find in the above or on the net: is there an
> SPDK-backed implementation of this new Bluestore OSD block device
> abstraction?
>
> Do you have a link for me? I really tried to find it ..

Here is a link to the actual code:
https://github.com/ceph/ceph/blob/master/src/os/bluestore/NVMEDevice.cc

This was not implemented by the SPDK team and I don't know what state it
is in, but it is definitely there.

>
> > The impact to performance of Ceph was somewhat limited however.
> > There are bottlenecks in the Ceph OSD.
>
> Ok =( Any publicly available info on that?

I don't have the actual numbers on hand, but it was only a small
improvement. I'm speculating, but I can think of a number of problems in
the above implementation that will limit performance.

The biggest problem is that Ceph still relies on buffered I/O in a
number of cases, but the SPDK implementation doesn't do any caching.
Caching is of course the single most important aspect of storage
performance.

The above implementation also copies memory for every read and write
into DMA-able buffers, because Ceph doesn't allocate buffers from
DMA-able memory by default. To fix that, Ceph would need to either make
its memory manager pluggable as well, or just use SPDK/DPDK throughout
for all data buffer allocations.

Third, Ceph still does some blocking I/O in certain cases, and blocking
I/O with SPDK, given there is no caching, is probably slower than the
kernel.

>
> In general: having a SPDK+DPDK backed implementation of Ceph/OSD seems
> highly desirable with potentially big impact .. not?

I think there is room to make it far faster than it is today using
SPDK/DPDK, but it would take a much more dramatic set of changes to the
structure of the OSD to actually realize the benefit. The whole OSD
would probably need to be rewritten to do one thread per core with
message passing and entirely asynchronous network and storage stacks.
That's effectively a brand new OSD.

>
> Thanks for your reply!
> Cheers,
> /Tobias
>
> >
> > Thanks,
> > Nate
> >
> > On Feb 7, 2017, at 5:20 AM, Andrey Kuzmin wrote:
> >
> > Not that I know of, and likely because it belongs to Ceph, not SPDK.
> > SPDK's goal is to enable applications to utilize NVMe flash more
> > efficiently, not to provide a backend for each and every application
> > out there.
> >
> > Regards,
> > Andrey
> >
> > On Feb 7, 2017 14:03, "Tobias Oberstein" wrote:
> > Hi,
> >
> > the 16.2 release added a Ceph RBD block device as a backend for
> > SPDK applications. I am wondering about the inverse?
> >
> > As in: having Ceph RBD OSDs use SPDK to use NVMe flash as
> > underlying block storage.
> >
> > There seem to be efforts with Ceph/Bluestore
> >
> > http://www.slideshare.net/sageweil1/bluestore-a-new-faster-storage-backend-for-ceph
> >
> > to allow OSDs to use raw block devices as underlying storage (instead
> > of Filestore, which shuffles everything through a filesystem).
> >
> > So put differently: is there a Ceph/Bluestore block device
> > implementation using SPDK?
> >
> > Cheers,
> > /Tobias
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk

--===============8159719912823126149==--