From mboxrd@z Thu Jan 1 00:00:00 1970
Content-Type: multipart/mixed; boundary="===============8345658633346611957=="
MIME-Version: 1.0
From: Walker, Benjamin
Subject: Re: [SPDK] NVMf Target
Date: Thu, 16 Feb 2017 18:05:19 +0000
Message-ID: <1487268317.38737.12.camel@intel.com>
In-Reply-To: 9A947B1F6D0FF74CB5D6ACFF43B473F3A3AF6137@ORSMSX112.amr.corp.intel.com
List-ID:
To: spdk@lists.01.org

--===============8345658633346611957==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

On Thu, 2017-02-16 at 16:35 +0000, Minturn, Dave B wrote:
> I think you guys are talking about two different types of completion
> queues; NVMe CQs and RDMA CQs. John is correct regarding the NVMe
> SQ/CQ pairing and that there is a 1x1 mapping of the NVMe SQ/CQ pair
> and the RDMA QP.

Dave is right (also Dave is one of the primary authors of the NVMe-oF
specification, to provide context for everyone on the list).

There are at least 3 different completion queues you could be talking
about. There are the completion queues to the NVMe device, the
completion queues as defined by NVMe-oF (which is by definition the
receive side of an RDMA queue pair), and the RDMA completion queue
which is a side-channel notification of events on the RDMA queue pair.
I'll address all three just for posterity.

1) We explicitly choose to allocate NVMe submission and completion
queues in pairs inside SPDK's NVMe driver. Our queues are lockless and
it is universally the case that a completion must look up the original
request, so unless that completion is executing on the same thread as
the submission, that would require some sort of thread-safe data
structure. Every known NVMe driver makes this same choice, and this
was always the expected primary use case in NVMe when the spec was
designed, so I don't think this is controversial.

2) The NVMe-oF specification requires that NVMe-oF submission and
completion queues are allocated in pairs.
This is a stronger requirement than the base NVMe specification, but
in reality everyone was doing this at the NVMe level anyway, so again
this was not a controversial choice in the NVMe-oF specification.
Specifically, an RDMA queue pair maps 1:1 to an NVMe-oF queue pair.

3) The SPDK NVMe-oF target also chooses to create one RDMA completion
queue per RDMA queue pair. I think this is the completion queue you
are talking about. In order to be fully lockless, SPDK must lay out
its data structures very carefully. Specifically, we choose to process
all connections that belong to the same subsystem on a single core. In
practice, a single core is plenty fast to saturate any device backing
a subsystem, with lots of headroom to spare. In fact, a single core
can often handle many subsystems before saturating. An NVMe-oF
subsystem is the largest unit of shared state, so choosing to do all
processing for a subsystem on a single core means that we need no
locks.

In an ideal world, we'd create 1 RDMA completion queue per subsystem
(or really, per NIC per subsystem). That would enable our code to poll
a single completion queue to be notified of all events on all RDMA
queue pairs for a given subsystem. However, RDMA requires us to select
the completion queue to be used when the RDMA queue pair is created,
which is prior to receiving the initial CONNECT message.
Unfortunately, at that point we cannot deduce which subsystem that
RDMA queue pair belongs to. There just isn't enough information. The
only options today are to create an independent completion queue for
each RDMA queue pair (what we do), or make a set of global ones but
take locks to protect shared state when completions occur on disparate
subsystems (what the Linux kernel does). The only way to fix this is
to make a change to the NVMe-oF specification to provide additional
information upon establishment of a new connection.

I hope that makes things crystal clear.
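
For anyone who prefers to see point 3 as code, here is a minimal sketch
of the ordering problem in Python. It is a toy model only, not SPDK code
and not the RDMA verbs API: the class names (CompletionQueue, QueuePair,
Target), the method names, and the subsystem names used below are all
hypothetical and exist purely for illustration. It assumes, as described
above, that the completion queue has to be picked when the queue pair is
created, that the subsystem only becomes known at CONNECT, and that one
core polls all queue pairs of a subsystem.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CompletionQueue:
    """Toy stand-in for an RDMA completion queue (not the verbs API)."""
    events: deque = field(default_factory=deque)


@dataclass
class QueuePair:
    """Toy stand-in for an RDMA queue pair; its CQ is fixed at creation."""
    qp_id: int
    cq: CompletionQueue
    subsystem: Optional[str] = None  # unknown until CONNECT is processed


class Target:
    """Hypothetical target model: one private CQ per queue pair."""

    def __init__(self):
        self.pending = []        # queue pairs that have not sent CONNECT yet
        self.by_subsystem = {}   # subsystem name -> queue pairs it owns

    def accept(self, qp_id):
        # The CQ must be chosen here, before the subsystem is known,
        # so the only lock-free choice is a CQ private to this queue pair.
        qp = QueuePair(qp_id, CompletionQueue())
        self.pending.append(qp)
        return qp

    def handle_connect(self, qp, subsystem):
        # Only now do we learn which subsystem the queue pair belongs to.
        qp.subsystem = subsystem
        self.pending.remove(qp)
        self.by_subsystem.setdefault(subsystem, []).append(qp)

    def poll_subsystem(self, subsystem):
        # Meant to be called only by the single core that owns this
        # subsystem, so no locks are taken. The cost is one poll per
        # queue pair instead of one poll per subsystem.
        done = []
        for qp in self.by_subsystem.get(subsystem, []):
            while qp.cq.events:
                done.append((qp.qp_id, qp.cq.events.popleft()))
        return done
```

To try it, create a Target, call accept() a few times, assign the
returned queue pairs to subsystems with handle_connect(), append some
events to their cq.events, and call poll_subsystem() for each subsystem.
Each call touches only state owned by that subsystem. A shared global
completion queue would collapse the loop into a single poll, but events
for different subsystems would then arrive on whichever core polls it,
which is where the locks come in.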
As the specification evolves we'll of course change our model to
always be the most efficient one possible.

>
> RDMA CQ's and their associated mappings to the RDMA QP's is
> implementation specific. Think of RDMA CQ's as the mechanism used
> to signal RDMA completion events.
> ..Dave
>
> > > -----Original Message-----
> > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of
> > > Kariuki, John K
> > > Sent: Thursday, February 16, 2017 8:26 AM
> > > To: Storage Performance Development Kit
> > > Subject: Re: [SPDK] NVMf Target
> > >
> > > Param
> > > Per the NVM Express over Fabrics 1.0 spec section 1.2 "There is
> > > a 1:1 mapping of a single Submission Queue to a single
> > > Completion Queue. NVMe over Fabrics does not support the mapping
> > > of Multiple Submission Queues to a single Completion Queue"
> > >
> > > -----Original Message-----
> > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of
> > > Kumaraparameshwaran Rathnavel
> > > Sent: Wednesday, February 15, 2017 8:27 PM
> > > To: spdk(a)lists.01.org
> > > Subject: [SPDK] NVMf Target
> > >
> > >
> > > Hi All,
> > >
> > > Why are we not using the same completion queue for multiple
> > > queue pairs. Whenever we create a queue pair , I see that a
> > > completion queue is also created. But completion queue can be
> > > shared between queue pairs. Will Using shared completion queue
> > > impact the performance?
> > >
> > > Thanking you,
> > > Param.
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk

--===============8345658633346611957==--