From: Walker, Benjamin
Subject: Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
Date: Thu, 03 Aug 2017 23:36:07 +0000
Message-ID: <1501803363.67512.3.camel@intel.com>
In-Reply-To: 1f6ed155.e7db.15da2a6fb37.Coremail.cjj25233@163.com
To: spdk@lists.01.org

On Wed, 2017-08-02 at 19:13 +0800, 储 wrote:
> Answers:
> (1) "access" = write. We experiment read and write operations respectively,
> but only find the strange phenomenon in the writing experiments.
> The comparison of experiments can be seen in accessories.
> (2) We use a NAND based SSD, Intel P3608.
> (3) The result presented in accessories is produced with no delay.
> We try to set "sleep(1)" between the two operations, but it seems does
> not work.
>
> At 2017-08-02 07:49:13, "Walker, Benjamin" wrote:
> > Hi Jiajia,
> >
> > I have a bunch of questions that will help me figure out what you are
> > seeing.
> >
> > 1) When you say "access", do you mean read or write? The behavior of these
> > two operations is quite different.
> > 2) Are you using a NAND based or 3D XPoint based SSD? These again work
> > entirely differently.
> > 3) When you access the same block repeatedly, what's the delay between each
> > access? None?

I was able to verify the behavior you are seeing. I'm afraid I'm not going to
be able to give you an exact answer for your particular device - I don't have
insight into the specifics of how each SSD is implemented.
I brainstormed with a few of my colleagues though, so what I can do is give
you some idea of what is happening inside of the device that will make it
clear why writing to the same block over and over may cause performance
problems.

A good mental model for an SSD is basically a log of (LBA, data) pairs. When
you write to any LBA, it just appends to the end of the log and updates an
internal map of the location of that LBA. It does this appending by buffering
several writes into RAM located on the SSD, then it sends that batch of data
to the NAND all at once. The other important understanding is that the SSD is
composed of a large number of physical NAND dies, with some number of entirely
parallel NAND channels that can handle writes. Writing to the log sends the
batched data to each channel more or less round-robin. The final thing to
remember is that this whole process is implemented in hardware, not software,
so adding things like coordination between parallel operations is not as
simple as just adding a lock.

When you write the same LBA over and over, a few things could happen inside
the SSD (I don't know how your SSD specifically works).

One possibility is that the SSD could see that the LBA is already buffered in
memory from a previous write and it could just update that memory. However,
that doesn't actually work in general. The data in that memory buffer may be
currently in use as part of a write to actual NAND, or may even be currently
being read. So the only option is to append to the end of the log for each new
write to the LBA. This could probably be coordinated with locking in software,
but remember that the SSD controller is implemented in hardware. If handling
this case makes the design far more complex, it may not be possible given
power, latency, and other budgets.

Another possibility is that the data is appended to the log for each write
just like any other I/O.
However, it is still more complicated than the case where random LBAs are
being written to. Once one buffer is filled up, a write to NAND is issued.
When that write completes, it has to update the map for the location of the
LBA. If, while that write is outstanding, another buffer fills up with new
writes to the same LBA, the device has to figure out what to do. If it submits
the second NAND write to a new channel, it's then effectively racing against
the first write. If they complete out of order, the user will end up with
stale data. This case could also probably be handled by better coordination on
the completion side, but again there is a complexity trade-off when
implementing this in actual hardware.

The easiest solution is probably to just detect if a NAND write is active for
an LBA in a given buffer, and then just queue up the next write until the one
before it finishes. That adds potentially a lot of latency, but it simplifies
the hardware design considerably.

Ultimately, I have no idea what that SSD is actually doing, but you can see
that it's fairly complex to handle this case. It is certainly more complex
than handling random I/O.
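
To make the race concrete, here is a minimal Python sketch of the mental model
above. It is a toy under heavy assumptions, not a description of how the P3608
or any real controller works: the ToySSD class, its fields, and the
serialize_same_lba flag are names made up for illustration, each "buffer"
holds a single write, and NAND completion order is chosen by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class ToySSD:
    """Toy model: an append-only log of (LBA, data) plus an LBA -> log-index map."""
    serialize_same_lba: bool = False               # hypothetical "easiest solution" switch
    log: list = field(default_factory=list)        # stand-in for NAND, append-only
    lba_map: dict = field(default_factory=dict)    # LBA -> index into log
    in_flight: list = field(default_factory=list)  # NAND writes issued, not yet complete
    waiting: list = field(default_factory=list)    # writes held behind an in-flight one

    def write(self, lba, data):
        """Accept a host write and either issue it to NAND or queue it."""
        busy = any(pending_lba == lba for pending_lba, _ in self.in_flight)
        if self.serialize_same_lba and busy:
            # Extra latency: this write cannot start until the earlier one finishes.
            self.waiting.append((lba, data))
        else:
            self.in_flight.append((lba, data))

    def complete(self, position):
        """Complete the in-flight NAND write at `position` and update the map."""
        lba, data = self.in_flight.pop(position)
        self.log.append((lba, data))
        self.lba_map[lba] = len(self.log) - 1
        # Release the oldest queued write to the same LBA, if there is one.
        for i, (queued_lba, _) in enumerate(self.waiting):
            if queued_lba == lba:
                self.in_flight.append(self.waiting.pop(i))
                break

    def read(self, lba):
        """Return whatever the map currently points at for this LBA."""
        return self.log[self.lba_map[lba]][1]


def run(serialize):
    """Write LBA 7 twice, then let the NAND writes finish newest-first."""
    ssd = ToySSD(serialize_same_lba=serialize)
    ssd.write(7, "old")
    ssd.write(7, "new")
    while ssd.in_flight:
        # Always complete the most recently issued write first, mimicking
        # parallel channels that finish out of order.
        ssd.complete(len(ssd.in_flight) - 1)
    return ssd.read(7)


if __name__ == "__main__":
    print("no coordination:    ", run(serialize=False))
    print("serialized per LBA: ", run(serialize=True))
```

Without coordination the map ends up pointing at the first write, which is the
stale-data case described above. With the per-LBA queue the second write is
not issued until the first completes, so the result is correct, but every
repeated write to that LBA pays the full latency of the one before it.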

I hope that helps,
Ben