* [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-07-31 13:34
0 siblings, 0 replies; 11+ messages in thread
From: @ 2017-07-31 13:34 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 2545 bytes --]
Hi, all
Recently, we used a demo to observe write latency.
The demo is based on 'hello_world.c' in 'spdk/examples/nvme/hello_world'.
The modifications are as follows.
---------------------------------------------------------------------------------------------------------------------------------------------------------------
static void write_complete(void *arg, const struct spdk_nvme_cpl *completion) {
    struct hello_world_sequence *sequence = arg;
    spdk_free(sequence->buf);
    sequence->is_completed = 1;
}
---------------------------------------------------------------------------------------------------------------------------------------------------------------
static void hello_world(int id) {
    ...
    /* Start the timer just before submitting the write. */
    clock_gettime(CLOCK_REALTIME, &time1);
    rc = spdk_nvme_ns_cmd_write(ns_entry->ns, ns_entry->qpair, sequence.buf,
                                id, /* LBA start */
                                1, /* number of LBAs */
                                write_complete, &sequence, 0);
    ...
    /* Poll until write_complete() marks the sequence as completed. */
    while (!sequence.is_completed) {
        spdk_nvme_qpair_process_completions(ns_entry->qpair, 0);
    }
    clock_gettime(CLOCK_REALTIME, &time2);
    /* Print the nanosecond part of the elapsed time. */
    printf("%ld \n", diff(time1,time2).tv_nsec);
    ...
}
---------------------------------------------------------------------------------------------------------------------------------------------------------------
int main() {
    ...
    int i = 500;
    while (i > 0) {
        /* Move to a new LBA on every fourth iteration, so each LBA is
         * written four times in a row. */
        if (i-- % 4 == 0) {
            id += 10;
        }
        hello_world(id);
    }
    ...
}
---------------------------------------------------------------------------------------------------------------------------------------------------------------
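The `diff` helper called in the demo is not shown in the post. A minimal sketch of one way to compute the elapsed time follows. It assumes a hypothetical `elapsed_ns` helper (not part of SPDK or the original demo) and uses CLOCK_MONOTONIC in place of CLOCK_REALTIME, so that wall-clock adjustments cannot distort a sample. `time_call_ns` is likewise a hypothetical wrapper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not from the original post): elapsed time from
 * start to end in nanoseconds. tv_sec is folded in, so an interval that
 * crosses a second boundary is still reported correctly. */
int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}

/* Times one call of fn(arg) with CLOCK_MONOTONIC, which, unlike
 * CLOCK_REALTIME, is not affected by wall-clock adjustments. */
int64_t time_call_ns(void (*fn)(void *), void *arg)
{
    struct timespec t1, t2;

    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t2);
    return elapsed_ns(t1, t2);
}
```

In the demo, `fn` would be a function that submits the write and polls the queue pair until the completion callback fires.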
We find that when we access the same block consecutively, maximum latencies occur more frequently.
They can even reach 2-3 ms, and they vary periodically.
The experimental results are in the attachment 'result'. They look like this (see the attached figure):
Why?
(1) As the file 'result' shows, the first write to a block takes about 10-12 μs, while the second, third, and fourth writes to the same block take 700-900 μs, or even 2-3 ms.
Why do the first access and the subsequent ones behave so differently?
(2) Why do the 2-3 ms maximums appear periodically, as shown in the figure above?
Best wishes,
Jiajia Chu
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 4188 bytes --]
[-- Attachment #3: aaa.jpg --]
[-- Type: image/jpeg, Size: 2997763 bytes --]
[-- Attachment #4: result.obj --]
[-- Type: application/octet-stream, Size: 4850 bytes --]
EAL: Detected 40 lcore(s)
EAL: Auto-detected process type: PRIMARY
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
EAL: PCI device 0000:07:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
Initializing NVMe Controllers
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
Attaching to 0000:06:00.0
EAL: PCI device 0000:07:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
Attaching to 0000:07:00.0
Attached to 0000:06:00.0
Using controller INTEL SSDPECME032T4 (CVF8547400EL3P2CGN-1) with 1 namespaces.
Namespace ID: 1 size: 1600GB
Attached to 0000:07:00.0
Using controller INTEL SSDPECME032T4 (CVF8547400EL3P2CGN-2) with 1 namespaces.
Namespace ID: 1 size: 1600GB
Initialization complete.
578399
627879
905910
837393
11471
700092
912493
838944
11584
701555
843587
842619
10645
771692
841840
908734
11367
701584
908298
807404
11041
706581
841209
842818
36212
646781
815384
812124
10798
700494
844009
1607305
11832
704208
908099
907596
10538
705744
861717
822366
14275
697649
942942
910912
10215
771716
909750
872705
11281
772450
907437
910800
11122
771479
913026
910220
89390
662337
909971
906250
10535
768601
903570
904516
11360
736202
909135
906195
11013
769150
909958
908431
10507
805149
910051
906410
9597
735814
943164
911742
10949
2921442
2990648
2929058
10868
3052099
2929099
2960655
10946
2741933
2859248
3190659
10795
2891705
3027894
3093697
11312
3053440
2796344
2857903
15189
60041
842576
815144
8911
690323
861399
790160
8798
692363
848078
916440
9968
721094
848282
813205
8849
756824
849552
817706
9118
685775
798765
880288
8239
685569
849474
848680
9102
725363
850520
814352
8814
661112
847351
845621
8776
658597
852557
815763
8637
721205
951891
915204
9186
790936
849939
823263
8878
717462
846285
914700
8768
791119
806713
841805
8620
790915
846406
877685
8977
759485
913598
914523
8847
793607
915836
813466
9287
786108
916540
851126
8701
727961
915065
848209
8621
723374
943375
854388
8948
2843957
3129337
3167804
11156
2833548
3065357
3123325
9674
2709908
3066700
2898656
9000
2711598
3098088
2932727
9083
2876809
3067148
3136661
9445
2798517
2937410
2828013
12717
751586
2925563
2977357
10485
3022226
2861060
2928720
10725
3092796
2995656
2959634
10586
2891287
3029278
2978744
9834
2860501
3229775
2993495
10322
2881087
2997150
2997206
11680
2892875
2796735
2996026
10694
2777882
2931152
2927744
9399
2754641
2947965
2995322
10690
2857909
2861345
2826033
11089
2759787
2898713
2830202
10141
2763660
2697726
2694289
12058
2857148
2992986
3027870
10709
2825695
2860569
2894848
11875
2892414
2934415
840244
11426
706501
844426
909611
10324
696709
805883
841287
11300
658565
905693
843638
10190
690019
844678
874235
11113
672904
821171
796841
26545
603991
837883
843716
11839
669807
839930
838525
11008
670258
810951
843516
10403
673710
826949
845045
10682
714685
840581
909143
11568
52565
3094342
2967298
8885
2879380
2966165
2997362
8675
2875300
2801773
3065171
8586
2886870
2904994
3126807
9209
2908824
2902670
2867949
8769
2547333
2721576
2834544
9485
2544404
2934658
2769151
9234
2612107
2701303
2966823
9438
2611603
2800440
2800576
8636
2580223
2900025
2833152
9015
2760412
2735484
2767416
8814
2801794
2864001
2668729
8996
2810770
2649450
2807850
9019
2779255
2793581
847331
9879
677229
814258
843209
9092
684278
808301
808814
9407
780840
809448
803590
8556
673711
807511
837984
9173
682692
809983
836602
8771
744812
791488
842810
8816
713337
853851
826364
8753
677448
809705
797800
9738
683720
808645
843893
8499
682338
807833
937091
9168
776193
906420
836243
9023
681016
843287
826765
14769
61883
925356
906256
18237
642868
830139
903435
10249
772422
843322
874604
10119
763203
911613
908964
13556
705753
841360
822748
10038
774141
840083
908594
10763
705537
843154
910711
10007
771118
946150
842905
10669
706851
871806
956606
10295
762283
3119618
2990178
10188
2759875
3291427
2932225
11920
2825092
2960664
2959704
10115
3056410
3027189
3127736
9580
2959419
3192336
2789758
14167
2855895
2928339
2928103
11438
2957086
3292255
2894909
10094
2859753
2995448
2928224
9707
2820687
3027967
2931493
11449
2922926
2993807
3032904
23598
2827235
3028553
2981587
9997
2846871
3068628
2783704
10053
2858511
2925205
2929744
11193
2788741
2883847
2960650
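One way to quantify the pattern in the output above is to bucket the samples by their position within each run of four writes to the same LBA. The sketch below assumes the numbers are nanoseconds, as printed by the demo, and that sample 0 is the first write to a new LBA, as the demo's loop implies. `stats_by_position` and `struct pos_stats` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GROUP 4 /* writes issued per LBA in the demo's main loop */

struct pos_stats {
    uint64_t count;  /* samples seen at this position */
    uint64_t sum_ns; /* sum of latencies, for computing a mean */
    uint64_t max_ns; /* worst latency at this position */
};

/* Splits lat_ns[0..n) by position within each group of GROUP writes:
 * out[0] describes the first write to each LBA, out[1..3] the repeats. */
void stats_by_position(const uint64_t *lat_ns, size_t n,
                       struct pos_stats out[GROUP])
{
    assert(lat_ns != NULL || n == 0);
    for (size_t p = 0; p < GROUP; p++) {
        out[p] = (struct pos_stats){0, 0, 0};
    }
    for (size_t i = 0; i < n; i++) {
        struct pos_stats *s = &out[i % GROUP];
        s->count++;
        s->sum_ns += lat_ns[i];
        if (lat_ns[i] > s->max_ns) {
            s->max_ns = lat_ns[i];
        }
    }
}
```

Dividing `sum_ns` by `count` gives the mean per position, which makes the gap between the first write and the three repeats easy to compare across runs.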
^ permalink raw reply [flat|nested] 11+ messages in thread
* [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-01 7:58
0 siblings, 0 replies; 11+ messages in thread
From: @ 2017-08-01 7:58 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 680 bytes --]
Hi, all
Recently, we used a demo to observe write latency.
We find that when we access the same block consecutively, maximum latencies occur more frequently.
They can even reach 2-3 ms, and they vary periodically.
Why?
(1) The first write to a block takes about 10-12 μs, while the second, third,
and fourth writes to the same block take 700-900 μs, or even 2-3 ms.
Why do the first access and the subsequent ones behave so differently?
(2) Why do the 2-3 ms maximums appear periodically?
Best wishes,
Jiajia Chu
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 1401 bytes --]
[-- Attachment #3: result.obj --]
[-- Type: application/octet-stream, Size: 4850 bytes --]
* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-01 8:05 Danielle Costantino
0 siblings, 0 replies; 11+ messages in thread
From: Danielle Costantino @ 2017-08-01 8:05 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 1243 bytes --]
What you are observing is characteristic of how the drive's firmware handles writes. It can come from write amplification caused by the drive's GC, and from journaling and checkpointing. This behavior may vary greatly between vendors and even between firmware versions.
- Danielle Costantino
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 2756 bytes --]
* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-01 8:46
0 siblings, 0 replies; 11+ messages in thread
From: @ 2017-08-01 8:46 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 1775 bytes --]
Hi, Danielle
We also ran an experiment that writes to random blocks 500 times.
The results are more stable: most latencies are 10-30 μs and the maximum is 284 μs.
We have repeated the random test many times, and each run gives similar results.
Why do the same mechanisms (GC, journaling, and checkpointing) affect random and consecutive accesses so differently?
Thanks a lot.
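For reference, the two access patterns being compared can be sketched as LBA generators. `lba_repeated` mirrors the loop in the original demo (a new LBA every fourth write, 10 LBAs apart). `lba_random` is only an assumption about how the random test might pick blocks, since the thread does not show that code; it uses a small xorshift generator so that runs are reproducible. Both names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* LBA for the iter-th write (0-based) of the repeated-write test:
 * the demo bumps the LBA by 10 before writes 0, 4, 8, ... */
uint64_t lba_repeated(uint64_t start, uint64_t iter)
{
    return start + 10 * (iter / 4 + 1);
}

/* LBA for the random test (assumed, not from the thread): one step of a
 * 64-bit xorshift generator, reduced to the range [0, num_lbas).
 * *state must be seeded with a nonzero value. */
uint64_t lba_random(uint64_t *state, uint64_t num_lbas)
{
    uint64_t x = *state;

    assert(x != 0 && num_lbas > 0);
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x % num_lbas;
}
```

With the repeated pattern, three of every four writes target an LBA that was written immediately before; with the random pattern on a large namespace, that almost never happens.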
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 3764 bytes --]
[-- Attachment #3: random_test_result.obj --]
[-- Type: application/octet-stream, Size: 4088 bytes --]
EAL: Detected 40 lcore(s)
EAL: Auto-detected process type: PRIMARY
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
EAL: PCI device 0000:07:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
Initializing NVMe Controllers
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
Attaching to 0000:06:00.0
EAL: PCI device 0000:07:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
Attaching to 0000:07:00.0
Attached to 0000:06:00.0
Using controller INTEL SSDPECME032T4 (CVF8547400EL3P2CGN-1) with 1 namespaces.
Namespace ID: 1 size: 1600GB
Attached to 0000:07:00.0
Using controller INTEL SSDPECME032T4 (CVF8547400EL3P2CGN-2) with 1 namespaces.
Namespace ID: 1 size: 1600GB
Initialization complete.
37465
284410
14842
10994
14047
11934
16770
11722
9825
10999
15324
10751
122388
9436
9975
8737
10724
8142
8833
8409
8500
14361
8300
12171
9974
8205
10180
9524
10843
27500
10581
8944
9647
8723
9564
11018
13335
9749
9201
9762
7951
7731
10858
7736
12477
10311
8308
7872
10142
15584
6973
10968
8427
11045
9880
7662
11556
10341
8460
9600
11637
87031
8474
8484
7871
7861
12809
9972
8451
10778
12617
9816
9080
11807
8720
8604
9028
12504
7437
8297
11321
9399
8240
7845
7898
8360
10292
9909
8841
11244
10304
9999
7792
7823
27841
10971
11733
10412
9495
8022
10567
10001
8056
12517
7928
10490
12786
9678
9329
22426
7858
7196
6993
8775
12447
11689
10390
11564
8353
17171
8745
12312
10357
8037
7886
9877
82512
11867
8794
17011
10318
7851
12245
7637
10373
8727
10798
10995
7651
11034
7195
11632
8167
9581
7466
22666
11001
8414
7743
7107
7781
10124
7951
7644
11763
8294
10746
8119
7551
9920
12824
6968
8159
9811
9593
14649
9096
8272
9816
7973
10105
9641
9155
9922
12066
9813
9365
8690
11560
10607
9661
11609
11935
9878
7533
10112
10581
7690
13197
10411
7885
73926
7317
8165
11545
8032
8421
11064
8435
8076
12373
9727
13740
7957
11011
11403
9412
8662
8373
10079
8777
10005
7398
7719
9843
12640
9931
8555
7813
10568
10190
7701
7564
7602
26195
10116
10234
8797
10522
8148
8117
8759
11886
9961
8041
10463
13065
10083
10230
8673
10116
11402
11199
15409
8449
8047
10312
9916
12358
7591
7366
7653
8218
8074
10085
7403
7706
12630
18491
9676
9564
8242
7834
9177
10645
12606
9639
7787
10239
10267
7400
12004
8175
9513
9013
8845
7992
13122
7964
7684
10319
7004
7756
7260
16706
10391
12192
8763
7752
7684
7918
10254
7327
11603
8735
11739
10482
7083
10022
7793
7794
10881
12821
8193
10080
11190
10315
10204
8396
17804
11581
9329
10348
15045
9541
9114
12842
12774
8490
9255
8045
81070
7799
18063
11954
10182
8978
9124
10224
8514
7532
10644
9380
8972
7872
10394
7593
9953
9088
7462
10886
7496
8102
21891
8519
8248
10914
15002
11874
9402
7529
7597
10527
10332
8690
10023
19136
7871
9539
7932
7182
12823
7758
11296
8105
7998
8015
9013
10335
11291
8134
10124
8199
9635
9964
10788
8334
10418
13839
7902
10168
7813
10188
10438
13373
9450
80209
8399
16338
8042
7436
7329
10206
10336
13363
10069
7841
10484
9045
10038
8173
7482
9879
9958
9566
10209
9314
10501
7916
7519
9717
7655
7964
9258
8484
13017
7506
7771
20716
28968
10194
10994
10658
8948
10172
10194
9432
8980
7571
9739
12462
7550
10758
11853
10003
9442
7843
10191
7314
9754
7653
8140
12068
11226
8156
9609
9792
9290
10114
10271
7434
9073
10721
9062
10935
9610
13447
8210
10063
7802
7478
10159
10442
8749
8155
6806
9702
7801
8292
7263
10026
7849
9995
8245
9230
7501
7786
7942
13863
11902
10136
9305
9195
9126
27205
10060
8506
9857
8100
7033
7937
10706
7549
8001
8176
10978
7922
11726
9885
8686
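To compare this run with the repeated-write run, a percentile summary can be more telling than the maximum alone. The sketch below computes a nearest-rank percentile over the printed values, assuming they are nanoseconds; `percentile_ns` and `cmp_u64` are hypothetical names, and the function sorts the caller's array in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t values, ascending. */
int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x > y) - (x < y);
}

/* Nearest-rank percentile for p in 1..100. Sorts lat_ns in place. */
uint64_t percentile_ns(uint64_t *lat_ns, size_t n, unsigned p)
{
    size_t rank;

    assert(lat_ns != NULL && n > 0 && p >= 1 && p <= 100);
    qsort(lat_ns, n, sizeof(lat_ns[0]), cmp_u64);
    rank = (p * n + 99) / 100; /* ceil(p * n / 100), always in 1..n */
    return lat_ns[rank - 1];
}
```

Calling it with p = 50, 99, and 100 on both result files gives the median, the tail, and the maximum for each access pattern.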
* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-01 19:41 Luse, Paul E
0 siblings, 0 replies; 11+ messages in thread
From: Luse, Paul E @ 2017-08-01 19:41 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 2665 bytes --]
Sounds like Danielle’s theory still holds here. The impact of background tasks in SSD firmware depends at least partly on where the operation lands on the media: if your write hits an area that is being relocated as part of GC, it is going to suffer more. Since GC is done in localized areas, it stands to reason that random traffic will be less affected, because a smaller percentage of the IOs will hit the LBA regions that GC is working on. I’m sure there are a host of other factors as well.
-Paul
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 11723 bytes --]
* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-01 23:49 Walker, Benjamin
0 siblings, 0 replies; 11+ messages in thread
From: Walker, Benjamin @ 2017-08-01 23:49 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 3481 bytes --]
Hi Jiajia,
I have a bunch of questions that will help me figure out what you are seeing.
1) When you say "access", do you mean read or write? The behavior of these two operations is quite different.
2) Are you using a NAND based or 3D XPoint based SSD? These again work entirely differently.
3) When you access the same block repeatedly, what's the delay between each access? None?
4) For whatever SSD you are using, have you confirmed the firmware is up to date? This can make a big difference.
Thanks,
Ben
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 11170 bytes --]
* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-02 11:13
0 siblings, 0 replies; 11+ messages in thread
From: @ 2017-08-02 11:13 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 4557 bytes --]
Hi, Paul and Ben
We also suspect that GC influences the write latencies, but we don't know how to demonstrate it experimentally.
Could you give us some advice on experiment design?
We also believe that other factors combine to cause this abnormal behavior.
Answers:
(1) "access" = write. We tested read and write operations separately,
but only saw the strange behavior in the write experiments.
The comparison is in the attachments.
(2) We use a NAND-based SSD, an Intel P3608.
(3) The attached results were produced with no delay.
We tried adding "sleep(1)" between two operations, but it did not seem to help.
(4) We have not updated the firmware.
The device is shared by many colleagues, and we are afraid that the update is irreversible and would affect other work.
If a firmware update would solve it, we would still like to understand the cause.
Thanks a lot.
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 12934 bytes --]
[-- Attachment #3: result_read.jpg --]
[-- Type: image/jpeg, Size: 39459 bytes --]
[-- Attachment #4: result_write.jpg --]
[-- Type: image/jpeg, Size: 49841 bytes --]
* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-03 23:36 Walker, Benjamin
0 siblings, 0 replies; 11+ messages in thread
From: Walker, Benjamin @ 2017-08-03 23:36 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 4392 bytes --]
On Wed, 2017-08-02 at 19:13 +0800, 储 wrote:
> Answers:
> (1) "access" = write. We tested read and write operations separately,
> but only saw the strange behavior in the write experiments.
> The comparison is in the attachments.
> (2) We use a NAND-based SSD, an Intel P3608.
> (3) The attached results were produced with no delay.
> We tried adding "sleep(1)" between two operations, but it did not
> seem to help.
>
> At 2017-08-02 07:49:13, "Walker, Benjamin" <benjamin.walker(a)intel.com> wrote:
> > Hi Jiajia,
> >
> > I have a bunch of questions that will help me figure out what you are
> > seeing.
> >
> > 1) When you say "access", do you mean read or write? The behavior of these
> > two operations is quite different.
> > 2) Are you using a NAND based or 3D XPoint based SSD? These again work
> > entirely differently.
> > 3) When you access the same block repeatedly, what's the delay between each
> > access? None?
I was able to verify the behavior you are seeing. I'm afraid I'm not going to be
able to give you an exact answer for your particular device - I don't have
insight into the specifics of how each SSD is implemented. I brainstormed with a
few of my colleagues though, so what I can do is give you some idea of what is
happening inside of the device that will make it clear why writing to the same
block over and over may cause performance problems.
A good mental model for an SSD is basically a log of (LBA, data) pairs. When you
write to any LBA, it just appends to the end of the log and updates an internal
map of the location of that LBA. It does this appending by buffering several
writes into RAM located on the SSD, then it sends that batch of data to the NAND
all at once. The other important understanding is that the SSD is composed of a
large number of physical NAND dies, with some number of entirely parallel NAND
channels that can handle writes. Writing to the log sends the batched data to
each channel more or less round-robin. The final thing to remember is that this
whole process is implemented in hardware, not software, so adding things like
coordination between parallel operations is not as simple as just adding a lock.
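As an illustration of the mental model above, here is a minimal C sketch of an
append-only log with an LBA map. All names (toy_ftl, toy_ftl_write, the sizes)
are hypothetical and chosen only for this example. A real SSD controller does
this in hardware and also buffers, batches across channels, and reclaims stale
slots, none of which is modeled here.

```c
#include <stdint.h>
#include <string.h>

#define TOY_NUM_LBAS  16u        /* logical blocks exposed to the host (assumed) */
#define TOY_LOG_SLOTS 64u        /* physical slots in the append-only log (assumed) */
#define TOY_UNMAPPED  UINT32_MAX /* marker for "no slot" */

struct toy_ftl {
    uint32_t map[TOY_NUM_LBAS];       /* LBA -> newest log slot holding it */
    uint32_t slot_lba[TOY_LOG_SLOTS]; /* which LBA each log slot holds */
    uint32_t head;                    /* next free slot at the end of the log */
    uint32_t stale;                   /* slots superseded by a later write */
};

static void toy_ftl_init(struct toy_ftl *f)
{
    memset(f, 0, sizeof(*f));
    for (uint32_t i = 0; i < TOY_NUM_LBAS; i++)
        f->map[i] = TOY_UNMAPPED;
}

/*
 * Append one write to the log and point the map at it.
 * Returns the slot used, or TOY_UNMAPPED if the LBA is out of range
 * or the log is full.
 */
static uint32_t toy_ftl_write(struct toy_ftl *f, uint32_t lba)
{
    if (lba >= TOY_NUM_LBAS || f->head >= TOY_LOG_SLOTS)
        return TOY_UNMAPPED;
    if (f->map[lba] != TOY_UNMAPPED)
        f->stale++;               /* the previous copy becomes garbage */
    uint32_t slot = f->head++;
    f->slot_lba[slot] = lba;
    f->map[lba] = slot;
    return slot;
}
```

Writing the same LBA twice in this sketch consumes two log slots and leaves one
stale slot behind, which is the kind of leftover a real device would later have
to clean up.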
When you write the same LBA over and over, a few things could happen inside the
SSD (I don't know how your SSD specifically works).
One possibility is that the SSD could see that the LBA is already buffered in
memory from a previous write and it could just update that memory. However, that
doesn't actually work in general. The data in that memory buffer may be
currently in use as part of a write to actual NAND, or may even be currently
being read. So the only option is to append to the end of the log for each new
write to the LBA. This could probably be coordinated with locking in software,
but remember that the SSD controller is implemented in hardware. If handling
this case makes the design far more complex, it may not be possible given power,
latency, and other budgets.
Another possibility is that the data is appended to the log for each write just
like any other I/O. However, it is still more complicated than the case where
random LBAs are being written to. Once one buffer is filled up, a write to NAND
is issued. When that write completes, it has to update the map for the location
of the LBA. If, while that write is outstanding, another buffer fills up with
new writes to the same LBA, the device has to figure out what to do. If it
submits the second NAND write to a new channel, it's then effectively racing
against the first write. If they complete out of order, the user will end up
with stale data. This case could also probably be handled by better coordination
on the completion side, but again there is a complexity trade off when
implementing this in actual hardware.
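The race described above can be sketched in a few lines of C. This is only a
toy model under stated assumptions: the names (toy_map_entry, complete_naive,
complete_ordered) and the idea of tagging each write with a submission sequence
number are hypothetical and are not taken from any real controller design.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_map_entry {
    uint32_t slot; /* physical location currently mapped for one LBA */
    uint64_t seq;  /* submission sequence number of the write that owns it */
};

/* Naive completion handling: whichever NAND write completes last wins. */
static void complete_naive(struct toy_map_entry *e, uint32_t slot, uint64_t seq)
{
    e->slot = slot;
    e->seq = seq;
}

/*
 * Coordinated completion handling: a completion that is older than the
 * write already mapped is ignored. Returns true if the map was updated.
 */
static bool complete_ordered(struct toy_map_entry *e, uint32_t slot, uint64_t seq)
{
    if (seq < e->seq)
        return false;
    e->slot = slot;
    e->seq = seq;
    return true;
}
```

If write 2 completes before write 1, complete_naive leaves the map pointing at
the older data, while complete_ordered keeps the newer slot. The extra compare
is cheap in software, but it stands in for coordination logic that costs real
complexity in hardware.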
The easiest solution is probably to just detect if a NAND write is active for an
LBA in a given buffer, and then just queue up the next write until the one
before it finishes. That adds potentially a lot of latency, but it simplifies
the hardware design considerably.
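To make the latency cost of that queueing concrete, here is a small C sketch.
It assumes, purely for illustration, that a write to an LBA cannot start until
the previous write to the same LBA has finished, and that each NAND program
takes a fixed time. The names (toy_serializer, toy_submit) and any timing
values fed into it are hypothetical, not measurements of a real device.

```c
#include <stdint.h>

#define TOY_LBAS 16u /* number of LBAs tracked by the toy model (assumed) */

struct toy_serializer {
    uint64_t busy_until[TOY_LBAS]; /* time (us) when each LBA's in-flight write ends */
};

/*
 * Submit a write to `lba` at time `now_us` that takes `program_us` to finish.
 * Writes to the same LBA are serialized; writes to different LBAs are not.
 * Returns the completion time in microseconds.
 */
static uint64_t toy_submit(struct toy_serializer *s, uint32_t lba,
                           uint64_t now_us, uint64_t program_us)
{
    uint32_t i = lba % TOY_LBAS; /* keep the index in range */
    uint64_t start = now_us > s->busy_until[i] ? now_us : s->busy_until[i];
    s->busy_until[i] = start + program_us;
    return s->busy_until[i];
}
```

With a made-up program time of 800 us, a second write to the same LBA submitted
at t=10 us completes at 1600 us, while a write to a different LBA submitted at
the same moment completes at 810 us.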
Ultimately, I have no idea what that SSD is actually doing, but you can see that
it's fairly complex to handle this case. It is certainly more complex than
handling random I/O.
I hope that helps,
Ben
* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-07 6:40
0 siblings, 0 replies; 11+ messages in thread
From: @ 2017-08-07 6:40 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 5096 bytes --]
Hi, Ben
We truly appreciate your concrete analysis and your help in resolving the problem.
As you say, different devices have their own complex mechanisms.
We do not know how the hardware actually processes each request, but your explanation makes sense to us.
Maybe the problem will be alleviated as hardware is upgraded and developed.
We hope to get access to other devices in the future and verify whether this problem exists on them.
We also hope that others using different devices can share their test results on this issue.
Thanks a lot.
* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-09 5:34 Crane Chu
0 siblings, 0 replies; 11+ messages in thread
From: Crane Chu @ 2017-08-09 5:34 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 6469 bytes --]
Hi, Jiajia,
Here are some suggestions:
1. Try a Secure Erase before the test. It will eliminate most of the FTL
background operations and put the SSD into the so-called FOB (fresh out of
box) state.
2. Most enterprise-grade SSDs state their Quality of Service in the
specifications. For example, the P3608 ensures that 99.99% of 4K write I/Os
complete within 4.7 ms when QD=1. The spec also provides another write latency
value, around 20 us. You can consider them the worst case and the best case
respectively. But the distribution between them is hard to predict in a
non-FOB state, due to the diversity of firmware, host workload patterns, and
NAND flash characteristics.
3. The mapping granularity is 4K in most SSD firmware designs. So, if the
LBA size is 512B in your test, it makes the scenario more complex. Try to
test with 4K-aligned I/O.
4.
http://codecapsule.com/2014/02/12/coding-for-ssds-part-1-introduction-and-table-of-contents/
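To compare measured latencies against a QoS figure such as the 99.99% value in
point 2, one option is to collect the per-write latencies and compute a
percentile. The following is a minimal C sketch using the nearest-rank method.
The function name latency_percentile and the choice to express the percentile
in units of 0.01% are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t values, ascending. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile of `n` latency samples.
 * `per_10000` is the percentile in units of 0.01% (9999 means 99.99%).
 * Sorts `samples` in place. Returns 0 for empty or invalid input.
 */
static uint64_t latency_percentile(uint64_t *samples, size_t n, uint32_t per_10000)
{
    if (n == 0 || per_10000 == 0 || per_10000 > 10000)
        return 0;
    qsort(samples, n, sizeof(samples[0]), cmp_u64);
    /* rank = ceil(n * p), computed in integer arithmetic */
    uint64_t rank = ((uint64_t)n * per_10000 + 9999) / 10000;
    if (rank == 0)
        rank = 1;
    return samples[rank - 1];
}
```

With only 500 samples, as in the demo loop, the 99.99th percentile is simply
the maximum, so many more writes would be needed before the result can be
compared meaningfully with the specification.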
Thanks,
-Crane Chu
* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-11 13:27
0 siblings, 0 replies; 11+ messages in thread
From: @ 2017-08-11 13:27 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 6891 bytes --]
Hi, Crane,
Thanks for your enlightening suggestions.
1. We are afraid that we cannot do a secure erase right now. The device is shared, and we need to coordinate with others before erasing it.
2. We had not considered the influence of the device state carefully, and your point makes sense to us.
We will provide a new test report for the FOB state to verify your suggestion.
3. Our test demo is based on spdk/examples/nvme/hello_world/hello_world.c, and the I/O granularity is set to 4K.
4. Thanks for sharing your knowledge ^_^
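One way to double-check point 3 is to compute the byte offset and length of each
write from its starting LBA, LBA count, and the namespace sector size, and test
both against 4 KiB. The helper below is a sketch only: io_is_4k_aligned and
MAP_UNIT_BYTES are hypothetical names, and the 4 KiB mapping unit is the
assumption from the suggestion, not a confirmed property of the P3608. In an
SPDK program the sector size could be obtained from
spdk_nvme_ns_get_sector_size().

```c
#include <stdbool.h>
#include <stdint.h>

#define MAP_UNIT_BYTES 4096u /* assumed FTL mapping granularity */

/*
 * Returns true if an I/O starting at `lba` and spanning `lba_count` blocks
 * of `sector_size` bytes both starts and ends on a 4 KiB boundary.
 */
static bool io_is_4k_aligned(uint64_t lba, uint32_t lba_count, uint32_t sector_size)
{
    if (sector_size == 0 || lba_count == 0)
        return false;
    uint64_t offset = lba * sector_size;
    uint64_t length = (uint64_t)lba_count * sector_size;
    return offset % MAP_UNIT_BYTES == 0 && length % MAP_UNIT_BYTES == 0;
}
```

For example, a single-LBA write at LBA 10 is aligned when the namespace is
formatted with 4096-byte sectors, but not with 512-byte sectors, where it
covers only bytes 5120 to 5631.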
Best wishes,
Jiajia Chu