* [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-07-31 13:34 
  0 siblings, 0 replies; 11+ messages in thread
From:  @ 2017-07-31 13:34 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 2545 bytes --]

Hi, all
Recently, we used a demo to observe write latency.
The demo is based on 'hello_world.c' in 'spdk/examples/nvme/hello_world'.
The modifications are as follows.
---------------------------------------------------------------------------------------------------------------------------------------------------------------
static void write_complete(void *arg, const struct spdk_nvme_cpl *completion) {
    struct hello_world_sequence *sequence = arg;

    /* Release the write buffer and signal the polling loop. */
    spdk_free(sequence->buf);
    sequence->is_completed = 1;
}
---------------------------------------------------------------------------------------------------------------------------------------------------------------
static void hello_world(int id) {
...
    clock_gettime(CLOCK_REALTIME, &time1);
    rc = spdk_nvme_ns_cmd_write(ns_entry->ns, ns_entry->qpair, sequence.buf,
                                id, /* LBA start */
                                1,  /* number of LBAs */
                                write_complete, &sequence, 0);
...
    /* Poll the queue pair until write_complete() has run. */
    while (!sequence.is_completed) {
        spdk_nvme_qpair_process_completions(ns_entry->qpair, 0);
    }
    clock_gettime(CLOCK_REALTIME, &time2);
    printf("%ld \n", diff(time1, time2).tv_nsec);
...
}
---------------------------------------------------------------------------------------------------------------------------------------------------------------
int main() {
...
    int i = 500;
    while (i > 0) {
        /* Move to a new LBA on every fourth iteration. */
        if (i-- % 4 == 0) {
            id += 10;
        }
        hello_world(id);
    }
...
}
---------------------------------------------------------------------------------------------------------------------------------------------------------------
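To make the access pattern of the posted loop explicit, here is a minimal sketch that records which LBA each iteration writes. The helper name `lba_sequence` and the `start_id` parameter are assumptions for illustration; the post elides how `id` is initialised.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill `lbas` with the LBA written on each iteration of the posted loop.
 * `start_id` is an assumption: the post does not show the initial value of `id`.
 * Returns the number of entries stored (at most `cap`).
 */
size_t lba_sequence(uint64_t start_id, int count, uint64_t *lbas, size_t cap)
{
    uint64_t id = start_id;
    size_t n = 0;
    int i = count;

    while (i > 0 && n < cap) {
        if (i-- % 4 == 0) {
            id += 10;   /* new LBA on every fourth iteration */
        }
        lbas[n++] = id;
    }
    return n;
}
```

With `count` set to 500 as in the post, every LBA is written four times in a row before the loop advances by 10 blocks.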
We find that when we access the same block consecutively, the maximum latencies occur more frequently.
They can even reach 2-3 ms, and they change periodically.
The experiment results are in the attachment 'result'. They look like the attached figure.
 Why?
(1) As shown in the file 'result', for the same block the first access takes about 10-12 μs, while the second, third, and fourth accesses take 700-900 μs or even 2-3 ms.
     I want to know why the first access behaves differently from the others.
(2) Why do the 2-3 ms maximums change periodically, as shown in the figure?
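The posted code prints only the `tv_nsec` field of the difference and reads `CLOCK_REALTIME`, which can be adjusted while the test runs. The sketch below is one possible way to take the same measurement with `CLOCK_MONOTONIC` and to report the full elapsed time in nanoseconds. The helper names `elapsed_ns` and `now_monotonic` are hypothetical and are not part of the original demo.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Elapsed nanoseconds between two timestamps, including whole seconds. */
int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}

/* Read the monotonic clock, which is not affected by wall-clock adjustments. */
struct timespec now_monotonic(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}
```

In the demo, `now_monotonic()` would be called in place of the two `clock_gettime(CLOCK_REALTIME, ...)` calls, and `elapsed_ns(time1, time2)` would be printed instead of `diff(time1, time2).tv_nsec`.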


  Best wishes,
  Jiajia Chu

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 4188 bytes --]

[-- Attachment #3: aaa.jpg --]
[-- Type: image/jpeg, Size: 2997763 bytes --]

[-- Attachment #4: result.obj --]
[-- Type: application/octet-stream, Size: 4850 bytes --]

EAL: Detected 40 lcore(s)
EAL: Auto-detected process type: PRIMARY
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
EAL: PCI device 0000:07:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
Initializing NVMe Controllers
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
Attaching to 0000:06:00.0
EAL: PCI device 0000:07:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
Attaching to 0000:07:00.0
Attached to 0000:06:00.0
Using controller INTEL SSDPECME032T4  (CVF8547400EL3P2CGN-1) with 1 namespaces.
  Namespace ID: 1 size: 1600GB
Attached to 0000:07:00.0
Using controller INTEL SSDPECME032T4  (CVF8547400EL3P2CGN-2) with 1 namespaces.
  Namespace ID: 1 size: 1600GB
Initialization complete.
578399 
627879 
905910 
837393 
11471 
700092 
912493 
838944 
11584 
701555 
843587 
842619 
10645 
771692 
841840 
908734 
11367 
701584 
908298 
807404 
11041 
706581 
841209 
842818 
36212 
646781 
815384 
812124 
10798 
700494 
844009 
1607305 
11832 
704208 
908099 
907596 
10538 
705744 
861717 
822366 
14275 
697649 
942942 
910912 
10215 
771716 
909750 
872705 
11281 
772450 
907437 
910800 
11122 
771479 
913026 
910220 
89390 
662337 
909971 
906250 
10535 
768601 
903570 
904516 
11360 
736202 
909135 
906195 
11013 
769150 
909958 
908431 
10507 
805149 
910051 
906410 
9597 
735814 
943164 
911742 
10949 
2921442 
2990648 
2929058 
10868 
3052099 
2929099 
2960655 
10946 
2741933 
2859248 
3190659 
10795 
2891705 
3027894 
3093697 
11312 
3053440 
2796344 
2857903 
15189 
60041 
842576 
815144 
8911 
690323 
861399 
790160 
8798 
692363 
848078 
916440 
9968 
721094 
848282 
813205 
8849 
756824 
849552 
817706 
9118 
685775 
798765 
880288 
8239 
685569 
849474 
848680 
9102 
725363 
850520 
814352 
8814 
661112 
847351 
845621 
8776 
658597 
852557 
815763 
8637 
721205 
951891 
915204 
9186 
790936 
849939 
823263 
8878 
717462 
846285 
914700 
8768 
791119 
806713 
841805 
8620 
790915 
846406 
877685 
8977 
759485 
913598 
914523 
8847 
793607 
915836 
813466 
9287 
786108 
916540 
851126 
8701 
727961 
915065 
848209 
8621 
723374 
943375 
854388 
8948 
2843957 
3129337 
3167804 
11156 
2833548 
3065357 
3123325 
9674 
2709908 
3066700 
2898656 
9000 
2711598 
3098088 
2932727 
9083 
2876809 
3067148 
3136661 
9445 
2798517 
2937410 
2828013 
12717 
751586 
2925563 
2977357 
10485 
3022226 
2861060 
2928720 
10725 
3092796 
2995656 
2959634 
10586 
2891287 
3029278 
2978744 
9834 
2860501 
3229775 
2993495 
10322 
2881087 
2997150 
2997206 
11680 
2892875 
2796735 
2996026 
10694 
2777882 
2931152 
2927744 
9399 
2754641 
2947965 
2995322 
10690 
2857909 
2861345 
2826033 
11089 
2759787 
2898713 
2830202 
10141 
2763660 
2697726 
2694289 
12058 
2857148 
2992986 
3027870 
10709 
2825695 
2860569 
2894848 
11875 
2892414 
2934415 
840244 
11426 
706501 
844426 
909611 
10324 
696709 
805883 
841287 
11300 
658565 
905693 
843638 
10190 
690019 
844678 
874235 
11113 
672904 
821171 
796841 
26545 
603991 
837883 
843716 
11839 
669807 
839930 
838525 
11008 
670258 
810951 
843516 
10403 
673710 
826949 
845045 
10682 
714685 
840581 
909143 
11568 
52565 
3094342 
2967298 
8885 
2879380 
2966165 
2997362 
8675 
2875300 
2801773 
3065171 
8586 
2886870 
2904994 
3126807 
9209 
2908824 
2902670 
2867949 
8769 
2547333 
2721576 
2834544 
9485 
2544404 
2934658 
2769151 
9234 
2612107 
2701303 
2966823 
9438 
2611603 
2800440 
2800576 
8636 
2580223 
2900025 
2833152 
9015 
2760412 
2735484 
2767416 
8814 
2801794 
2864001 
2668729 
8996 
2810770 
2649450 
2807850 
9019 
2779255 
2793581 
847331 
9879 
677229 
814258 
843209 
9092 
684278 
808301 
808814 
9407 
780840 
809448 
803590 
8556 
673711 
807511 
837984 
9173 
682692 
809983 
836602 
8771 
744812 
791488 
842810 
8816 
713337 
853851 
826364 
8753 
677448 
809705 
797800 
9738 
683720 
808645 
843893 
8499 
682338 
807833 
937091 
9168 
776193 
906420 
836243 
9023 
681016 
843287 
826765 
14769 
61883 
925356 
906256 
18237 
642868 
830139 
903435 
10249 
772422 
843322 
874604 
10119 
763203 
911613 
908964 
13556 
705753 
841360 
822748 
10038 
774141 
840083 
908594 
10763 
705537 
843154 
910711 
10007 
771118 
946150 
842905 
10669 
706851 
871806 
956606 
10295 
762283 
3119618 
2990178 
10188 
2759875 
3291427 
2932225 
11920 
2825092 
2960664 
2959704 
10115 
3056410 
3027189 
3127736 
9580 
2959419 
3192336 
2789758 
14167 
2855895 
2928339 
2928103 
11438 
2957086 
3292255 
2894909 
10094 
2859753 
2995448 
2928224 
9707 
2820687 
3027967 
2931493 
11449 
2922926 
2993807 
3032904 
23598 
2827235 
3028553 
2981587 
9997 
2846871 
3068628 
2783704 
10053 
2858511 
2925205 
2929744 
11193 
2788741 
2883847 
2960650 


* [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-01  7:58 
  0 siblings, 0 replies; 11+ messages in thread
From:  @ 2017-08-01  7:58 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 680 bytes --]

Hi, all
Recently, we used a demo to observe write latency.
We find that when we access the same block consecutively, the maximum latencies occur more frequently.
They can even reach 2-3 ms, and they change periodically.


 Why?
(1) For the same block, the first access takes about 10-12 μs, while the second, third,
     and fourth accesses take 700-900 μs or even 2-3 ms.
     I want to know why the first access behaves differently from the others.
(2) Why do the 2-3 ms maximums change periodically?


  Best wishes,
  Jiajia Chu

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 1401 bytes --]

[-- Attachment #3: result.obj --]
[-- Type: application/octet-stream, Size: 4850 bytes --]



* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-01  8:05 Danielle Costantino
  0 siblings, 0 replies; 11+ messages in thread
From: Danielle Costantino @ 2017-08-01  8:05 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 1243 bytes --]

What you are observing is characteristic of how the drive's firmware handles writes. It can come from write amplification caused by the drive's GC, and from journaling + checkpointing. This behavior may vary greatly between vendors and even firmware versions.


    - Danielle Costantino
________________________________
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of 储 <cjj25233(a)163.com>
Sent: Tuesday, August 1, 2017 12:58:33 AM
To: spdk(a)lists.01.org
Subject: [SPDK] A issue about maximums of write latency when we access the same block consecutively.

Hi, all
Recently, we used a demo to observe write latency.
We find that when we access the same block consecutively, the maximum latencies occur more frequently.
They can even reach 2-3 ms, and they change periodically.

 Why?
(1) For the same block, the first access takes about 10-12 μs, while the second, third,
     and fourth accesses take 700-900 μs or even 2-3 ms.
     I want to know why the first access behaves differently from the others.
(2) Why do the 2-3 ms maximums change periodically?

  Best wishes,
  Jiajia Chu

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 2756 bytes --]


* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-01  8:46 
  0 siblings, 0 replies; 11+ messages in thread
From:  @ 2017-08-01  8:46 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 1775 bytes --]

Hi, Danielle


We also ran an experiment that accesses random blocks 500 times.
The results are more stable: most of the latencies are 10-30 μs and the maximum is 284 μs.
We have repeated the random test many times, and each run gives similar results.


We wonder why the same mechanisms, e.g., GC, journaling and checkpointing, affect random and consecutive accesses so differently.
Thanks a lot.


At 2017-08-01 16:05:54, "Danielle Costantino" <dcostantino(a)vmem.com> wrote:


What you are observing is characteristic of how the drive's firmware handles writes. It can come from write amplification caused by the drive's GC, and from journaling + checkpointing. This behavior may vary greatly between vendors and even firmware versions.




    - Danielle Costantino

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 3764 bytes --]

[-- Attachment #3: random_test_result.obj --]
[-- Type: application/octet-stream, Size: 4088 bytes --]

EAL: Detected 40 lcore(s)
EAL: Auto-detected process type: PRIMARY
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
EAL: PCI device 0000:07:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
Initializing NVMe Controllers
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
Attaching to 0000:06:00.0
EAL: PCI device 0000:07:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
Attaching to 0000:07:00.0
Attached to 0000:06:00.0
Using controller INTEL SSDPECME032T4  (CVF8547400EL3P2CGN-1) with 1 namespaces.
  Namespace ID: 1 size: 1600GB
Attached to 0000:07:00.0
Using controller INTEL SSDPECME032T4  (CVF8547400EL3P2CGN-2) with 1 namespaces.
  Namespace ID: 1 size: 1600GB
Initialization complete.
37465 
284410 
14842 
10994 
14047 
11934 
16770 
11722 
9825 
10999 
15324 
10751 
122388 
9436 
9975 
8737 
10724 
8142 
8833 
8409 
8500 
14361 
8300 
12171 
9974 
8205 
10180 
9524 
10843 
27500 
10581 
8944 
9647 
8723 
9564 
11018 
13335 
9749 
9201 
9762 
7951 
7731 
10858 
7736 
12477 
10311 
8308 
7872 
10142 
15584 
6973 
10968 
8427 
11045 
9880 
7662 
11556 
10341 
8460 
9600 
11637 
87031 
8474 
8484 
7871 
7861 
12809 
9972 
8451 
10778 
12617 
9816 
9080 
11807 
8720 
8604 
9028 
12504 
7437 
8297 
11321 
9399 
8240 
7845 
7898 
8360 
10292 
9909 
8841 
11244 
10304 
9999 
7792 
7823 
27841 
10971 
11733 
10412 
9495 
8022 
10567 
10001 
8056 
12517 
7928 
10490 
12786 
9678 
9329 
22426 
7858 
7196 
6993 
8775 
12447 
11689 
10390 
11564 
8353 
17171 
8745 
12312 
10357 
8037 
7886 
9877 
82512 
11867 
8794 
17011 
10318 
7851 
12245 
7637 
10373 
8727 
10798 
10995 
7651 
11034 
7195 
11632 
8167 
9581 
7466 
22666 
11001 
8414 
7743 
7107 
7781 
10124 
7951 
7644 
11763 
8294 
10746 
8119 
7551 
9920 
12824 
6968 
8159 
9811 
9593 
14649 
9096 
8272 
9816 
7973 
10105 
9641 
9155 
9922 
12066 
9813 
9365 
8690 
11560 
10607 
9661 
11609 
11935 
9878 
7533 
10112 
10581 
7690 
13197 
10411 
7885 
73926 
7317 
8165 
11545 
8032 
8421 
11064 
8435 
8076 
12373 
9727 
13740 
7957 
11011 
11403 
9412 
8662 
8373 
10079 
8777 
10005 
7398 
7719 
9843 
12640 
9931 
8555 
7813 
10568 
10190 
7701 
7564 
7602 
26195 
10116 
10234 
8797 
10522 
8148 
8117 
8759 
11886 
9961 
8041 
10463 
13065 
10083 
10230 
8673 
10116 
11402 
11199 
15409 
8449 
8047 
10312 
9916 
12358 
7591 
7366 
7653 
8218 
8074 
10085 
7403 
7706 
12630 
18491 
9676 
9564 
8242 
7834 
9177 
10645 
12606 
9639 
7787 
10239 
10267 
7400 
12004 
8175 
9513 
9013 
8845 
7992 
13122 
7964 
7684 
10319 
7004 
7756 
7260 
16706 
10391 
12192 
8763 
7752 
7684 
7918 
10254 
7327 
11603 
8735 
11739 
10482 
7083 
10022 
7793 
7794 
10881 
12821 
8193 
10080 
11190 
10315 
10204 
8396 
17804 
11581 
9329 
10348 
15045 
9541 
9114 
12842 
12774 
8490 
9255 
8045 
81070 
7799 
18063 
11954 
10182 
8978 
9124 
10224 
8514 
7532 
10644 
9380 
8972 
7872 
10394 
7593 
9953 
9088 
7462 
10886 
7496 
8102 
21891 
8519 
8248 
10914 
15002 
11874 
9402 
7529 
7597 
10527 
10332 
8690 
10023 
19136 
7871 
9539 
7932 
7182 
12823 
7758 
11296 
8105 
7998 
8015 
9013 
10335 
11291 
8134 
10124 
8199 
9635 
9964 
10788 
8334 
10418 
13839 
7902 
10168 
7813 
10188 
10438 
13373 
9450 
80209 
8399 
16338 
8042 
7436 
7329 
10206 
10336 
13363 
10069 
7841 
10484 
9045 
10038 
8173 
7482 
9879 
9958 
9566 
10209 
9314 
10501 
7916 
7519 
9717 
7655 
7964 
9258 
8484 
13017 
7506 
7771 
20716 
28968 
10194 
10994 
10658 
8948 
10172 
10194 
9432 
8980 
7571 
9739 
12462 
7550 
10758 
11853 
10003 
9442 
7843 
10191 
7314 
9754 
7653 
8140 
12068 
11226 
8156 
9609 
9792 
9290 
10114 
10271 
7434 
9073 
10721 
9062 
10935 
9610 
13447 
8210 
10063 
7802 
7478 
10159 
10442 
8749 
8155 
6806 
9702 
7801 
8292 
7263 
10026 
7849 
9995 
8245 
9230 
7501 
7786 
7942 
13863 
11902 
10136 
9305 
9195 
9126 
27205 
10060 
8506 
9857 
8100 
7033 
7937 
10706 
7549 
8001 
8176 
10978 
7922 
11726 
9885 
8686 


* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-01 19:41 Luse, Paul E
  0 siblings, 0 replies; 11+ messages in thread
From: Luse, Paul E @ 2017-08-01 19:41 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 2665 bytes --]

Sounds like Danielle’s theory still holds here. The impact of background tasks in SSD firmware depends at least partially on where the operation lands on the media, i.e., if your write hits an area that is being relocated as part of GC, it’s going to suffer more. Since GC is done in localized areas, it stands to reason that random traffic will be less affected, because a smaller % of the IOs will be hitting LBA regions that GC is affecting. I’m sure there are a host of other factors as well.
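As a toy illustration of the locality argument above, the sketch below counts how many writes in a trace land inside one "busy" LBA range. It assumes, purely for illustration, that background work can be modelled as a single contiguous LBA range. Real firmware manages physical blocks, and nothing in this thread describes how this drive maps them. The function name `count_hits` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many LBAs in `lbas` fall inside
 * [region_start, region_start + region_len).
 * The region stands in for an area with background activity (an assumption).
 */
size_t count_hits(const uint64_t *lbas, size_t n,
                  uint64_t region_start, uint64_t region_len)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (lbas[i] >= region_start && lbas[i] - region_start < region_len) {
            hits++;
        }
    }
    return hits;
}
```

Under this model, a trace that repeats one LBA either hits the region on every write or on none, while a trace spread over the namespace hits it only occasionally.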

-Paul

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of 储
Sent: Tuesday, August 1, 2017 1:47 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.

Hi, Danielle

We also ran an experiment that accesses random blocks 500 times.
The results are more stable: most of the latencies are 10-30 μs and the maximum is 284 μs.
We have repeated the random test many times, and each run gives similar results.

We wonder why the same mechanisms, e.g., GC, journaling and checkpointing, affect random and consecutive accesses so differently.
Thanks a lot.

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 11723 bytes --]


* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-01 23:49 Walker, Benjamin
  0 siblings, 0 replies; 11+ messages in thread
From: Walker, Benjamin @ 2017-08-01 23:49 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3481 bytes --]

Hi Jiajia,

I have a bunch of questions that will help me figure out what you are seeing.

1) When you say "access", do you mean read or write? The behavior of these two operations is quite different.
2) Are you using a NAND based or 3D XPoint based SSD? These again work entirely differently.
3) When you access the same block repeatedly, what's the delay between each access? None?
4) For whatever SSD you are using, have you confirmed the firmware is up to date? This can make a big difference.

Thanks,
Ben


-------- Original message --------
From: "Luse, Paul E" <paul.e.luse(a)intel.com>
Date: 8/1/17 12:42 PM (GMT-07:00)
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.

Sounds like Danielle’s theory still holds here. The impact of background tasks in SSD firmware depends at least partially on where the operation lands on the media, i.e., if your write hits an area that is being relocated as part of GC, it’s going to suffer more. Since GC is done in localized areas, it stands to reason that random traffic will be less affected, because a smaller % of the IOs will be hitting LBA regions that GC is affecting. I’m sure there are a host of other factors as well.

-Paul

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 11170 bytes --]


* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-02 11:13 
  0 siblings, 0 replies; 11+ messages in thread
From:  @ 2017-08-02 11:13 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4557 bytes --]

Hi, Paul and Ben


We also suspect that GC influences the write latencies, but we have no idea how to demonstrate it experimentally.
Could you give us some advice on experiment design?
We also believe that other factors combine to cause this abnormal behavior.


Answers:
(1) "access" = write. We tested read and write operations separately,
      but only found the strange behavior in the write experiments.
      The comparison can be seen in the attachments.
(2) We use a NAND-based SSD, Intel P3608.
(3) The results in the attachments were produced with no delay.
      We tried adding "sleep(1)" between two operations, but it does not seem to help.
(4) We did not update the firmware...
     The device is shared by many colleagues, and we are afraid that the update is irreversible and would affect other work.
     If it could be solved by a firmware update, we would still like to understand the reason.
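One simple way to quantify the first-write versus repeat-write difference in a result file is to average the samples by their position within each group of four writes to the same LBA. The sketch below assumes the samples are already aligned so that each group starts with the first write to a new LBA. The function name `mean_by_position` and the `GROUP` constant are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define GROUP 4   /* writes per LBA in the posted loop */

/*
 * Average latency (ns) for each position within a group of GROUP
 * consecutive writes to the same LBA. Trailing samples that do not
 * fill a whole group are ignored. Uses integer division.
 */
void mean_by_position(const uint64_t *ns, size_t n, uint64_t mean[GROUP])
{
    uint64_t sum[GROUP] = {0};
    size_t groups = n / GROUP;

    for (size_t g = 0; g < groups; g++) {
        for (size_t p = 0; p < GROUP; p++) {
            sum[p] += ns[g * GROUP + p];
        }
    }
    for (size_t p = 0; p < GROUP; p++) {
        mean[p] = groups ? sum[p] / groups : 0;
    }
}
```

Running the same function over traces with a different number of writes per LBA, or with a different delay between writes, would show whether the slow positions follow the repeat count or the timing.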


Thanks a lot.


At 2017-08-02 07:49:13, "Walker, Benjamin" <benjamin.walker(a)intel.com> wrote:
Hi Jiajia,


I have a bunch of questions that will help me figure out what you are seeing. 


1) When you say "access", do you mean read or write? The behavior of these two operations is quite different.
2) Are you using a NAND based or 3D XPoint based SSD? These again work entirely differently.
3) When you access the same block repeatedly, what's the delay between each access? None?
4) For whatever SSD you are using, have you confirmed the firmware is up to date? This can make a big difference.


Thanks, 
Ben




-------- Original message --------
From: "Luse, Paul E" <paul.e.luse(a)intel.com>
Date: 8/1/17 12:42 PM (GMT-07:00)
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.



Sounds like Danielle’s theory still holds here… the impact of background tasks in SSD firmware is at least partially dependent on the location of the operation on the media , ie if your write hits an area that is being relocated as part of GC it’s going to suffer more and as GC is done in localized areas it stands to reason that random traffic will be less affected as you’re hitting as a smaller % of the IOs will be hitting LBA regions that GC is affecting.  I’m sure there are a host of other factors as well

 

-Paul

 

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of ?
Sent: Tuesday, August 1, 2017 1:47 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.

 

Hi, Danielle

 

We also ran experiments that access random blocks 500 times.

The results look more stable: most of the latencies are 10-30 μs and the maximum is 284 μs.

We have repeated the random test many times, and each run gives similar results.

 

We wonder why the same mechanisms, e.g., GC, journaling and checkpointing, affect random and consecutive accesses so differently.

Thanks a lot.

 

At 2017-08-01 16:05:54, "Danielle Costantino" <dcostantino(a)vmem.com> wrote:



What you are observing is characteristic of how the drive's firmware handles writes. This can come from write amplification caused by the drive's GC, and from journaling + checkpointing. This behavior may vary greatly between vendors and even firmware versions.

 

    - Danielle Costantino

From: SPDK <spdk-bounces(a)lists.01.org> on behalf of 储 <cjj25233(a)163.com>
Sent: Tuesday, August 1, 2017 12:58:33 AM
To: spdk(a)lists.01.org
Subject: [SPDK] A issue about maximums of write latency when we access the same block consecutively.

 

Hi, all

Recently, we used a demo to observe the latency.

We find that when we access the same block consecutively, maximum latencies occur more frequently.

Additionally, they can reach 2-3 ms and recur periodically.

 

 Why?

(1) For the same block, the latency of the first access is about 10-12 μs, while the second, third,

     and fourth accesses can reach 700-900 μs or even 2-3 ms.

     I want to know why the first access behaves differently from the others.

(2) Why do the 2-3 ms maximums recur periodically?

 

  Best wishes,

  Jiajia Chu

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 12934 bytes --]

[-- Attachment #3: result_read.jpg --]
[-- Type: image/jpeg, Size: 39459 bytes --]

[-- Attachment #4: result_write.jpg --]
[-- Type: image/jpeg, Size: 49841 bytes --]


* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-03 23:36 Walker, Benjamin
  0 siblings, 0 replies; 11+ messages in thread
From: Walker, Benjamin @ 2017-08-03 23:36 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4392 bytes --]

On Wed, 2017-08-02 at 19:13 +0800, 储 wrote:
> Answers:
> (1) "access" = write. We experiment read and write operations  respectively, 
>       but only find the strange phenomenon in the writing experiments.
>       The comparison of experiments can be seen in accessories.
> (2) We use a NAND based SSD, Intel P3608.
> (3) The result presented in accessories is produced with no delay. 
>       We try to set "sleep(1)" between the two operations, but it seems does
> not work.
> 
> At 2017-08-02 07:49:13, "Walker, Benjamin" <benjamin.walker(a)intel.com> wrote:
> > Hi Jiajia,
> > 
> > I have a bunch of questions that will help me figure out what you are
> > seeing. 
> > 
> > 1) When you say "access", do you mean read or write? The behavior of these
> > two operations is quite different.
> > 2) Are you using a NAND based or 3D XPoint based SSD? These again work
> > entirely differently.
> > 3) When you access the same block repeatedly, what's the delay between each
> > access? None?

I was able to verify the behavior you are seeing. I'm afraid I'm not going to be
able to give you an exact answer for your particular device - I don't have
insight into the specifics of how each SSD is implemented. I brainstormed with a
few of my colleagues though, so what I can do is give you some idea of what is
happening inside of the device that will make it clear why writing to the same
block over and over may cause performance problems.

A good mental model for an SSD is basically a log of (LBA, data) pairs. When you
write to any LBA, it just appends to the end of the log and updates an internal
map of the location of that LBA. It does this appending by buffering several
writes into RAM located on the SSD, then it sends that batch of data to the NAND
all at once. The other important understanding is that the SSD is composed of a
large number of physical NAND dies, with some number of entirely parallel NAND
channels that can handle writes. Writing to the log sends the batched data to
each channel more or less round-robin. The final thing to remember is that this
whole process is implemented in hardware, not software, so adding things like
coordination between parallel operations is not as simple as just adding a lock.
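
The mental model above can be written down as a toy data structure. This is only a sketch of the "log plus map" idea, not how any real SSD is implemented; the names (toy_ftl, ftl_write, ftl_read) and the sizes are made up for illustration, and the RAM buffering and NAND channels are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_SLOTS 16            /* hypothetical log capacity, in blocks */
#define NUM_LBAS   8            /* hypothetical number of logical blocks */
#define UNMAPPED  SIZE_MAX      /* marker: this LBA was never written */

/* One appended record: which LBA it belongs to and its payload. */
struct log_entry {
    uint64_t lba;
    uint32_t data;              /* stand-in for a full block of user data */
};

struct toy_ftl {
    struct log_entry log[LOG_SLOTS];
    size_t head;                /* next free slot at the end of the log */
    size_t map[NUM_LBAS];       /* LBA -> log slot holding its newest data */
};

static void ftl_init(struct toy_ftl *f)
{
    assert(f != NULL);
    f->head = 0;
    for (size_t i = 0; i < NUM_LBAS; i++)
        f->map[i] = UNMAPPED;
}

/* Append-only write: old data is never overwritten, only the map moves.
 * Returns 0 on success, -1 if the LBA is out of range or the log is full. */
static int ftl_write(struct toy_ftl *f, uint64_t lba, uint32_t data)
{
    if (lba >= NUM_LBAS || f->head >= LOG_SLOTS)
        return -1;
    f->log[f->head].lba = lba;
    f->log[f->head].data = data;
    f->map[lba] = f->head++;
    return 0;
}

/* A read follows the map to the newest copy of the LBA. */
static int ftl_read(const struct toy_ftl *f, uint64_t lba, uint32_t *out)
{
    if (lba >= NUM_LBAS || f->map[lba] == UNMAPPED)
        return -1;
    *out = f->log[f->map[lba]].data;
    return 0;
}
```

Writing the same LBA twice leaves two entries in the log, with the map pointing at the second one. The stale first entry is the kind of thing GC would later have to reclaim.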

When you write the same LBA over and over, a few things could happen inside the
SSD (I don't know how your SSD specifically works). 

One possibility is that the SSD could see that the LBA is already buffered in
memory from a previous write and it could just update that memory. However, that
doesn't actually work in general. The data in that memory buffer may be
currently in use as part of a write to actual NAND, or may even be currently
being read. So the only option is to append to the end of the log for each new
write to the LBA. This could probably be coordinated with locking in software,
but remember that the SSD controller is implemented in hardware. If handling
this case makes the design far more complex, it may not be possible given power,
latency, and other budgets.

Another possibility is that the data is appended to the log for each write just
like any other I/O. However, it is still more complicated than the case where
random LBAs are being written to. Once one buffer is filled up, a write to NAND
is issued. When that write completes, it has to update the map for the location
of the LBA. If, while that write is outstanding, another buffer fills up with
new writes to the same LBA, the device has to figure out what to do. If it
submits the second NAND write to a new channel, it's then effectively racing
against the first write. If they complete out of order, the user will end up
with stale data. This case could also probably be handled by better coordination
on the completion side, but again there is a complexity trade-off when
implementing this in actual hardware.

The easiest solution is probably to just detect if a NAND write is active for an
LBA in a given buffer, and then just queue up the next write until the one
before it finishes. That adds potentially a lot of latency, but it simplifies
the hardware design considerably.
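
To make the cost of that "queue behind the active write" policy concrete, here is a tiny latency model. It is a sketch under invented assumptions: BUFFER_US and PROGRAM_US are made-up constants, not measurements of the P3608 or any other device, and toy_write_latency is a hypothetical name. It only shows how serializing on a busy LBA turns a fast buffered write into one that waits for a full NAND program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFFER_US   10u   /* hypothetical: time to land a write in device RAM */
#define PROGRAM_US 900u   /* hypothetical: time for one NAND program */
#define NUM_LBAS     8u   /* hypothetical number of logical blocks */

struct toy_ssd {
    /* Time at which the NAND program holding each LBA finishes (0 = idle). */
    uint64_t busy_until_us[NUM_LBAS];
};

/* Returns the host-visible latency, in microseconds, of a write to `lba`
 * submitted at time `now_us`. */
static uint64_t toy_write_latency(struct toy_ssd *s, uint64_t now_us,
                                  uint64_t lba)
{
    assert(s != NULL && lba < NUM_LBAS);

    uint64_t start = now_us;
    /* A write to the same LBA is still being programmed: queue behind it
     * instead of racing it on another channel. */
    if (s->busy_until_us[lba] > start)
        start = s->busy_until_us[lba];

    uint64_t done = start + BUFFER_US;
    /* The buffered data is then programmed to NAND in the background. */
    s->busy_until_us[lba] = done + PROGRAM_US;
    return done - now_us;
}
```

In this model the first write to an idle LBA costs only the buffer time, an immediate second write to the same LBA waits out the program time, and a write to a different LBA is unaffected.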

Ultimately, I have no idea what that SSD is actually doing, but you can see that
it's fairly complex to handle this case. It is certainly more complex than
handling random I/O.

I hope that helps,
Ben

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 3274 bytes --]


* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-07  6:40 
  0 siblings, 0 replies; 11+ messages in thread
From:  @ 2017-08-07  6:40 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 5096 bytes --]

Hi, Ben

We truly appreciate your concrete analysis and help in resolving the problem.


As you say, different devices have their own complex mechanisms.
We do not know how the hardware actually processes each request, but we think your explanation makes sense.
Maybe the problem will be relieved as hardware is upgraded and developed.
We hope to get access to other devices in the future and verify whether this problem exists there as well.
We also hope that others using different devices can share their test results on this issue.
Thanks a lot.

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 5595 bytes --]


* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-09  5:34 Crane Chu
  0 siblings, 0 replies; 11+ messages in thread
From: Crane Chu @ 2017-08-09  5:34 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 6469 bytes --]

Hi, Jiajia,

Here are some suggestions:

   1. Try a Secure Erase before the test. It will eliminate most of the FTL
   background operations and put the SSD into the so-called FOB (fresh out
   of box) state.
   2. Most enterprise-grade SSDs state Quality of Service figures in their
   specifications. For example, the P3608 ensures that 99.99% of 4K write
   IOs complete within 4.7 ms when QD=1. The spec also provides another
   write latency value of around 20 us. You can consider them as the worst
   case and the best case respectively. But the distribution between them
   is hard to predict in a non-FOB state, due to the diversity of firmware,
   host workload patterns, and NAND flash characteristics.
   3. The mapping granularity is 4K in most SSD firmware designs. So, if the
   LBA size is 512B in your test, it will make the scenario more complex.
   Try to test with 4K-aligned IO.
   4.
   http://codecapsule.com/2014/02/12/coding-for-ssds-part-1-introduction-and-table-of-contents/
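
For points 2 and 3, two small helpers may be useful when post-processing the measurements. This is a sketch, not part of SPDK: latency_percentile and is_4k_aligned are hypothetical names, the percentile uses the simple nearest-rank definition, and the sector size is assumed to be known from the namespace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t values, ascending. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile: the smallest sample such that at least `pct`
 * percent of all samples are <= it. Sorts the array in place. */
static uint64_t latency_percentile(uint64_t *samples_ns, size_t n, double pct)
{
    assert(samples_ns != NULL && n > 0 && pct > 0.0 && pct <= 100.0);
    qsort(samples_ns, n, sizeof(samples_ns[0]), cmp_u64);

    double exact = (pct / 100.0) * (double)n;
    size_t rank = (size_t)exact;        /* floor */
    if ((double)rank < exact)
        rank++;                         /* round up to get the ceiling */
    if (rank == 0)
        rank = 1;
    return samples_ns[rank - 1];
}

/* True if the I/O starts and ends on a 4 KiB boundary, given the LBA,
 * the number of LBAs, and the sector size in bytes. */
static int is_4k_aligned(uint64_t lba, uint32_t lba_count,
                         uint32_t sector_size)
{
    uint64_t start = lba * sector_size;
    uint64_t len = (uint64_t)lba_count * sector_size;
    return len > 0 && start % 4096 == 0 && len % 4096 == 0;
}
```

Comparing a high percentile such as 99.99% of the collected samples against the QoS figure in the spec gives a fairer picture than looking at the single maximum. With 512B sectors, a single-LBA write is not 4K aligned, while a write of 8 LBAs starting at LBA 8 is.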

Thanks,
-Crane Chu


[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 7569 bytes --]


* Re: [SPDK] A issue about maximums of write latency when we access the same block consecutively.
@ 2017-08-11 13:27 
  0 siblings, 0 replies; 11+ messages in thread
From:  @ 2017-08-11 13:27 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 6891 bytes --]

Hi, Crane,


Thanks for your enlightening suggestions.
1. We are afraid that we cannot do a secure erase right now. The device is shared, and we need to coordinate with our colleagues before erasing it.
2. We had not considered the influence of the device state in depth, and we think your point makes sense.
    We will provide a new test report for the FOB state to verify your suggestion.
3. Our test demo is based on spdk/examples/nvme/hello_world/hello_world.c, and the I/O granularity is set to 4K.
4. Thanks for sharing your knowledge ^_^


Best wishes,
 Jiajia Chu






[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 8674 bytes --]


Thread overview: 11+ messages
2017-08-03 23:36 [SPDK] A issue about maximums of write latency when we access the same block consecutively Walker, Benjamin
  -- strict thread matches above, loose matches on Subject: below --
2017-08-11 13:27 
2017-08-09  5:34 Crane Chu
2017-08-07  6:40 
2017-08-02 11:13 
2017-08-01 23:49 Walker, Benjamin
2017-08-01 19:41 Luse, Paul E
2017-08-01  8:46 
2017-08-01  8:05 Danielle Costantino
2017-08-01  7:58 
2017-07-31 13:34 
