* [PATCH v1] doc: prog guide update for eal multi-pthread
@ 2015-02-16  7:34 Cunming Liang
From: Cunming Liang @ 2015-02-16  7:34 UTC
To: dev-VfR2kkLFssw

The patch adds the multi-pthread section under the EAL chapter of prog_guide.

Signed-off-by: Cunming Liang <cunming.liang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 doc/guides/prog_guide/env_abstraction_layer.rst | 157 ++++++++++++++++++++++++
 1 file changed, 157 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 231e266..06bcfae 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -212,4 +212,161 @@ Memory zones can be reserved with specific start address alignment by supplying
 The alignment value should be a power of two and not less than the cache line size (64 bytes).
 Memory zones can also be reserved from either 2 MB or 1 GB hugepages, provided that both are available on the system.
 
+
+Multiple pthread
+----------------
+
+DPDK usually pins one pthread per core to avoid task-switching overhead. This gains
+a lot of performance, but it is not flexible and not always efficient.
+
+Power management helps to improve CPU efficiency by limiting the CPU runtime frequency.
+However, there is also good reason to put otherwise idle cycles to use while the CPU runs at full capability.
+
+With OS scheduling and cgroups, a CPU quota can simply be assigned to each pthread on a specified CPU.
+This gives another way to improve CPU efficiency, but the prerequisite is running DPDK execution contexts from multiple pthreads on one core.
+
+For flexibility, it is also useful to allow pthread affinity not only to a single CPU but to a CPU set.
+
+
+EAL pthread and lcore Affinity
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+An lcore is an EAL execution unit that runs in an EAL pthread.
+EAL pthreads are the pthreads created and managed by EAL; they execute the tasks issued by *remote_launch*.
+In each EAL pthread, there is a Thread Local Storage (TLS) variable called *_lcore_id* for unique identification.
+As EAL pthreads usually bind 1:1 to a physical CPU, *_lcore_id* typically equals the CPU id.
+
+In the multiple pthread case, an EAL pthread is no longer always bound to one specific physical CPU.
+It may have affinity to a cpuset, so the *_lcore_id* is not always the same as the CPU id.
+For this reason, an EAL long option '--lcores' is defined to assign the CPU affinity of lcores.
+For a specified lcore id or id group, it allows setting the cpuset for that EAL pthread.
+
+The format pattern:
+	--lcores='<lcore_set>[@cpu_set][,<lcore_set>[@cpu_set],...]'
+
+'lcore_set' and 'cpu_set' can each be a single number, a range or a group.
+
+A number is a "digit([0-9]+)"; a range is "<number>-<number>"; a group is "(<number|range>[,<number|range>,...])".
+
+If a '\@cpu_set' is not supplied, 'cpu_set' takes the same value as 'lcore_set'.
+
+    ::
+
+        For example, "--lcores='1,2@(5-7),(3-5)@(0,2),(0,6),7-8'" which means starting 9 EAL threads;
+            lcore 0 runs on cpuset 0x41 (cpu 0,6);
+            lcore 1 runs on cpuset 0x2 (cpu 1);
+            lcore 2 runs on cpuset 0xe0 (cpu 5,6,7);
+            lcores 3,4,5 run on cpuset 0x5 (cpu 0,2);
+            lcore 6 runs on cpuset 0x41 (cpu 0,6);
+            lcore 7 runs on cpuset 0x80 (cpu 7);
+            lcore 8 runs on cpuset 0x100 (cpu 8).
+
+With this option, the associated CPUs can be assigned for each given lcore id.
+It is also compatible with the pattern of the corelist ('-l') option.
+
+non-EAL pthread support
+~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to use the DPDK execution context in any user pthread (a.k.a. non-EAL pthread).
+
+In a non-EAL pthread, the *_lcore_id* is always LCORE_ID_ANY, which identifies that it is not an EAL thread with a valid, unique *_lcore_id*.
+The libraries then cannot take *_lcore_id* as a unique id. Instead, some libraries use an alternative unique id (e.g. tid);
+some are not impacted at all; and some work with limitations (e.g. timer, mempool).
+
+All these impacts are described in the :ref:`known_issue_label` section.
+
+Public Thread API
+~~~~~~~~~~~~~~~~~
+
+Two public APIs, ``rte_thread_set_affinity()`` and ``rte_thread_get_affinity()``, are introduced for threads.
+When they are used in any pthread context, the Thread Local Storage (TLS) is set or retrieved.
+
+The TLS variables include *_cpuset* and *_socket_id*:
+
+* *_cpuset* stores the bitmap of CPUs to which the pthread has affinity.
+
+* *_socket_id* stores the NUMA node of the cpuset. If the CPUs in the cpuset belong to different NUMA nodes, *_socket_id* is set to SOCKET_ID_ANY.
+
+
+.. _known_issue_label:
+
+Known Issues
+~~~~~~~~~~~~
+
++ rte_mempool
+
+  The rte_mempool uses a per-lcore cache inside the mempool.
+  For a non-EAL pthread, ``rte_lcore_id()`` does not return a valid number.
+  So for now, when rte_mempool is used in a non-EAL pthread, the put/get operations bypass the mempool cache.
+  Bypassing the mempool cache carries a performance penalty. Work on non-EAL mempool cache support is in progress.
+
+  However, there is another problem: rte_mempool is not preemptible. This limitation comes from rte_ring.
+
++ rte_ring
+
+  rte_ring supports multi-producer enqueue and multi-consumer dequeue, but it is non-preemptive.
+
+  .. note::
+
+    The "non-preemptive" constraint means:
+
+    - a pthread doing multi-producer enqueues on a given ring must not
+      be preempted by another pthread doing a multi-producer enqueue on
+      the same ring.
+    - a pthread doing multi-consumer dequeues on a given ring must not
+      be preempted by another pthread doing a multi-consumer dequeue on
+      the same ring.
+
+    Bypassing this constraint may cause the 2nd pthread to spin until the 1st one is scheduled again.
+    Moreover, if the 1st pthread is preempted by a context that has a higher priority, it may even cause a deadlock.
+
+  This does not mean rte_ring cannot be used; it only narrows the situations in which it can be used by multiple pthreads on the same core.
+
+  1. It CAN be used for any single-producer or single-consumer situation.
+
+  2. It MAY be used by multi-producer/consumer pthreads whose scheduling policies are all SCHED_OTHER (cfs). Users SHOULD be aware of the performance penalty before using it.
+
+  3. It MUST NOT be used by multi-producer/consumer pthreads when the scheduling policy of any of them is SCHED_FIFO or SCHED_RR.
+
+  ``RTE_RING_PAUSE_REP_COUNT`` is defined for rte_ring to reduce contention. It is mainly for case 2: a yield is issued after a number of pause repetitions.
+
+  It adds a sched_yield() syscall if the thread spins for too long while waiting for another thread to finish its operations on the ring.
+  That gives the preempted thread a chance to proceed and finish its ring enqueue/dequeue operation.
+
++ rte_timer
+
+  Running ``rte_timer_manage()`` on a non-EAL pthread is not allowed. However, resetting/stopping the timer from a non-EAL pthread is allowed.
+
++ rte_log
+
+  In a non-EAL pthread, there is no per-thread loglevel and logtype; the global loglevel is used.
+
++ misc
+
+  The debug statistics of rte_ring, rte_mempool and rte_timer are not supported in a non-EAL pthread.
+
+cgroup control
+~~~~~~~~~~~~~~
+
+Here is a simple example with two pthreads (t0 and t1) doing packet I/O on the same core ($cpu).
+We expect only 50% of the CPU to be spent on packet I/O.
+
+  .. code::
+
+    mkdir /sys/fs/cgroup/cpu/pkt_io
+    mkdir /sys/fs/cgroup/cpuset/pkt_io
+
+    echo $cpu > /sys/fs/cgroup/cpuset/pkt_io/cpuset.cpus
+
+    echo $t0 > /sys/fs/cgroup/cpu/pkt_io/tasks
+    echo $t0 > /sys/fs/cgroup/cpuset/pkt_io/tasks
+
+    echo $t1 > /sys/fs/cgroup/cpu/pkt_io/tasks
+    echo $t1 > /sys/fs/cgroup/cpuset/pkt_io/tasks
+
+    cd /sys/fs/cgroup/cpu
+    echo 100000 > pkt_io/cpu.cfs_period_us
+    echo 50000 > pkt_io/cpu.cfs_quota_us
+
+
 .. |linuxapp_launch| image:: img/linuxapp_launch.svg
--
1.8.1.4
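
As a cross-check of the cpuset bitmaps quoted in the --lcores example of the patch, the masks can be recomputed with plain shell arithmetic. This is only a sketch: `cpuset_mask` is a hypothetical helper name chosen for illustration, not a DPDK or EAL facility, and it assumes CPU ids small enough to fit in the shell's integer width.

```shell
#!/bin/sh
# cpuset_mask: print the hexadecimal bitmap for a list of CPU ids.
# Hypothetical helper for illustration only; it is not part of DPDK.
cpuset_mask() {
    mask=0
    for cpu in "$@"; do
        # Set the bit whose position equals the CPU id.
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
}

# CPU sets taken from the --lcores example in the patch.
cpuset_mask 0 6      # cpuset of lcores 0 and 6
cpuset_mask 5 6 7    # cpuset of lcore 2
cpuset_mask 0 2      # cpuset of lcores 3, 4 and 5
cpuset_mask 8        # cpuset of lcore 8
```

The four calls print 0x41, 0xe0, 0x5 and 0x100, which agree with the values listed in the patch.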

* Re: [PATCH v1] doc: prog guide update for eal multi-pthread
@ 2015-02-24 19:11 Thomas Monjalon
From: Thomas Monjalon @ 2015-02-24 19:11 UTC
To: Cunming Liang; +Cc: dev-VfR2kkLFssw

2015-02-16 15:34, Cunming Liang:
> The patch adds the multi-pthread section under the EAL chapter of prog_guide.
>
> Signed-off-by: Cunming Liang <cunming.liang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

I guess this documentation has been co-written with a native English speaker?

Applied, thanks

* Re: [PATCH v1] doc: prog guide update for eal multi-pthread
@ 2015-02-25  1:38 Liang, Cunming
From: Liang, Cunming @ 2015-02-25  1:38 UTC
To: Thomas Monjalon; +Cc: dev-VfR2kkLFssw@public.gmane.org

I'm afraid not yet, so I would appreciate any revision suggestions.

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon-pdR9zngts4EAvxtiuMwx3w@public.gmane.org]
> Sent: Wednesday, February 25, 2015 3:11 AM
> To: Liang, Cunming
> Cc: dev-VfR2kkLFssw@public.gmane.org
> Subject: Re: [dpdk-dev] [PATCH v1] doc: prog guide update for eal multi-pthread
>
> 2015-02-16 15:34, Cunming Liang:
> > The patch adds the multi-pthread section under the EAL chapter of prog_guide.
> >
> > Signed-off-by: Cunming Liang <cunming.liang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>
> I guess this documentation has been co-written with a native English speaker?
>
> Applied, thanks

* Re: [PATCH v1] doc: prog guide update for eal multi-pthread
@ 2015-03-06  8:06 Butler, Siobhan A
From: Butler, Siobhan A @ 2015-03-06  8:06 UTC
To: Liang, Cunming, dev-VfR2kkLFssw@public.gmane.org

> -----Original Message-----
> From: dev [mailto:dev-bounces-VfR2kkLFssw@public.gmane.org] On Behalf Of Cunming Liang
> Sent: Monday, February 16, 2015 7:34 AM
> To: dev-VfR2kkLFssw@public.gmane.org
> Subject: [dpdk-dev] [PATCH v1] doc: prog guide update for eal multi-pthread
>
> The patch adds the multi-pthread section under the EAL chapter of prog_guide.
>
> Signed-off-by: Cunming Liang <cunming.liang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> ---
>  doc/guides/prog_guide/env_abstraction_layer.rst | 157 ++++++++++++++++++++++++
>  1 file changed, 157 insertions(+)
>
> [...]

Acked-by: Siobhan Butler <siobhan.a.butler-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>