* A question of DPDK ring buffer
@ 2013-08-20  4:38 UTC
From: Bob Chen
To: dev

Hi folks,

Has anyone looked into the mechanism of the DPDK ring buffer? I have been
trying to understand the multi-producer / multi-consumer scenario. The
CAS (compare-and-swap) operation itself is not the obstacle for me; as far
as I can tell, the method DPDK adopts is lock-free, but not wait-free.

What troubles me is the final wait: after a core has written its enqueued
objects, it spins until the public prod_tail matches its private, per-core
prod_head. Only when the public prod_tail equals its private prod_head
does the core advance the public prod_tail to its private prod_next. See
below, from the DPDK programmer's guide:

    /*
     * If there are other enqueues in progress that preceded us,
     * we need to wait for them to complete
     */
    while (unlikely(r->prod.tail != prod_head))
        rte_pause();

Once all producers have finished, the public prod_tail ends up equal to
the public prod_head, i.e. the producer indexes are back in a consistent
state, just as they started.

OK, here is the question: why does DPDK have to maintain that public
prod_tail at all? Is the while loop really necessary? Under a heavy
workload it seems quite likely that a core finishes enqueueing its own
data early, yet still has to wait for the others to finish even though it
has nothing left to do. It seems to me that even if we removed the public
prod_tail, the enqueue operation would still work, because each core
already knows its exact enqueue positions from its private prod_head and
prod_next. In other words, we only need to guarantee the correctness of
the per-core indexes and leave the rest of the enqueue work to that
particular core. Those per-core ranges never overlap, thanks to the CAS
atomic operation -- that is my assumption.

What do you reckon?

Regards,
Bob
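For readers following along, the full multi-producer enqueue path that this
wait loop belongs to looks roughly like the sketch below. This is an
illustrative reconstruction built from the snippets quoted in this thread,
not the exact rte_ring source; the free-space check, the mask-based
indexing and the memory barriers are simplified assumptions.

    /* Simplified multi-producer enqueue of n objects (sketch only). */
    do {
            prod_head = r->prod.head;
            cons_tail = r->cons.tail;   /* used for the free-space check (omitted here) */
            prod_next = prod_head + n;
            /* Try to reserve slots [prod_head, prod_next) for this core. */
            success = rte_atomic32_cmpset(&r->prod.head, prod_head, prod_next);
    } while (unlikely(success == 0));

    /* Each core now fills its own reserved slots; no other producer can
     * touch them, because the CAS handed them out exclusively. */
    for (i = 0; i < n; i++)
            r->ring[(prod_head + i) & r->prod.mask] = obj_table[i];

    /* Wait until every reservation made before ours has been published... */
    while (unlikely(r->prod.tail != prod_head))
            rte_pause();

    /* ...then publish ours, making the new slots visible to consumers. */
    r->prod.tail = prod_next;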
* Re: A question of DPDK ring buffer
@ 2013-08-20  8:22 UTC
From: Olivier MATZ
To: Bob Chen; +Cc: dev

Hello Bob,

> OK, here is the question: why does DPDK have to maintain that public
> prod_tail at all? Is the while loop really necessary?

If you remove this wait loop, you can trigger an issue.

Imagine a case where core 0 wants to add an object to the ring: it does
the CAS, modifying prod_head. At this point it is interrupted for some
reason (maybe by the kernel) before writing the object pointer into the
ring, and thus before the modification of prod_tail. During this time,
core 1 wants to enqueue another object: it does the CAS, then writes its
object pointer, then modifies prod_tail (without waiting for core 0,
since we removed the wait loop). Now the ring state is wrong: it shows 2
objects, but one object pointer is invalid. If you try to dequeue the
objects, one of them will be a bad pointer.

Of course, interruption by the kernel should be avoided as much as
possible, but even without being interrupted, a similar scenario can
occur if one core is slower than another to enqueue its data (because of
a cache miss, for instance, or because the first core enqueues more
objects than the other).

To convince yourself, remove the wait loop and run the ring test in
app/test/test_ring.c; I expect it won't pass.

Regards,
Olivier
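To make the interleaving concrete, here is one hypothetical schedule, in
the notation of the snippets above (a sketch, not quoted DPDK code),
showing what a consumer could observe if the tail were updated without
waiting:

    /* Hypothetical interleaving with the wait loop removed.
     * Initially prod.head == prod.tail == t, and slots t and t+1 are free.
     *
     *   core 0                               core 1
     *   ------                               ------
     *   CAS prod.head: t -> t+1  (reserves slot t)
     *   ...preempted before writing slot t...
     *                                        CAS prod.head: t+1 -> t+2  (reserves slot t+1)
     *                                        ring[t+1] = obj;
     *                                        prod.tail = t+2;   (no wait on core 0)
     *
     * A consumer now sees prod.tail == t+2 and assumes slots t and t+1
     * both hold valid objects, but slot t was never written: dequeuing it
     * yields a stale or garbage pointer. The wait loop forbids exactly
     * this -- core 1 may not move prod.tail past slot t until core 0 has
     * published it. */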
* Re: A question of DPDK ring buffer
@ 2013-08-20  9:13 UTC
From: Chen, Bo D
To: Olivier MATZ; +Cc: dev

Hi Olivier,

Please see my comments inline.

    do {
        prod_head = r->prod.head;
        cons_tail = r->cons.tail;
        prod_next = prod_head + n;
        success = rte_atomic32_cmpset(&r->prod.head, prod_head, prod_next);

        /*
         * Why not enqueue the data right here? It would only be a couple
         * of pointer assignments, which would not take much time. The
         * entire CAS loop would then contain both the index adjustment and
         * the data enqueue, so a dequeue operation would have no chance to
         * interfere with the data being produced. The wait loop below could
         * then be removed accordingly.
         */
    } while (unlikely(success == 0));

    /*
    while (unlikely(r->prod.tail != prod_head))
        rte_pause();
    r->prod.tail = prod_next;
    */

Regards,
Bob
* Re: A question of DPDK ring buffer
@ 2013-08-21  8:31 UTC
From: Olivier MATZ
To: Chen, Bo D; +Cc: dev

Hi Bob,

> do {
>     prod_head = r->prod.head;
>     cons_tail = r->cons.tail;
>     prod_next = prod_head + n;
>     success = rte_atomic32_cmpset(&r->prod.head, prod_head, prod_next);
>
>     /*
>      * Why not enqueue the data right here? [...]
>      */
> } while (unlikely(success == 0));

You cannot enqueue your data there: before writing the objects, you must
first check that the cmpset was successful. In your example, if the cmpset
fails, you would write the object pointers into a zone already reserved by
another producer core. The writing of the objects must be done after the
"do - while" loop, once the cmpset has succeeded.

But even with that change, you cannot remove the wait loop
(r->prod.tail != prod_head), for the reasons described in my previous
mail.

The ring test in app/test/test_ring.c is quite stressful for the rings, so
you can use it to check whether your solution works.

Regards,
Olivier
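For completeness, sketching the consumer side shows why prod_tail has to
be published in reservation order at all. The code below is an
illustrative reconstruction mirroring the producer snippets above, not
quoted rte_ring source; the mask-based indexing and the empty-ring check
are assumptions.

    /* Simplified multi-consumer dequeue of n objects (sketch only). */
    do {
            cons_head = r->cons.head;
            prod_tail = r->prod.tail;  /* entries are only trusted up to prod.tail */
            /* available = prod_tail - cons_head;  (empty-ring check omitted) */
            cons_next = cons_head + n;
            /* Try to reserve slots [cons_head, cons_next) for this core. */
            success = rte_atomic32_cmpset(&r->cons.head, cons_head, cons_next);
    } while (unlikely(success == 0));

    /* Read objects that producers have already published. */
    for (i = 0; i < n; i++)
            obj_table[i] = r->ring[(cons_head + i) & r->cons.mask];

    /* Wait for earlier dequeues to finish, then publish ours. */
    while (unlikely(r->cons.tail != cons_head))
            rte_pause();
    r->cons.tail = cons_next;

Because consumers compute how many entries are available from prod.tail,
not prod.head, prod.tail may only move over slots whose writes have
completed; forcing producers to publish their tails in reservation order
is what guarantees that.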