* Re: ask for help to implement a high performance sliding window
[not found] <51fd70b2.1a26.16033b658e1.Coremail.apprentice89@126.com>
@ 2017-12-11 18:10 ` Paul E. McKenney
0 siblings, 0 replies; only message in thread
From: Paul E. McKenney @ 2017-12-11 18:10 UTC (permalink / raw)
To: LoopJump; +Cc: perfbook
Hello, LoopJump,
Glad you like the book! Adding perfbook@vger.kernel.org on CC in case
someone there has some good ideas.
First, I do feel the need to ask if you have actually benchmarked the
system and seen maximum computation to be a problem, perhaps on a profile.
After all, there is little payoff in optimizing portions of the system
that are not bottlenecks!
Second, let me make sure that I understand the problem:
1. Your system receives sequence numbers on incoming packets.
2. Each packets is processed, after which the corresponding sequence
number should be included in a global "max" computation.
3. If the whole thing isn't protected by a global lock, we know
that the thread reading out the maximum value will get a snapshot
that immediately becomes stale. The reason for this instant
staleness is that some other thread might immediately complete
processing for another packet, and that packet's sequence number
might update the maximum.
4. I am assuming that the sequence number is large enough that
overflow is not a problem. If overflow is possible, things
become quite interesting when a given packet's processing takes
a very long time.
5. I am assuming that packets are processed very quickly. If this
assumption is incorrect and packets take (say) ten milliseconds
to process, then you should be able to use pretty much any method
of updating the maximum. But if packet processing was that slow,
you probably wouldn't be asking me this question.
Third, some thoughts on implementing this efficiently.
By #3 above, we know that the thread reading out the maximum can have
a stale value. Just how much staleness can be tolerated?
If (say) a millisecond of staleness is OK, one approach is to have each
thread maintain the maximum sequence number that it has encountered
thus far. Maintaining this per-thread value should be extremely
efficient. Then a separate thread (or function or whatever) can run
every millisecond, taking the maximum of the per-thread maxima.
Alternatively, each thread can periodically update the global maximum
based on that thread's local state.
Either way, the point is to update the global maximum less frequently
and thus have less problem with cache misses associated with such updates.
Thanx, Paul
On Fri, Dec 08, 2017 at 09:21:16AM +0800, LoopJump wrote:
> Hi Paul,
>
> I am a database kernel developer, I really love your book about parallel programming.
>
> Now I encounter a question about a high performance sliding window in a multi threads program.
>
> In my question, incoming request has a consecutive sequence(integer), these requests will be pushed into a thread pool(16 threads), what I want is to track the max_consecutive_sequence that has been processed.
>
> I tried to use a bitmap (a bit for one sequence or replace a bit with 64bytes item to avoid false sharing) to mark each sequence. Everytime a request is processed, it check if it can advance the max_consecutive_sequence and determine how many sequence it can advance.
>
> I do not know if there is a better way to implement such a sliding window.
>
> Thanks a lot for your help.
>
>
>
> LoopJump
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2017-12-11 18:10 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <51fd70b2.1a26.16033b658e1.Coremail.apprentice89@126.com>
2017-12-11 18:10 ` ask for help to implement a high performance sliding window Paul E. McKenney
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox