Re: A top 10 statistics module?


From: Wang Jian <lark@linux.net.cn>
To: "Bill Rugolsky Jr." <brugolsky@telemetry-investments.com>
Cc: netfilter-devel@lists.netfilter.org
Subject: Re: A top 10 statistics module?
Date: Thu, 21 Apr 2005 00:14:29 +0800
Message-ID: <20050420235712.03BD.LARK@linux.net.cn>
In-Reply-To: <20050420145235.GC6027@ti64.telemetry-investments.com>

Hi Bill Rugolsky Jr.,

Thanks for your input.

On Wed, 20 Apr 2005 10:52:35 -0400, "Bill Rugolsky Jr." <brugolsky@telemetry-investments.com> wrote:

> On Wed, Apr 20, 2005 at 08:40:20PM +0800, Wang Jian wrote:
> > Top10 is used to monitor a while and then disabled. It could be expensive,
> > but is useful to investigate.
> > 
> > I will implement it anyway to complete the task, but before I code, I am
> > willing to listen to any one who has comment and suggestion.
> 
> To be generally useful, the table must be bounded.  That will result in
> inaccurate order statistics, but often that doesn't matter much, if the
> table is an order of magnitude or two larger than the required order
> statistic, (i.e., 100-1000 entries for estimating the top 10).
> 
> Given that the table must be bounded, there needs to be a replacement
> policy once it fills up.  The best choice of replacement strategy of
> course depends on the distribution of new entries; the problem is similar
> to that of the code table management in certain data compression algorithms.
> Heuristics and data structures tend to be somewhat intimate, as fast
> updates are required.
> 
> A common heuristic is to prioritize the entries via a scaled frequency
> that decays with a time (or packet count, etc.) scaling constant, and
> replace the lowest frequency members in the table with the new entries.
> This heuristic lends itself to being implemented with a heap-based
> priority queue whose top is the next entry to be expired.
> Complexity: O(log N)

This is the first thing that came to my mind :)
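
To make the scaled-frequency idea concrete, here is a rough user-space
sketch in Python; it is not kernel code. It assumes the decay is driven
by packet count, and it inflates the weight of each new hit instead of
decaying every old entry, which gives the same ordering. The class name,
the capacity and half_life parameters, the lazy-deletion heap, and the
1e100 rescale threshold are all my own illustrative choices. Keys are
assumed to be mutually comparable (e.g. address strings).

```python
import heapq


class DecayedTopTable:
    """Bounded table ranking keys by exponentially decayed hit count."""

    def __init__(self, capacity=100, half_life=1000):
        self.capacity = capacity    # max tracked keys (illustrative default)
        self.half_life = half_life  # packets until a hit counts half as much
        self.scores = {}            # key -> score in inflated units
        self.heap = []              # (score, key) min-heap; may hold stale pairs
        self.weight = 1.0           # value of one hit at the current packet
        self.growth = 2.0 ** (1.0 / half_life)

    def hit(self, key):
        # Inflating new hits is equivalent to decaying all old ones.
        self.weight *= self.growth
        if self.weight > 1e100:
            self._rescale()
        if key not in self.scores and len(self.scores) >= self.capacity:
            self._evict_lowest()
        score = self.scores.get(key, 0.0) + self.weight
        self.scores[key] = score
        heapq.heappush(self.heap, (score, key))  # O(log N)
        if len(self.heap) > 4 * self.capacity:
            self._rebuild_heap()                 # drop accumulated stale pairs

    def _evict_lowest(self):
        # Pop until we find a pair that matches the key's current score.
        while self.heap:
            score, key = heapq.heappop(self.heap)
            if self.scores.get(key) == score:
                del self.scores[key]
                return

    def _rebuild_heap(self):
        self.heap = [(s, k) for k, s in self.scores.items()]
        heapq.heapify(self.heap)

    def _rescale(self):
        # Keep the inflated units from overflowing a float.
        for k in self.scores:
            self.scores[k] /= self.weight
        self.weight = 1.0
        self._rebuild_heap()

    def top(self, n=10):
        # Report scores in "current hit" units.
        best = heapq.nlargest(n, self.scores.items(), key=lambda kv: kv[1])
        return [(k, s / self.weight) for k, s in best]
```

Calling hit(src_addr) once per packet and top(10) on demand would give
the estimate. A kernel version would need locking and a fixed-size
array heap, which this sketch does not attempt.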

> 
> Other heuristics derive from various queuing models, the simplest
> being "move-to-front" -- every time an entry is referenced, it is moved
> to the front of the active list; entries are expired from the
> tail of the list. Complexity: O(1)

This is very easy to implement. I will try this first :)
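
A minimal user-space sketch of the move-to-front table as I understand
it, in Python rather than kernel C. The class name, the capacity
default, and the nbytes parameter are assumptions of mine for
illustration. The update is O(1); only the top-N report sorts the
(bounded) table.

```python
from collections import OrderedDict


class MoveToFrontTable:
    """Bounded counter table; referenced entries move to the front,
    and entries are expired from the tail when the table is full."""

    def __init__(self, capacity=100):
        self.capacity = capacity     # max tracked keys (illustrative default)
        self.counts = OrderedDict()  # front = most recently referenced

    def hit(self, key, nbytes=1):
        if key in self.counts:
            self.counts[key] += nbytes
        else:
            if len(self.counts) >= self.capacity:
                self.counts.popitem(last=True)    # expire the tail entry
            self.counts[key] = nbytes
        self.counts.move_to_end(key, last=False)  # move to front, O(1)

    def top(self, n=10):
        # Sorting happens only when a report is requested.
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]
```

Note that a busy key that goes quiet for a while can be pushed off the
tail and lose its count, so the table should be much larger than 10.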

> 
> In the distant past I've used the scaled frequency heuristic to good
> effect on long time-series data.
> 
> Regards,
> 
> 	Bill Rugolsky



-- 
  lark

