public inbox for linux-kernel@vger.kernel.org
* Re: [Lse-tech] [RFC] [PATCH] Scalable Statistics Counters
@ 2001-12-06 16:10 Niels Christiansen
  2001-12-07  8:54 ` Dipankar Sarma
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Niels Christiansen @ 2001-12-06 16:10 UTC (permalink / raw)
  To: kiran; +Cc: lse-tech, linux-kernel


Hi Kiran,

> Are you concerned with increase in memory used per counter here?  I
> suppose that must not be that much of an issue for a 16 processor box....

Nope, I'm concerned that if this mechanism is to be used for all counters,
the improvement in cache coherence might diminish to the point where the
additional overhead isn't worth it.

Arjan van de Ven voiced similar concerns, but he also said:

> There's several things where per cpu data is useful; low frequency
> statistics is not one of them in my opinion.

...which may be true for 4-ways and even 8-ways, but when you get to
32-ways and greater, you start seeing cache problems.  That was the
case on AIX, and per-cpu counters were one of the changes that helped
get the spectacular scalability on Regatta.

Anyway, since we just had a long thread going on NUMA topology, maybe
it would be proper to investigate whether there is a better way, such as
using the topology to decide where to put counters?  I think so, seeing
as most Intel-based 8-ways and above will have at least some NUMA in
them.

> Well, I wrote a simple kernel module which just increments a shared
> global counter a million times per processor in parallel, and compared
> it with the statctr which would be incremented a million times per
> processor in parallel..

I suspected that.  Would it be possible to do the test on the real
counters?

Niels Christiansen
IBM LTC, Kernel Performance


^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [Lse-tech] [RFC] [PATCH] Scalable Statistics Counters
@ 2001-12-09 10:57 Manfred Spraul
  2001-12-10 16:32 ` Jack Steiner
  0 siblings, 1 reply; 24+ messages in thread
From: Manfred Spraul @ 2001-12-09 10:57 UTC (permalink / raw)
  To: Jack Steiner; +Cc: linux-kernel, lse-tech

>Assuming the slab allocator manages by node, kmem_cache_alloc_node() & 
>kmem_cache_alloc_cpu() would be identical (except for spelling :-). 
>Each would pick up the nodeid from the cpu_data struct, then allocate 
>from the slab cache for that node.
>

kmem_cache_alloc is simple - the complex operation is kmem_cache_free.

The current implementation
- assumes that virt_to_page() and reading one cacheline from the page 
structure is fast. Is that true for your setups?
- uses an array to batch several free calls together: if the array 
overflows, then up to 120 objects are freed in one call, to reduce 
cacheline thrashing.

If virt_to_page is fast, then a NUMA allocator would be a simple 
extension of the current implementation:

* one slab chain for each node, one spinlock for each node.
* 2 per-cpu arrays for each cpu: one for "correct node" kmem_cache_free 
calls, one for "foreign node" kmem_cache_free calls.
* kmem_cache_alloc allocates from the "correct node" per-cpu array, 
fallback to the per-node slab chain, then fallback to __get_free_pages.
* kmem_cache_free checks to which node the freed object belongs and adds 
it to the appropriate per-cpu array. The array overflow function then 
sorts the objects into the correct slab chains.

If virt_to_page is slow we need a different design. Currently it's 
called in every kmem_cache_free/kfree call.

--
    Manfred


* Re: [Lse-tech] [RFC] [PATCH] Scalable Statistics Counters
@ 2001-12-08 17:43 Niels Christiansen
  2001-12-09 11:46 ` Anton Blanchard
  0 siblings, 1 reply; 24+ messages in thread
From: Niels Christiansen @ 2001-12-08 17:43 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: lse-tech, linux-kernel

Anton Blanchard wrote:

| > > There's several things where per cpu data is useful; low frequency
| > > statistics is not one of them in my opinion.
| >
| > ...which may be true for 4-ways and even 8-ways but when you get to
| > 32-ways and greater, you start seeing cache problems.  That was the
| > case on AIX, and per-cpu counters were one of the changes that helped
| > get the spectacular scalability on Regatta.
|
| I agree there are large areas of improvement to be done wrt cacheline
| ping ponging (see my patch in 2.4.17-pre6 for one example), but we
| should do our own benchmarking and not look at what AIX has been doing.

Oh, please!  You voiced an opinion.  I presented facts.  Nobody suggested
we should not measure on Linux.  As a matter of fact, I suggested that
Kiran run the tests on the real counters, and he said he would.

Niels


* Re: [Lse-tech] [RFC] [PATCH] Scalable Statistics Counters
@ 2001-12-07  9:52 Niels Christiansen
  2001-12-07 10:10 ` Dipankar Sarma
  0 siblings, 1 reply; 24+ messages in thread
From: Niels Christiansen @ 2001-12-07  9:52 UTC (permalink / raw)
  To: dipankar; +Cc: kiran, lse-tech, linux-kernel


Hello Dipankar,

| > Anyway, since we just had a long thread going on NUMA topology, maybe
| > it would be proper to investigate if there is a better way, such as
| > using the topology to decide where to put counters?  I think so, seeing
| > as it is that most Intel based 8-ways and above will have at least some
| > NUMA in them.
|
| It should be easy to place the counters in appropriately close
| memory if linux gets good NUMA APIs built on top of the topology
| services. If we extend kmem_cache_alloc() to allocate memory
| in a particular NUMA node, we could simply do this for placing the
| counters -
| ...
| This would put the block of counters corresponding to a CPU in
| memory local to the NUMA node. If there are more sophisticated
| APIs available for suitable memory selection, those too can be made
| use of here.
|
| Is this the kind of thing you are looking at ?

I'm no NUMA person so I can't verify your code snippet, but if it does
what you say, yes, that is exactly what I meant:  we may have to deal
with both cache coherence and placement of counters in local memory.

Niels


* Re: [Lse-tech] [RFC] [PATCH] Scalable Statistics Counters
@ 2001-12-05 15:02 Niels Christiansen
  2001-12-06 12:33 ` Ravikiran G Thirumalai
  0 siblings, 1 reply; 24+ messages in thread
From: Niels Christiansen @ 2001-12-05 15:02 UTC (permalink / raw)
  To: kiran; +Cc: lse-tech, linux-kernel

Hello, Kiran,

> Statistics counters are used in many places in the Linux kernel,
> including storage, network I/O subsystems etc.  These counters are not
> atomic since accuracy is not so important.  Nevertheless, frequent
> updating of these counters results in cacheline bouncing among various
> cpus in a multiprocessor environment.  This patch introduces a new set
> of interfaces, which should improve performance of such counters in an
> MP environment.  This implementation switches to code that is devoid
> of SMP overheads if these interfaces are used with a UP kernel.
>
> Comments are welcome :)
>
> Regards,
> Kiran

I'm wondering about the scope of this.  My Ethernet adapter with, maybe,
20 counter fields would have 20 counters allocated for each of my 16
processors.  The only way to get the total would be to use statctr_read()
to merge them.  Same for the who-knows-how-many IP counters, and so on.

How many and which counters were converted for the test you refer to?

I do like the idea of a uniform access mechanism, though.  It is well in
line with my thoughts about an architected interface for topology and
instrumentation, so I'll definitely get back to you as I try to collect
requirements.

Niels



end of thread, other threads:[~2001-12-11 23:27 UTC | newest]

Thread overview: 24+ messages
2001-12-06 16:10 [Lse-tech] [RFC] [PATCH] Scalable Statistics Counters Niels Christiansen
2001-12-07  8:54 ` Dipankar Sarma
2001-12-08 22:24   ` Paul Jackson
2001-12-09  3:46     ` Jack Steiner
2001-12-09  4:44       ` Paul Jackson
2001-12-09 17:34         ` Jack Steiner
2001-12-11 23:27           ` Paul Jackson
2001-12-07 11:39 ` Ravikiran G Thirumalai
2001-12-08 13:46 ` Anton Blanchard
  -- strict thread matches above, loose matches on Subject: below --
2001-12-09 10:57 Manfred Spraul
2001-12-10 16:32 ` Jack Steiner
2001-12-10 17:00   ` Manfred Spraul
2001-12-08 17:43 Niels Christiansen
2001-12-09 11:46 ` Anton Blanchard
2001-12-07  9:52 Niels Christiansen
2001-12-07 10:10 ` Dipankar Sarma
2001-12-05 15:02 Niels Christiansen
2001-12-06 12:33 ` Ravikiran G Thirumalai
2001-12-06 13:07   ` Arjan van de Ven
2001-12-06 14:09     ` Ravikiran G Thirumalai
2001-12-06 14:10       ` Arjan van de Ven
2001-12-06 19:35         ` Dipankar Sarma
2001-12-07 21:09     ` Alex Bligh - linux-kernel
2001-12-07 21:16       ` Arjan van de Ven
