From: Mathieu Desnoyers <compudj@krystal.dyndns.org>
To: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@elte.hu>,
Nick Piggin <nickpiggin@yahoo.com.au>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Yuriy Lalym <ylalym@gmail.com>, Tejun Heo <tj@kernel.org>,
ltt-dev@lists.casi.polymtl.ca,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [ltt-dev] [PATCH] Fix dirty page accounting in redirty_page_for_writepage()
Date: Thu, 30 Apr 2009 15:41:58 -0400 [thread overview]
Message-ID: <20090430194158.GB12926@Krystal> (raw)
In-Reply-To: <alpine.DEB.1.10.0904301009530.14178@qirst.com>
* Christoph Lameter (cl@linux.com) wrote:
> On Thu, 30 Apr 2009, Mathieu Desnoyers wrote:
>
> > > The 3 variants on x86 generate the same instructions. On other platforms
> > > they would need to be able to fallback in various way depending on the
> > > availability of instructions that are atomic vs. preempt or irqs.
> > >
> >
> > The problem here, as we did figure out a while ago with the atomic
> > slub we worked on a while ago, is that if we have the following code :
> >
> > local_irq_save
> > var++
> > var++
> > local_irq_restore
> >
> > that we would like to turn into irq-safe percpu variant with this
> > semantic :
> >
> > percpu_add_irqsafe(var)
> > percpu_add_irqsafe(var)
> >
> > We are generating two irq save/restore in the fallback, which will be
> > slow.
> >
> > However, we could do the following trick :
> >
> > percpu_irqsave(flags);
> > percpu_add_irq(var);
> > percpu_add_irq(var);
> > percpu_irqrestore(flags);
>
> Hmmm.I do not remember any of those double ops in the patches that I did a
> while back for this. It does not make sense either because atomic per cpu
> ops are only atomic for a single instruction. You are trying to extend
> that so that multiple "atomic" instructions are now atomic.
>
Hrm, not exactly. So I probably chose the naming of the primitives
poorly here if my idea seems unclear. Here is what I am trying to do :
On architectures with irq-safe percpu_add :
- No need to disable interrupts at all
On archs lacking such irq-safe percpu_add :
- disabling interrupts only once for a sequence of percpu counter operations.
I tried to come up with an example in vmstat where multiple percpu ops
would be required, but I figured out that the code needs to be changed
to support percpu ops correctly. However separating
percpu_irqsave/restore from percpu_add_return_irq lets us express
__inc_zone_state and inc_zone_state cleanly, which would be difficult
otherwise.
Let's assume we change the stat_threshold values in mm/vmstat.c so they
become power of 2 (so we don't care when the u8 overflow occurs so it
becomes a free running counter). This model does not support
"overstep" (yet).
Then assume :
u8 stat_threshold_mask = pcp->stat_threshold - 1;
mm/vmstat.c :
void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
{
... assuming p references percpu "u8" counters ...
u8 p_new;
p_new = percpu_add_return_irq(p, 1);
if (unlikely(!(p_new & pcp->stat_threshold_mask)))
zone_page_state_add(pcp->stat_threshold, zone, item);
}
void inc_zone_state(struct zone *zone, enum zone_stat_item item)
{
unsigned long flags;
/*
* Disabling interrupts _only_ on architectures lacking atomic
* percpu_*_irq ops.
*/
percpu_irqsave(flags);
__inc_zone_state(zone, item);
percpu_irqrestore(flags);
}
void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
{
... assuming p references percpu "u8" counters ...
u8 p_new;
p_new = percpu_sub_return_irq(p, 1);
if (unlikely(!(p_new & pcp->stat_threshold_mask)))
zone_page_state_add(-(pcp->stat_threshold), zone, item);
}
void dec_zone_state(struct zone *zone, enum zone_stat_item item)
{
unsigned long flags;
/*
* Disabling interrupts _only_ on architectures lacking atomic
* percpu_*_irq ops.
*/
percpu_irqsave(flags);
__dec_zone_state(zone, item);
percpu_irqrestore(flags);
}
void __mod_zone_state(struct zone *zone, enum zone_stat_item item, long delta)
{
... assuming p references percpu "u8" counters ...
u8 p_new;
long overflow_delta;
p_new = percpu_add_return_irq(p, delta);
/*
* We must count the number of threshold overflow generated by
* "delta". I know, this looks rather odd.
*/
overflow_delta = ((long)p_new & ~(long)pcp->stat_threshold_mask)
- (((long)p_new - delta)
& ~(long)pcp->stat_threshold_mask);
if (unlikely(abs(overflow_delta) > pcp->stat_threshold_mask))
zone_page_state_add(glob_delta, zone, item);
}
void mod_zone_state(struct zone *zone, enum zone_stat_item item, long delta)
{
unsigned long flags;
/*
* Disabling interrupts _only_ on architectures lacking atomic
* percpu_*_irq ops.
*/
percpu_irqsave(flags);
__mod_zone_state(zone, item, detlta);
percpu_irqrestore(flags);
}
Note that all the fast-path would execute with preemption enabled if the
architecture supports irqsave percpu atomic ops.
So as we can see, if cpu ops are used on _different_ atomic counters,
then it may require multiple percpu ops in sequence. However, in the
vmstat case, given the version currently in mainline uses a sequence of
operations on the same variable, this requires re-engineering the
structure, because otherwise races with preemption would occur.
disclaimer : the code above has been written in a email client and may
not compile/work/etc etc.
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
next prev parent reply other threads:[~2009-04-30 19:42 UTC|newest]
Thread overview: 62+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-29 23:25 [PATCH] Fix dirty page accounting in redirty_page_for_writepage() Mathieu Desnoyers
2009-04-29 23:56 ` Mathieu Desnoyers
2009-04-29 23:59 ` Andrew Morton
2009-04-30 2:34 ` Mathieu Desnoyers
2009-04-30 0:06 ` Linus Torvalds
2009-04-30 2:43 ` Mathieu Desnoyers
2009-04-30 6:21 ` Ingo Molnar
2009-04-30 6:33 ` [ltt-dev] " Mathieu Desnoyers
2009-04-30 6:50 ` Ingo Molnar
2009-04-30 13:38 ` Christoph Lameter
2009-04-30 14:10 ` Ingo Molnar
2009-04-30 14:12 ` Mathieu Desnoyers
2009-04-30 14:12 ` Christoph Lameter
2009-04-30 19:41 ` Mathieu Desnoyers [this message]
2009-04-30 20:17 ` Christoph Lameter
2009-04-30 21:17 ` Mathieu Desnoyers
2009-05-01 13:44 ` Christoph Lameter
2009-05-01 19:21 ` Mathieu Desnoyers
2009-05-01 19:31 ` Christoph Lameter
2009-05-01 20:24 ` Mathieu Desnoyers
2009-05-01 20:28 ` Christoph Lameter
2009-05-01 20:43 ` Mathieu Desnoyers
2009-05-01 20:42 ` Christoph Lameter
2009-05-01 21:19 ` Mathieu Desnoyers
2009-05-02 3:00 ` Christoph Lameter
2009-05-02 7:01 ` Mathieu Desnoyers
2009-05-02 21:01 ` Mathieu Desnoyers
2009-05-04 14:08 ` Christoph Lameter
2009-05-03 2:40 ` Tejun Heo
2009-05-04 14:10 ` Christoph Lameter
2009-04-30 13:22 ` Christoph Lameter
2009-04-30 13:38 ` Ingo Molnar
2009-04-30 13:40 ` Christoph Lameter
2009-04-30 14:14 ` Ingo Molnar
2009-04-30 14:15 ` Christoph Lameter
2009-04-30 14:38 ` Ingo Molnar
2009-04-30 14:45 ` Christoph Lameter
2009-04-30 15:01 ` Ingo Molnar
2009-04-30 15:25 ` Christoph Lameter
2009-04-30 15:42 ` Ingo Molnar
2009-04-30 15:44 ` Christoph Lameter
2009-04-30 16:06 ` Ingo Molnar
2009-04-30 16:11 ` Christoph Lameter
2009-04-30 16:16 ` Linus Torvalds
2009-04-30 17:23 ` Ingo Molnar
2009-04-30 18:07 ` Christoph Lameter
2009-05-01 19:59 ` Ingo Molnar
2009-05-01 20:35 ` Christoph Lameter
2009-05-01 21:07 ` Ingo Molnar
2009-05-02 3:06 ` Christoph Lameter
2009-05-02 9:03 ` Ingo Molnar
2009-05-04 14:48 ` Christoph Lameter
2009-04-30 16:13 ` Linus Torvalds
2009-04-30 15:54 ` Ingo Molnar
2009-04-30 16:00 ` Ingo Molnar
2009-04-30 16:08 ` Christoph Lameter
2009-04-30 13:50 ` Mathieu Desnoyers
2009-04-30 13:55 ` Christoph Lameter
2009-04-30 14:32 ` Ingo Molnar
2009-04-30 14:42 ` Christoph Lameter
2009-04-30 14:59 ` Ingo Molnar
2009-04-30 16:03 ` [ltt-dev] " Mathieu Desnoyers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090430194158.GB12926@Krystal \
--to=compudj@krystal.dyndns.org \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=linux-kernel@vger.kernel.org \
--cc=ltt-dev@lists.casi.polymtl.ca \
--cc=mingo@elte.hu \
--cc=nickpiggin@yahoo.com.au \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=ylalym@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.