LinuxPPC-Dev Archive on lore.kernel.org

* Re: [PATCH v3 2/2] powerpc: Uprobes port to powerpc
From: Ananth N Mavinakayanahalli @ 2012-08-21 11:24 UTC
  To: Oleg Nesterov
  Cc: Srikar Dronamraju, peterz, lkml, Paul Mackerras, Anton Blanchard,
	Ingo Molnar, linuxppc-dev
In-Reply-To: <20120817150031.GA5029@redhat.com>

On Fri, Aug 17, 2012 at 05:00:31PM +0200, Oleg Nesterov wrote:
> On 08/17, Ananth N Mavinakayanahalli wrote:
> >
> > On Thu, Aug 16, 2012 at 05:21:12PM +0200, Oleg Nesterov wrote:
> >
> > > Hmm, I am not sure. is_swbp_insn(insn), as it is used in the arch agnostic
> > > code, should only return true if insn == UPROBE_SWBP_INSN (just in case,
> > > this logic needs more fixes but this is offtopic).
> >
> > I think it does...
> >
> > > If powerpc has another insn(s) which can trigger powerpc's do_int3()
> > > counterpart, they should be rejected by arch_uprobe_analyze_insn().
> > > I think.
> >
> > The insn that gets passed to arch_uprobe_analyze_insn() is copy_insn()'s
> > version, which is the file copy of the instruction.
> 
> Yes, exactly. And we are going to single-step this saved uprobe->arch.insn,
> even if gdb/whatever replaces the original insn later or already replaced.
> 
> So arch_uprobe_analyze_insn() should reject the "unsafe" instructions which
> we can't step over safely.

Agreed.

> > We should also take
> > care of the in-memory copy, in case gdb had inserted a breakpoint at the
> > same location, right?
> 
> gdb (or even the application itself) and uprobes can obviously confuse
> each other, in many ways, and we can do nothing at least currently.
> Just we should ensure that the kernel can't crash/hang/etc.

Absolutely. The proper fix for this at least from a breakpoint insertion
perspective is to educate gdb (possibly ptrace itself) to fail on a
breakpoint insertion request on an already existing one.

> > Updating is_swbp_insn() per-arch where needed will
> > take care of both the cases, 'cos it gets called before
> > arch_uprobe_analyze_insn() too.
> 
> For example. set_swbp()->is_swbp_insn() == T means that (for example)
> uprobe_register() and uprobe_mmap() raced with each other and there is
> no need for set_swbp().

This is true for Intel-like architectures that have *one* swbp
instruction. On powerpc, gdb, for instance, can insert a trap variant at
the address. Therefore, is_swbp_insn() by definition should return true
for all trap variants.
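
To make "all trap variants" concrete, here is a minimal userspace sketch
of such a check. The mask/value pairs are taken from the Power ISA
encodings of tw, twi, td and tdi as I understand them; the names
ppc_is_trap_variant() and ppc_is_uprobe_swbp() are hypothetical, and the
value 0x7fe00008 for the uprobes breakpoint is an assumption, not
something confirmed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masks selecting the primary opcode, and primary plus extended opcode. */
#define PPC_OP_MASK	0xfc000000u
#define PPC_XOP_MASK	0xfc0007feu

/* Encodings with all operand fields zeroed (assumed from the Power ISA). */
#define PPC_INSN_TDI	0x08000000u	/* tdi: primary opcode 2 */
#define PPC_INSN_TWI	0x0c000000u	/* twi: primary opcode 3 */
#define PPC_INSN_TW	0x7c000008u	/* tw: opcode 31, extended opcode 4 */
#define PPC_INSN_TD	0x7c000088u	/* td: opcode 31, extended opcode 68 */

/* Assumed breakpoint word for uprobes: "tw 31,0,0". */
#define ASSUMED_UPROBE_SWBP_INSN 0x7fe00008u

/* Strict check: only the exact breakpoint word counts. */
static bool ppc_is_uprobe_swbp(uint32_t insn)
{
	return insn == ASSUMED_UPROBE_SWBP_INSN;
}

/* Broad check: any of the four trap instructions, whatever the operands. */
static bool ppc_is_trap_variant(uint32_t insn)
{
	if ((insn & PPC_OP_MASK) == PPC_INSN_TWI)
		return true;
	if ((insn & PPC_OP_MASK) == PPC_INSN_TDI)
		return true;
	if ((insn & PPC_XOP_MASK) == PPC_INSN_TW)
		return true;
	if ((insn & PPC_XOP_MASK) == PPC_INSN_TD)
		return true;
	return false;
}
```

With these definitions, a word such as 0x7d821008 (another "tw" encoding)
passes the broad check but not the strict one, which is exactly the
difference being argued about here.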

> However, find_active_uprobe()->is_swbp_at_addr()->is_swbp_insn() is
> different, "true" confirms that this insn has triggered do_int3() and
> thus we need send_sig(SIGTRAP) (just in case, this is not strictly
> correct too but offtopic again).
> 
> We definitely need more changes/fixes/improvements in this area. And
> perhaps powerpc requires more changes in the arch-neutral code, I dunno.

For powerpc, just having is_swbp_insn() (already a weak function) handle
the trap variants should suffice.

> In particular, I think is_swbp_insn() should have a single caller,
> is_swbp_at_addr(), and this caller should always play with current->mm.
> And many, many other changes in the long term.
> 
> So far I think that, if powerpc really needs to change is_swbp_insn(),
> it would be better to make another patch and discuss this change.
> But of course I can't judge.

OK. I will separate out the is_swbp_insn() change into a separate patch.

Ananth

* Re: [PATCH 4/8] time: Condense timekeeper.xtime into xtime_sec
From: Andreas Schwab @ 2012-08-21  7:14 UTC
  To: John Stultz
  Cc: Prarit Bhargava, Peter Zijlstra, Richard Cochran, Linux Kernel,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar
In-Reply-To: <5033029A.2030201@linaro.org>

John Stultz <john.stultz@linaro.org> writes:

> @@ -115,6 +115,7 @@ static void tk_xtime_add(struct timekeeper *tk, const struct timespec *ts)
>  {
>  	tk->xtime_sec += ts->tv_sec;
>  	tk->xtime_nsec += (u64)ts->tv_nsec << tk->shift;
> +	tk_normalize_xtime(tk);
>  }

Yes, that does it.  Failure to normalize is always bad.
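
For readers following along, the idea behind the added
tk_normalize_xtime() call can be sketched in userspace as below. This is
only a model under stated assumptions: struct tk_model and the
tk_model_*() helpers are hypothetical stand-ins, not the kernel's
struct timekeeper or its functions. It shows why an add into the shifted
nanosecond field has to be followed by a carry into the seconds field.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the timekeeper fields touched by the patch. */
struct tk_model {
	uint64_t xtime_sec;	/* whole seconds */
	uint64_t xtime_nsec;	/* nanoseconds, stored shifted left by 'shift' */
	uint32_t shift;		/* clocksource shift */
};

/* Carry whole seconds out of the shifted nanosecond field. */
static void tk_model_normalize(struct tk_model *tk)
{
	uint64_t one_sec = NSEC_PER_SEC << tk->shift;

	while (tk->xtime_nsec >= one_sec) {
		tk->xtime_nsec -= one_sec;
		tk->xtime_sec++;
	}
}

/* Add a (sec, nsec) delta, then restore the invariant nsec < 1 second. */
static void tk_model_add(struct tk_model *tk, uint64_t sec, uint64_t nsec)
{
	tk->xtime_sec += sec;
	tk->xtime_nsec += nsec << tk->shift;
	tk_model_normalize(tk);
}
```

Without the final normalize step, adding 0.2 s to a value holding 0.9 s
leaves more than one second's worth of shifted nanoseconds in
xtime_nsec, and any later code that assumes the field is below one
second can misbehave.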

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* RE: [PATCH V8] powerpc/fsl-pci: Unify pci/pcie initialization code
From: Li Yang-R58472 @ 2012-08-21  6:49 UTC
  To: Jia Hongtao-B38951, Wood Scott-B07421
  Cc: linuxppc-dev@lists.ozlabs.org, Bradley Hughes
In-Reply-To: <412C8208B4A0464FA894C5F0C278CD5D01A56A61@039-SN1MPN1-002.039d.mgd.msft.net>

> -----Original Message-----
> From: Jia Hongtao-B38951
> Sent: Tuesday, August 21, 2012 11:26 AM
> To: Wood Scott-B07421
> Cc: linuxppc-dev@lists.ozlabs.org; galak@kernel.crashing.org; Li Yang-R58472;
> Bradley Hughes
> Subject: RE: [PATCH V8] powerpc/fsl-pci: Unify pci/pcie initialization
> code
>
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Tuesday, August 21, 2012 6:04 AM
> > To: Jia Hongtao-B38951
> > Cc: linuxppc-dev@lists.ozlabs.org; galak@kernel.crashing.org; Li Yang-R58472;
> > Bradley Hughes
> > Subject: Re: [PATCH V8] powerpc/fsl-pci: Unify pci/pcie initialization
> > code
> >
> > On 08/20/2012 05:06 AM, Jia Hongtao wrote:
> > > We unified the Freescale pci/pcie initialization by changing the
> > > fsl_pci to a platform driver. In previous PCI code architecture the
> > > initialization routine is called at board_setup_arch stage. Now the
> > > initialization is done in probe function which is architectural
> > > better. Also It's convenient for adding PM support for PCI controller
> > > in later patch.
> > >
> > > Now we registered pci controllers as platform devices. So we combine
> > > two initialization code as one platform driver.
> > >
> > > Signed-off-by: Jia Hongtao <B38951@freescale.com>
> > > Signed-off-by: Li Yang <leoli@freescale.com>
> > > Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
> > > ---
> > > Changes for V8:
> > > * Use previous primary determination. Based on the point that there are
> > >   bugs on primary-less system.
> > > * Add exceptional support on ge_imp3a in which the primary bus is not the
> > >   first pci bus detected.
> >
> > The exceptional thing about ge_imp3a is that it has no isa node, but
> > we're not sure if it actually has isa or not.  We should not be relying
> > on probe order in any case.  Device tree nodes are not ordered.
>
> Yes... We don't know if ge_imp3a actually has isa and still no answer from
> board owner. I just set primary as the board used to. At least we don't do
> any harm.
>
> >
> > Another interesting case is stxssa8555.dts, which has an i8259 node but
> > no ISA node (are there any other instances of this?).  However, I can't
> > tell if stx_gp3.c is the platform file that goes with this device tree,
> > or if the platform code for stxssa8555 is out-of-tree (or some other file
> > that I'm not seeing).
>
> MPC8541_CDS and MPC8555_CDS also has i8259 but no ISA node. stx_gp3 seems go
> with stxssa8555.dts but I'm not sure ether.
>
> So you mean we have to look for i8259 too for determining primary.
> Take device tree as evidence we can tell that real primary ether has isa node
> or i8259 node. And if there is no isa we just arbitrarily designate one.
>
> If this logic works well then ok.

If there is i8259 node in the device tree, it should be suggesting that there is a PCI to ISA bridge but not explicitly described in the device tree.  Then we need to fix the device tree to add the ISA nodes.

- Leo
>
> >
> > > -void __devinit fsl_pci_init(void)
> > > +void fsl_pci_assign_primary(void)
> > >  {
> > > -	int ret;
> > >  	struct device_node *node;
> > > -	struct pci_controller *hose;
> > > -	dma_addr_t max = 0xffffffff;
> > >
> > >  	/* Callers can specify the primary bus using other means. */
> > >  	if (!fsl_pci_primary) {
> >
> > Since the whole point of this function is now to find the primary, just
> > return if it's already set, instead of indenting the rest of the function.
> >
> > > @@ -842,38 +839,60 @@ void __devinit fsl_pci_init(void)
> > >  			node = fsl_pci_primary;
> > >
> > >  			if (of_match_node(pci_ids, node))
> > > -				break;
> > > +				return;
> > >  		}
> > > -	}
> > >
> > > -	node = NULL;
> > > -	for_each_node_by_type(node, "pci") {
> > > -		if (of_match_node(pci_ids, node)) {
> > > +		node = of_find_node_by_type(NULL, "pci");
> > > +		if (of_match_node(pci_ids, node))
> > >
> >
> > What if the node returned doesn't match?  If you're checking for this,
> > handle the else-case (even if just with an error message).
> >
> > -Scott

* Re: [PATCH 4/8] time: Condense timekeeper.xtime into xtime_sec
From: John Stultz @ 2012-08-21  3:38 UTC
  To: Andreas Schwab
  Cc: Prarit Bhargava, Peter Zijlstra, Richard Cochran, Linux Kernel,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar
In-Reply-To: <m2boi549ed.fsf@igel.home>

On 08/20/2012 01:04 PM, Andreas Schwab wrote:
> John Stultz <john.stultz@linaro.org> writes:
>
>> Huh.  Yea, that looks fine.  And without the
>> __timekeeping_inject_sleeptime() call, the system resumed ok?
> Yes, it does.

So I'm mostly still stumped on this. But I did find one possible related 
bugfix that maybe you can try?

Let me know if the patch below dodges the problem, and if not, please 
send me the JDB printk output.
(if the resume hangs without any output, add a "return;" before the 
tk_xtime_add() call between the JDB printks).

thanks
-john



diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index e16af19..1a9b9c5 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -115,6 +115,7 @@ static void tk_xtime_add(struct timekeeper *tk, const struct timespec *ts)
  {
  	tk->xtime_sec += ts->tv_sec;
  	tk->xtime_nsec += (u64)ts->tv_nsec << tk->shift;
+	tk_normalize_xtime(tk);
  }
  
  static void tk_set_wall_to_mono(struct timekeeper *tk, struct timespec wtm)
@@ -695,9 +696,22 @@ static void __timekeeping_inject_sleeptime(struct timekeeper *tk,
  					"sleep delta value!\n");
  		return;
  	}
+
+	printk("JDB: pre xt %ld:%ld  wtm: %ld:%ld st: %ld:%ld\n",
+		tk_xtime(tk).tv_sec, tk_xtime(tk).tv_nsec,
+		tk->wall_to_monotonic.tv_sec,tk->wall_to_monotonic.tv_nsec,
+		tk->total_sleep_time.tv_sec, tk->total_sleep_time.tv_nsec);
+	printk("JDB: Adding %ld:%ld\n", delta->tv_sec, delta->tv_nsec);
+
  	tk_xtime_add(tk, delta);
  	tk_set_wall_to_mono(tk, timespec_sub(tk->wall_to_monotonic, *delta));
  	tk_set_sleep_time(tk, timespec_add(tk->total_sleep_time, *delta));
+
+	printk("JDB: post xt %ld:%ld  wtm: %ld:%ld st: %ld:%ld\n",
+		tk_xtime(tk).tv_sec, tk_xtime(tk).tv_nsec,
+		tk->wall_to_monotonic.tv_sec,tk->wall_to_monotonic.tv_nsec,
+		tk->total_sleep_time.tv_sec, tk->total_sleep_time.tv_nsec);
+
  }
  
  /**

* RE: [PATCH V8] powerpc/fsl-pci: Unify pci/pcie initialization code
From: Jia Hongtao-B38951 @ 2012-08-21  3:27 UTC
  To: Wood Scott-B07421; +Cc: linuxppc-dev@lists.ozlabs.org, Bradley Hughes
In-Reply-To: <5032DCC8.1040005@freescale.com>

> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Tuesday, August 21, 2012 8:57 AM
> To: Jia Hongtao-B38951
> Cc: linuxppc-dev@lists.ozlabs.org; Bradley Hughes
> Subject: Re: [PATCH V8] powerpc/fsl-pci: Unify pci/pcie initialization
> code
>
> On 08/20/2012 05:04 PM, Scott Wood wrote:
> > On 08/20/2012 05:06 AM, Jia Hongtao wrote:
> >> @@ -842,38 +839,60 @@ void __devinit fsl_pci_init(void)
> >>  			node = fsl_pci_primary;
> >>
> >>  			if (of_match_node(pci_ids, node))
> >> -				break;
> >> +				return;
> >>  		}
> >> -	}
> >>
> >> -	node = NULL;
> >> -	for_each_node_by_type(node, "pci") {
> >> -		if (of_match_node(pci_ids, node)) {
> >> +		node = of_find_node_by_type(NULL, "pci");
> >> +		if (of_match_node(pci_ids, node))
> >>
> >
> > What if the node returned doesn't match?  If you're checking for this,
> > handle the else-case (even if just with an error message).
>
> Or just use of_find_matching_node().
>
> Also, we probably need to check of_device_is_available() here (like
> fsl_add_bridge does), and move on to the next PCI bus if it's disabled.
>
> -Scott


I agree with you, of_device_is_available should be test too.

- Hongtao.

* RE: [PATCH V8] powerpc/fsl-pci: Unify pci/pcie initialization code
From: Jia Hongtao-B38951 @ 2012-08-21  3:26 UTC
  To: Wood Scott-B07421
  Cc: Bradley Hughes, linuxppc-dev@lists.ozlabs.org, Li Yang-R58472
In-Reply-To: <5032B455.3080607@freescale.com>

> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Tuesday, August 21, 2012 6:04 AM
> To: Jia Hongtao-B38951
> Cc: linuxppc-dev@lists.ozlabs.org; galak@kernel.crashing.org; Li Yang-R58472;
> Bradley Hughes
> Subject: Re: [PATCH V8] powerpc/fsl-pci: Unify pci/pcie initialization
> code
>
> On 08/20/2012 05:06 AM, Jia Hongtao wrote:
> > We unified the Freescale pci/pcie initialization by changing the
> > fsl_pci to a platform driver. In previous PCI code architecture the
> > initialization routine is called at board_setup_arch stage. Now the
> > initialization is done in probe function which is architectural
> > better. Also It's convenient for adding PM support for PCI controller
> > in later patch.
> >
> > Now we registered pci controllers as platform devices. So we combine
> > two initialization code as one platform driver.
> >
> > Signed-off-by: Jia Hongtao <B38951@freescale.com>
> > Signed-off-by: Li Yang <leoli@freescale.com>
> > Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
> > ---
> > Changes for V8:
> > * Use previous primary determination. Based on the point that there are
> >   bugs on primary-less system.
> > * Add exceptional support on ge_imp3a in which the primary bus is not the
> >   first pci bus detected.
>
> The exceptional thing about ge_imp3a is that it has no isa node, but
> we're not sure if it actually has isa or not.  We should not be relying
> on probe order in any case.  Device tree nodes are not ordered.

Yes... We don't know if ge_imp3a actually has isa and still no answer from
board owner. I just set primary as the board used to. At least we don't do
any harm.

>
> Another interesting case is stxssa8555.dts, which has an i8259 node but
> no ISA node (are there any other instances of this?).  However, I can't
> tell if stx_gp3.c is the platform file that goes with this device tree,
> or if the platform code for stxssa8555 is out-of-tree (or some other file
> that I'm not seeing).

MPC8541_CDS and MPC8555_CDS also has i8259 but no ISA node. stx_gp3 seems go
with stxssa8555.dts but I'm not sure ether.

So you mean we have to look for i8259 too for determining primary.
Take device tree as evidence we can tell that real primary ether has isa node
or i8259 node. And if there is no isa we just arbitrarily designate one.

If this logic works well then ok.


>
> > -void __devinit fsl_pci_init(void)
> > +void fsl_pci_assign_primary(void)
> >  {
> > -	int ret;
> >  	struct device_node *node;
> > -	struct pci_controller *hose;
> > -	dma_addr_t max = 0xffffffff;
> >
> >  	/* Callers can specify the primary bus using other means. */
> >  	if (!fsl_pci_primary) {
>
> Since the whole point of this function is now to find the primary, just
> return if it's already set, instead of indenting the rest of the function.
>
> > @@ -842,38 +839,60 @@ void __devinit fsl_pci_init(void)
> >  			node = fsl_pci_primary;
> >
> >  			if (of_match_node(pci_ids, node))
> > -				break;
> > +				return;
> >  		}
> > -	}
> >
> > -	node = NULL;
> > -	for_each_node_by_type(node, "pci") {
> > -		if (of_match_node(pci_ids, node)) {
> > +		node = of_find_node_by_type(NULL, "pci");
> > +		if (of_match_node(pci_ids, node))
> >
>
> What if the node returned doesn't match?  If you're checking for this,
> handle the else-case (even if just with an error message).
>
> -Scott

* Re: [PATCH 2/2] powerpc/usb: fix bug of CPU hang when missing USB PHY clock
From: Tabi Timur-B04825 @ 2012-08-21  2:31 UTC
  To: Liu Shengzhou-B36685
  Cc: linux-usb@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1344595712-12804-2-git-send-email-Shengzhou.Liu@freescale.com>

On Fri, Aug 10, 2012 at 5:48 AM, Shengzhou Liu
<Shengzhou.Liu@freescale.com> wrote:

> +               for (timeout = 1000; timeout > 0; timeout--) {
> +                       /* check PHY_CLK_VALID to get phy clk valid */
> +                       if (in_be32(non_ehci + FSL_SOC_USB_CTRL)
> +                                       & PHY_CLK_VALID)
> +                               break;
> +                       udelay(1);
> +               }

Use spin_event_timeout() instead.
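
For context: spin_event_timeout() is the powerpc helper that polls a
condition with a bounded timeout, so the driver does not have to
open-code the loop above. The block below is not that macro; it is a
small userspace model of the same bounded-poll pattern, so it can be
compiled and tested outside the kernel. struct fake_usb_ctrl,
fake_read_ctrl(), wait_phy_clk_valid() and the bit position used for
MODEL_PHY_CLK_VALID are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PHY_CLK_VALID (1u << 17)	/* hypothetical bit position */

/* Fake control register: the valid bit appears after 'reads_left' reads. */
struct fake_usb_ctrl {
	uint32_t value;
	unsigned int reads_left;	/* 0 means the clock never becomes valid */
};

/* Model of a register read; stands in for in_be32() on the real hardware. */
static uint32_t fake_read_ctrl(struct fake_usb_ctrl *reg)
{
	if (reg->reads_left > 0 && --reg->reads_left == 0)
		reg->value |= MODEL_PHY_CLK_VALID;
	return reg->value;
}

/*
 * Poll until the valid bit is set or max_polls reads have been made.
 * Returns true on success, false on timeout, so the caller can fail the
 * probe instead of hanging the CPU.
 */
static bool wait_phy_clk_valid(struct fake_usb_ctrl *reg, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (fake_read_ctrl(reg) & MODEL_PHY_CLK_VALID)
			return true;
		/* a real driver would delay here, e.g. udelay(1) */
	}
	return false;
}
```

The point of either form is the same: the caller gets an explicit
success or timeout result and can return an error when the PHY clock
never shows up.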

-- 
Timur Tabi
Linux kernel developer at Freescale

* Re: [PATCH 2/2] powerpc/usb: fix bug of CPU hang when missing USB PHY clock
From: Kumar Gala @ 2012-08-21  1:22 UTC
  To: gregkh; +Cc: Liu Shengzhou-B36685, linux-usb,
	linuxppc-dev@lists.ozlabs.org list
In-Reply-To: <3F453DDFF675A64A89321A1F3528102178AD97@039-SN1MPN1-004.039d.mgd.msft.net>


On Aug 12, 2012, at 10:01 PM, Liu Shengzhou-B36685 wrote:

>
>
>> -----Original Message-----
>> From: Kumar Gala [mailto:galak@kernel.crashing.org]
>> Sent: Friday, August 10, 2012 9:50 PM
>> To: Liu Shengzhou-B36685
>> Cc: linuxppc-dev@lists.ozlabs.org list; linux-usb@vger.kernel.org;
>> gregkh@linuxfoundation.org
>> Subject: Re: [PATCH 2/2] powerpc/usb: fix bug of CPU hang when missing USB PHY
>> clock
>>
>>
>> On Aug 10, 2012, at 5:48 AM, Shengzhou Liu wrote:
>>
>>> when missing USB PHY clock, kernel booting up will hang during USB
>>> initialization. We should check USBGP[PHY_CLK_VALID] bit to avoid CPU
>>> hanging in this case.
>>>
>>> Signed-off-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>
>>> ---
>>> drivers/usb/host/ehci-fsl.c |   63 ++++++++++++++++++++++++++++++------------
>>> drivers/usb/host/ehci-fsl.h |    1 +
>>> 2 files changed, 46 insertions(+), 18 deletions(-)
>>
>> I assume this should be considered a bug fix and be looked at for inclusion in
>> v3.6?
>>
>> - k
> [Shengzhou] Yes.

Greg,

ping?

- k

* Re: [PATCH V8] powerpc/fsl-pci: Unify pci/pcie initialization code
From: Scott Wood @ 2012-08-21  0:56 UTC
  To: Jia Hongtao; +Cc: linuxppc-dev, Bradley Hughes
In-Reply-To: <5032B455.3080607@freescale.com>

On 08/20/2012 05:04 PM, Scott Wood wrote:
> On 08/20/2012 05:06 AM, Jia Hongtao wrote:
>> @@ -842,38 +839,60 @@ void __devinit fsl_pci_init(void)
>>  			node = fsl_pci_primary;
>>  
>>  			if (of_match_node(pci_ids, node))
>> -				break;
>> +				return;
>>  		}
>> -	}
>>  
>> -	node = NULL;
>> -	for_each_node_by_type(node, "pci") {
>> -		if (of_match_node(pci_ids, node)) {
>> +		node = of_find_node_by_type(NULL, "pci");
>> +		if (of_match_node(pci_ids, node))
>>
> 
> What if the node returned doesn't match?  If you're checking for this,
> handle the else-case (even if just with an error message).

Or just use of_find_matching_node().

Also, we probably need to check of_device_is_available() here (like
fsl_add_bridge does), and move on to the next PCI bus if it's disabled.
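
To spell out the selection logic I mean, here is a userspace sketch. It
does not use the real OF API: struct model_node, model_match() and
model_find_primary() are hypothetical stand-ins for a device tree node,
the id-table match and the search loop, and the compatible strings in
the usage note are only illustrative. The intent is "first node that
matches the id table and is not disabled".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, flattened stand-in for a device tree node. */
struct model_node {
	const char *compatible;	/* single "compatible" string */
	bool available;		/* false models status = "disabled" */
};

/* True if the node's compatible string is in the NULL-terminated id list. */
static bool model_match(const struct model_node *node, const char *const *ids)
{
	for (; *ids; ids++)
		if (strcmp(node->compatible, *ids) == 0)
			return true;
	return false;
}

/*
 * Return the first node that both matches the id list and is available,
 * or NULL if there is none. Non-matching or disabled nodes are skipped
 * instead of ending the search.
 */
static const struct model_node *
model_find_primary(const struct model_node *nodes, size_t count,
		   const char *const *ids)
{
	for (size_t i = 0; i < count; i++)
		if (model_match(&nodes[i], ids) && nodes[i].available)
			return &nodes[i];
	return NULL;
}
```

Given a node list of { "ns16550", enabled }, { "fsl,mpc8548-pcie",
disabled }, { "fsl,mpc8540-pci", enabled }, the search skips the first
two and returns the third; a NULL result is the else-case that needs at
least an error message.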

-Scott

* Re: [PATCH V8] powerpc/fsl-pci: Unify pci/pcie initialization code
From: Scott Wood @ 2012-08-20 22:04 UTC
  To: Jia Hongtao; +Cc: linuxppc-dev, Bradley Hughes
In-Reply-To: <1345457161-4731-1-git-send-email-B38951@freescale.com>

On 08/20/2012 05:06 AM, Jia Hongtao wrote:
> We unified the Freescale pci/pcie initialization by changing the fsl_pci
> to a platform driver. In previous PCI code architecture the initialization
> routine is called at board_setup_arch stage. Now the initialization is done
> in probe function which is architectural better. Also It's convenient for
> adding PM support for PCI controller in later patch.
> 
> Now we registered pci controllers as platform devices. So we combine two
> initialization code as one platform driver.
> 
> Signed-off-by: Jia Hongtao <B38951@freescale.com>
> Signed-off-by: Li Yang <leoli@freescale.com>
> Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
> ---
> Changes for V8:
> * Use previous primary determination. Based on the point that there are
>   bugs on primary-less system.
> * Add exceptional support on ge_imp3a in which the primary bus is not the
>   first pci bus detected.

The exceptional thing about ge_imp3a is that it has no isa node, but
we're not sure if it actually has isa or not.  We should not be relying
on probe order in any case.  Device tree nodes are not ordered.

Another interesting case is stxssa8555.dts, which has an i8259 node but
no ISA node (are there any other instances of this?).  However, I can't
tell if stx_gp3.c is the platform file that goes with this device tree,
or if the platform code for stxssa8555 is out-of-tree (or some other
file that I'm not seeing).

> -void __devinit fsl_pci_init(void)
> +void fsl_pci_assign_primary(void)
>  {
> -	int ret;
>  	struct device_node *node;
> -	struct pci_controller *hose;
> -	dma_addr_t max = 0xffffffff;
>  
>  	/* Callers can specify the primary bus using other means. */
>  	if (!fsl_pci_primary) {

Since the whole point of this function is now to find the primary, just
return if it's already set, instead of indenting the rest of the function.

> @@ -842,38 +839,60 @@ void __devinit fsl_pci_init(void)
>  			node = fsl_pci_primary;
>  
>  			if (of_match_node(pci_ids, node))
> -				break;
> +				return;
>  		}
> -	}
>  
> -	node = NULL;
> -	for_each_node_by_type(node, "pci") {
> -		if (of_match_node(pci_ids, node)) {
> +		node = of_find_node_by_type(NULL, "pci");
> +		if (of_match_node(pci_ids, node))
>

What if the node returned doesn't match?  If you're checking for this,
handle the else-case (even if just with an error message).

-Scott

* Re: [PATCH 4/8] time: Condense timekeeper.xtime into xtime_sec
From: Andreas Schwab @ 2012-08-20 20:04 UTC
  To: John Stultz
  Cc: Prarit Bhargava, Peter Zijlstra, Richard Cochran, Linux Kernel,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar
In-Reply-To: <503296C1.4070906@linaro.org>

John Stultz <john.stultz@linaro.org> writes:

> Huh.  Yea, that looks fine.  And without the
> __timekeeping_inject_sleeptime() call, the system resumed ok?

Yes, it does.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: [PATCH 4/8] time: Condense timekeeper.xtime into xtime_sec
From: John Stultz @ 2012-08-20 19:57 UTC
  To: Andreas Schwab
  Cc: Prarit Bhargava, Peter Zijlstra, Richard Cochran, Linux Kernel,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar
In-Reply-To: <m2fw7h4ab5.fsf@igel.home>

On 08/20/2012 12:45 PM, Andreas Schwab wrote:
> John Stultz <john.stultz@linaro.org> writes:
>
>> I'm not very familiar w/ the iBook hardware, but does it use a
>> clocksource, or does it use arch_gettimeoffset()?
> clocksource: timebase mult[3640e38e] shift[24] registered
>
>> I suspect that the casting has avoided clipping some strange values from
>> the persistent clock.
> That's my guess as well.
>
>> Could you try with the following patch against Linus' HEAD? I suspect it
>> will let the box resume (although it will seem as though no time was
>> spent in resume) and then let me know what the JDB lines print out?
> JDB: suspend_time: 1345491706:0  resume_time: 1345491737:0
> JDB: Trying to add: 31:0
>
> (Looks reasonable.)
Huh.  Yea, that looks fine.  And without the 
__timekeeping_inject_sleeptime() call, the system resumed ok?

Thanks for the testing!  I'll keep looking here.
-john

* Re: [PATCH 4/8] time: Condense timekeeper.xtime into xtime_sec
From: Andreas Schwab @ 2012-08-20 19:45 UTC
  To: John Stultz
  Cc: Prarit Bhargava, Peter Zijlstra, Richard Cochran, Linux Kernel,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar
In-Reply-To: <503288ED.10306@linaro.org>

John Stultz <john.stultz@linaro.org> writes:

> I'm not very familiar w/ the iBook hardware, but does it use a 
> clocksource, or does it use arch_gettimeoffset()?

clocksource: timebase mult[3640e38e] shift[24] registered

> I suspect that the casting has avoided clipping some strange values from 
> the persistent clock.

That's my guess as well.

> Could you try with the following patch against Linus' HEAD? I suspect it 
> will let the box resume (although it will seem as though no time was 
> spent in resume) and then let me know what the JDB lines print out?

JDB: suspend_time: 1345491706:0  resume_time: 1345491737:0
JDB: Trying to add: 31:0

(Looks reasonable.)

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: [PATCH 4/8] time: Condense timekeeper.xtime into xtime_sec
From: John Stultz @ 2012-08-20 18:58 UTC
  To: Andreas Schwab
  Cc: Prarit Bhargava, Peter Zijlstra, Richard Cochran, Linux Kernel,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar
In-Reply-To: <m2fw7ilhnm.fsf@igel.home>

On 08/19/2012 02:02 PM, Andreas Schwab wrote:
> John Stultz <john.stultz@linaro.org> writes:
>
>> The timekeeper struct has a xtime_nsec, which keeps the
>> sub-nanosecond remainder.  This ends up being somewhat
>> duplicative of the timekeeper.xtime.tv_nsec value, and we
>> have to do extra work to keep them apart, copying the full
>> nsec portion out and back in over and over.
>>
>> This patch simplifies some of the logic by taking the timekeeper
>> xtime value and splitting it into timekeeper.xtime_sec and
>> reuses the timekeeper.xtime_nsec for the sub-second portion
>> (stored in higher res shifted nanoseconds).
>>
>> This simplifies some of the accumulation logic. And will
>> allow for more accurate timekeeping once the vsyscall code
>> is updated to use the shifted nanosecond remainder.
> This (together with b44d50d "time: Fix casting issue in tk_set_xtime and
> tk_xtime_add") is causing resume to hang on the iBook (PowerBook6,7).
> The fact that the add-on commit is needed to uncover the bug might give
> a hint, but I'm unable to decipher it.

Thanks for the bug report and narrowing this down.

I'm not very familiar w/ the iBook hardware, but does it use a 
clocksource, or does it use arch_gettimeoffset()?

I suspect that the casting has avoided clipping some strange values from 
the persistent clock.

Could you try with the following patch against Linus' HEAD? I suspect it 
will let the box resume (although it will seem as though no time was 
spent in resume) and then let me know what the JDB lines print out?

thanks
-john

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index e16af19..03f5a82 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -753,10 +753,14 @@ static void timekeeping_resume(void)
  	clocksource_resume();
  
  	write_seqlock_irqsave(&tk->lock, flags);
+	printk("JDB: suspend_time: %ld:%ld  resume_time: %ld:%ld\n",
+		timekeeping_suspend_time.tv_sec, timekeeping_suspend_time.tv_nsec,
+		ts.tv_sec, ts.tv_nsec);
  
  	if (timespec_compare(&ts, &timekeeping_suspend_time) > 0) {
  		ts = timespec_sub(ts, timekeeping_suspend_time);
-		__timekeeping_inject_sleeptime(tk, &ts);
+		printk("JDB: Trying to add: %ld:%ld\n", ts.tv_sec, ts.tv_nsec);
+		//__timekeeping_inject_sleeptime(tk, &ts);
  	}
  	/* re-base the last cycle value */
  	tk->clock->cycle_last = tk->clock->read(tk->clock);

* Re: [RFC V7 PATCH 18/19] memory-hotplug: add node_device_release
From: Jianguo Wu @ 2012-08-20 14:09 UTC
  To: wency
  Cc: linux-s390, linux-ia64, linux-acpi, len.brown, linux-sh,
	linux-kernel, cmetcalf, linux-mm, isimatu.yasuaki, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <1345455342-27752-19-git-send-email-wency@cn.fujitsu.com>

On 2012/8/20 17:35, wency@cn.fujitsu.com wrote:
> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> When calling unregister_node(), the function shows following message at
> device_release().
> 
> Device 'node2' does not have a release() function, it is broken and must be
> fixed.
> 
> So the patch implements node_device_release()
> 
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  drivers/base/node.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index af1a177..9bc2f57 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -252,6 +252,13 @@ static inline void hugetlb_register_node(struct node *node) {}
>  static inline void hugetlb_unregister_node(struct node *node) {}
>  #endif
>  
> +static void node_device_release(struct device *dev)
> +{
> +	struct node *node_dev = to_node(dev);
> +
> +	flush_work(&node_dev->node_work);

Hi Congyang,
	I think this should be:
#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
	flush_work(&node_dev->node_work);
#endif

	As struct node is defined in node.h:
struct node {
	struct sys_device	sysdev;

#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
	struct work_struct	node_work;
#endif
};

	Thanks
	Jianguo Wu

> +	memset(node_dev, 0, sizeof(struct node));
> +}
>  
>  /*
>   * register_node - Setup a sysfs device for a node.
> @@ -265,6 +272,7 @@ int register_node(struct node *node, int num, struct node *parent)
>  
>  	node->dev.id = num;
>  	node->dev.bus = &node_subsys;
> +	node->dev.release = node_device_release;
>  	error = device_register(&node->dev);
>  
>  	if (!error){
> 

^ permalink raw reply

* [PATCH v4 5/8] x86: Add clear_page_nocache
From: Kirill A. Shutemov @ 2012-08-20 13:52 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <1345470757-12005-1-git-send-email-kirill.shutemov@linux.intel.com>

From: Andi Kleen <ak@linux.intel.com>

Add a cache-avoiding version of clear_page. It is a straightforward integer
variant of the existing 64bit clear_page, for both 32bit and 64bit.

Also add the necessary glue for highmem, including a layer that non-cache-
coherent architectures that use the virtual address for flushing can
hook into. This is not needed on x86, of course.

If an architecture wants to provide a cache-avoiding version of clear_page,
it should define ARCH_HAS_USER_NOCACHE to 1 and implement
clear_page_nocache() and clear_user_highpage_nocache().
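
As an illustration only, the idea can be sketched in userspace C with
compiler intrinsics instead of hand-written assembly. The name
demo_clear_page_nocache() and the DEMO_PAGE_SIZE constant are assumptions
of this sketch, not part of the patch; the trailing _mm_sfence() is also an
addition of the sketch so that later ordinary accesses are ordered after
the non-temporal stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* _mm_stream_si32(), _mm_sfence() */
#endif

#define DEMO_PAGE_SIZE 4096	/* assumed page size for this sketch */

/* Zero one page, preferring non-temporal stores where SSE2 is available. */
void demo_clear_page_nocache(void *page)
{
	assert(((uintptr_t)page & 3) == 0);	/* 4-byte stores need alignment */
#if defined(__SSE2__)
	int *p = page;

	for (size_t i = 0; i < DEMO_PAGE_SIZE / sizeof(int); i++)
		_mm_stream_si32(p + i, 0);	/* non-temporal store (movnti) */
	_mm_sfence();	/* make the weakly ordered stores visible in order */
#else
	memset(page, 0, DEMO_PAGE_SIZE);	/* fallback: ordinary cached stores */
#endif
}
```

The compile-time fallback plays the role that the alternatives mechanism
plays at runtime in the 32bit assembly version of the patch.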

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/page.h      |    2 +
 arch/x86/include/asm/string_32.h |    5 +++
 arch/x86/include/asm/string_64.h |    5 +++
 arch/x86/lib/Makefile            |    3 +-
 arch/x86/lib/clear_page_32.S     |   72 ++++++++++++++++++++++++++++++++++++++
 arch/x86/lib/clear_page_64.S     |   29 +++++++++++++++
 arch/x86/mm/fault.c              |    7 ++++
 7 files changed, 122 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/lib/clear_page_32.S

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 8ca8283..aa83a1b 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -29,6 +29,8 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 	copy_page(to, from);
 }
 
+void clear_user_highpage_nocache(struct page *page, unsigned long vaddr);
+
 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
diff --git a/arch/x86/include/asm/string_32.h b/arch/x86/include/asm/string_32.h
index 3d3e835..3f2fbcf 100644
--- a/arch/x86/include/asm/string_32.h
+++ b/arch/x86/include/asm/string_32.h
@@ -3,6 +3,8 @@
 
 #ifdef __KERNEL__
 
+#include <linux/linkage.h>
+
 /* Let gcc decide whether to inline or use the out of line functions */
 
 #define __HAVE_ARCH_STRCPY
@@ -337,6 +339,9 @@ void *__constant_c_and_count_memset(void *s, unsigned long pattern,
 #define __HAVE_ARCH_MEMSCAN
 extern void *memscan(void *addr, int c, size_t size);
 
+#define ARCH_HAS_USER_NOCACHE 1
+asmlinkage void clear_page_nocache(void *page);
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_STRING_32_H */
diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 19e2c46..ca23d1d 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -3,6 +3,8 @@
 
 #ifdef __KERNEL__
 
+#include <linux/linkage.h>
+
 /* Written 2002 by Andi Kleen */
 
 /* Only used for special circumstances. Stolen from i386/string.h */
@@ -63,6 +65,9 @@ char *strcpy(char *dest, const char *src);
 char *strcat(char *dest, const char *src);
 int strcmp(const char *cs, const char *ct);
 
+#define ARCH_HAS_USER_NOCACHE 1
+asmlinkage void clear_page_nocache(void *page);
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_STRING_64_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index b00f678..14e47a2 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -23,6 +23,7 @@ lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_SMP) += rwlock.o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-y += clear_page_$(BITS).o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o
 
@@ -40,7 +41,7 @@ endif
 else
         obj-y += iomap_copy_64.o
         lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o
-        lib-y += thunk_64.o clear_page_64.o copy_page_64.o
+        lib-y += thunk_64.o copy_page_64.o
         lib-y += memmove_64.o memset_64.o
         lib-y += copy_user_64.o copy_user_nocache_64.o
 	lib-y += cmpxchg16b_emu.o
diff --git a/arch/x86/lib/clear_page_32.S b/arch/x86/lib/clear_page_32.S
new file mode 100644
index 0000000..9592161
--- /dev/null
+++ b/arch/x86/lib/clear_page_32.S
@@ -0,0 +1,72 @@
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/cpufeature.h>
+#include <asm/dwarf2.h>
+
+/*
+ * Fallback version if SSE2 is not available.
+ */
+ENTRY(clear_page_nocache)
+	CFI_STARTPROC
+	mov    %eax,%edx
+	xorl   %eax,%eax
+	movl   $4096/32,%ecx
+	.p2align 4
+.Lloop:
+	decl	%ecx
+#define PUT(x) mov %eax,x*4(%edx)
+	PUT(0)
+	PUT(1)
+	PUT(2)
+	PUT(3)
+	PUT(4)
+	PUT(5)
+	PUT(6)
+	PUT(7)
+#undef PUT
+	lea	32(%edx),%edx
+	jnz	.Lloop
+	nop
+	ret
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache)
+
+	.section .altinstr_replacement,"ax"
+1:      .byte 0xeb /* jmp <disp8> */
+	.byte (clear_page_nocache_sse2 - clear_page_nocache) - (2f - 1b)
+	/* offset */
+2:
+	.previous
+	.section .altinstructions,"a"
+	altinstruction_entry clear_page_nocache,1b,X86_FEATURE_XMM2,\
+				16, 2b-1b
+	.previous
+
+/*
+ * Zero a page avoiding the caches
+ * eax	page
+ */
+ENTRY(clear_page_nocache_sse2)
+	CFI_STARTPROC
+	mov    %eax,%edx
+	xorl   %eax,%eax
+	movl   $4096/32,%ecx
+	.p2align 4
+.Lloop_sse2:
+	decl	%ecx
+#define PUT(x) movnti %eax,x*4(%edx)
+	PUT(0)
+	PUT(1)
+	PUT(2)
+	PUT(3)
+	PUT(4)
+	PUT(5)
+	PUT(6)
+	PUT(7)
+#undef PUT
+	lea	32(%edx),%edx
+	jnz	.Lloop_sse2
+	nop
+	ret
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache_sse2)
diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index f2145cf..9d2f3c2 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -40,6 +40,7 @@ ENTRY(clear_page)
 	PUT(5)
 	PUT(6)
 	PUT(7)
+#undef PUT
 	leaq	64(%rdi),%rdi
 	jnz	.Lloop
 	nop
@@ -71,3 +72,31 @@ ENDPROC(clear_page)
 	altinstruction_entry clear_page,2b,X86_FEATURE_ERMS,   \
 			     .Lclear_page_end-clear_page,3b-2b
 	.previous
+
+/*
+ * Zero a page avoiding the caches
+ * rdi	page
+ */
+ENTRY(clear_page_nocache)
+	CFI_STARTPROC
+	xorl   %eax,%eax
+	movl   $4096/64,%ecx
+	.p2align 4
+.Lloop_nocache:
+	decl	%ecx
+#define PUT(x) movnti %rax,x*8(%rdi)
+	movnti %rax,(%rdi)
+	PUT(1)
+	PUT(2)
+	PUT(3)
+	PUT(4)
+	PUT(5)
+	PUT(6)
+	PUT(7)
+#undef PUT
+	leaq	64(%rdi),%rdi
+	jnz	.Lloop_nocache
+	nop
+	ret
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 76dcd9d..d8cf231 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1209,3 +1209,10 @@ good_area:
 
 	up_read(&mm->mmap_sem);
 }
+
+void clear_user_highpage_nocache(struct page *page, unsigned long vaddr)
+{
+	void *p = kmap_atomic(page);
+	clear_page_nocache(p);
+	kunmap_atomic(p);
+}
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH v4 8/8] mm: implement vm.clear_huge_page_nocache sysctl
From: Kirill A. Shutemov @ 2012-08-20 13:52 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <1345470757-12005-1-git-send-email-kirill.shutemov@linux.intel.com>

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

In some cases cache-avoiding clearing of huge pages may slow down a workload.
Let's provide a sysctl knob to disable it.

We use a static_key here to avoid extra work on the fast path.
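
The toggle logic can be modelled in plain userspace C, purely as a sketch.
Here struct demo_key is a hypothetical stand-in for a static key (enabled
while its count is above zero), and demo_sysctl_write() is an invented name
that models only the write path of the handler.

```c
/* Hypothetical userspace model of a static key: enabled while count > 0. */
struct demo_key {
	int count;
};

static int demo_nocache = 1;			/* models the sysctl variable */
static struct demo_key nocache_key = { 1 };	/* models STATIC_KEY_INIT_TRUE */

/* Models is_nocache_enabled(): the fast path only tests the key. */
int demo_is_nocache_enabled(void)
{
	return nocache_key.count > 0;
}

/*
 * Models a write to the sysctl. Returns 0 on success and -1 when the
 * value is outside [0, 1], which the min/max bounds would reject.
 */
int demo_sysctl_write(int new_value)
{
	int orig = demo_nocache;

	if (new_value < 0 || new_value > 1)
		return -1;
	demo_nocache = new_value;
	/* Touch the key only on a real change so the count stays balanced. */
	if (demo_nocache != orig) {
		if (demo_nocache)
			nocache_key.count++;	/* static_key_slow_inc() */
		else
			nocache_key.count--;	/* static_key_slow_dec() */
	}
	return 0;
}
```

Comparing against the original value matters: writing the same value twice
must not increment or decrement the key a second time.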

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/sysctl/vm.txt |   13 ++++++++++++
 include/linux/mm.h          |    5 ++++
 kernel/sysctl.c             |   12 +++++++++++
 mm/memory.c                 |   44 +++++++++++++++++++++++++++++++++++++-----
 4 files changed, 68 insertions(+), 6 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 078701f..9559a97 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -19,6 +19,7 @@ files can be found in mm/swap.c.
 Currently, these files are in /proc/sys/vm:
 
 - block_dump
+- clear_huge_page_nocache
 - compact_memory
 - dirty_background_bytes
 - dirty_background_ratio
@@ -74,6 +75,18 @@ huge pages although processes will also directly compact memory as required.
 
 ==============================================================
 
+clear_huge_page_nocache
+
+Available only when the architecture provides ARCH_HAS_USER_NOCACHE and
+CONFIG_TRANSPARENT_HUGEPAGE or CONFIG_HUGETLBFS is set.
+
+When set to 1 (default) the kernel will use a cache-avoiding clear routine for
+clearing huge pages. This minimizes cache pollution.
+When set to 0 the kernel will clear huge pages through the cache. This may speed
+up some workloads. It is also useful for benchmarking purposes.
+
+==============================================================
+
 dirty_background_bytes
 
 Contains the amount of dirty memory at which the background kernel
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2858723..9b48f43 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1643,6 +1643,11 @@ extern void clear_huge_page(struct page *page,
 extern void copy_user_huge_page(struct page *dst, struct page *src,
 				unsigned long addr, struct vm_area_struct *vma,
 				unsigned int pages_per_huge_page);
+#ifdef ARCH_HAS_USER_NOCACHE
+extern int sysctl_clear_huge_page_nocache;
+extern int clear_huge_page_nocache_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *length, loff_t *ppos);
+#endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 87174ef..80ccc67 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1366,6 +1366,18 @@ static struct ctl_table vm_table[] = {
 		.extra2		= &one,
 	},
 #endif
+#if defined(ARCH_HAS_USER_NOCACHE) && \
+	(defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS))
+	{
+		.procname	= "clear_huge_page_nocache",
+		.data		= &sysctl_clear_huge_page_nocache,
+		.maxlen		= sizeof(sysctl_clear_huge_page_nocache),
+		.mode		= 0644,
+		.proc_handler	= clear_huge_page_nocache_handler,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+#endif
 	{ }
 };
 
diff --git a/mm/memory.c b/mm/memory.c
index 625ca33..395d574 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -57,6 +57,7 @@
 #include <linux/swapops.h>
 #include <linux/elf.h>
 #include <linux/gfp.h>
+#include <linux/static_key.h>
 
 #include <asm/io.h>
 #include <asm/pgalloc.h>
@@ -3970,12 +3971,43 @@ EXPORT_SYMBOL(might_fault);
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
 
-#ifndef ARCH_HAS_USER_NOCACHE
-#define ARCH_HAS_USER_NOCACHE 0
-#endif
+#ifdef ARCH_HAS_USER_NOCACHE
+int sysctl_clear_huge_page_nocache = 1;
+static DEFINE_MUTEX(sysctl_clear_huge_page_nocache_lock);
+static struct static_key clear_huge_page_nocache __read_mostly =
+	STATIC_KEY_INIT_TRUE;
 
-#if ARCH_HAS_USER_NOCACHE == 0
+static inline int is_nocache_enabled(void)
+{
+	return static_key_true(&clear_huge_page_nocache);
+}
+
+int clear_huge_page_nocache_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *length, loff_t *ppos)
+{
+	int orig_value = sysctl_clear_huge_page_nocache;
+	int ret;
+
+	mutex_lock(&sysctl_clear_huge_page_nocache_lock);
+	orig_value = sysctl_clear_huge_page_nocache;
+	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+	if (!ret && write && sysctl_clear_huge_page_nocache != orig_value) {
+		if (sysctl_clear_huge_page_nocache)
+			static_key_slow_inc(&clear_huge_page_nocache);
+		else
+			static_key_slow_dec(&clear_huge_page_nocache);
+	}
+	mutex_unlock(&sysctl_clear_huge_page_nocache_lock);
+
+	return ret;
+}
+#else
 #define clear_user_highpage_nocache clear_user_highpage
+
+static inline int is_nocache_enabled(void)
+{
+	return 0;
+}
 #endif
 
 static void clear_gigantic_page(struct page *page,
@@ -3991,7 +4023,7 @@ static void clear_gigantic_page(struct page *page,
 	for (i = 0, vaddr = haddr; i < pages_per_huge_page;
 			i++, p = mem_map_next(p, page, i), vaddr += PAGE_SIZE) {
 		cond_resched();
-		if (!ARCH_HAS_USER_NOCACHE  || i == target)
+		if (!is_nocache_enabled() || i == target)
 			clear_user_highpage(p, vaddr);
 		else
 			clear_user_highpage_nocache(p, vaddr);
@@ -4015,7 +4047,7 @@ void clear_huge_page(struct page *page,
 	for (i = 0, vaddr = haddr; i < pages_per_huge_page;
 			i++, page++, vaddr += PAGE_SIZE) {
 		cond_resched();
-		if (!ARCH_HAS_USER_NOCACHE || i == target)
+		if (!is_nocache_enabled() || i == target)
 			clear_user_highpage(page, vaddr);
 		else
 			clear_user_highpage_nocache(page, vaddr);
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH v4 7/8] x86: switch the 64bit uncached page clear to SSE/AVX v2
From: Kirill A. Shutemov @ 2012-08-20 13:52 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <1345470757-12005-1-git-send-email-kirill.shutemov@linux.intel.com>

From: Andi Kleen <ak@linux.intel.com>

With multiple threads vector stores are more efficient, so use them.
This makes the page clear non-preemptible and adds some overhead.
However, on 32bit it was already non-preemptible (due to kmap_atomic),
and there is a preemption opportunity every 4K unit.

On an NPB (NASA Parallel Benchmark) 128GB run on a Westmere this reduces
the performance regression of enabling transparent huge pages
by ~2% (from 2.81% to 0.81%), which is now near the runtime variability.
On a system with AVX support a larger improvement is expected.
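
A rough userspace equivalent of the SSE loop, for illustration only, is
sketched below. demo_clear_page_vector() and the DEMO_* constants are
assumptions of this sketch. Userspace code does not need the
kernel_fpu_begin()/kernel_fpu_end() bracketing or the manual save of
%xmm0, because the kernel preserves a process's vector state for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* __m128i, _mm_setzero_si128(), _mm_stream_si128() */
#endif

#define DEMO_PAGE_SIZE	4096	/* assumed page size */
#define DEMO_UNROLL	128	/* bytes per outer iteration, like SSE_UNROLL */

/* Zero one page with 16-byte non-temporal stores where SSE2 is available. */
void demo_clear_page_vector(void *page)
{
	assert(((uintptr_t)page & 15) == 0);	/* movntdq needs 16-byte alignment */
#if defined(__SSE2__)
	const __m128i zero = _mm_setzero_si128();
	char *p = page;

	for (size_t off = 0; off < DEMO_PAGE_SIZE; off += DEMO_UNROLL) {
		/* 128 / 16 = 8 streaming stores per outer iteration */
		for (size_t x = 0; x < DEMO_UNROLL; x += 16)
			_mm_stream_si128((__m128i *)(p + off + x), zero);
	}
	_mm_sfence();	/* added in this sketch to order the streaming stores */
#else
	memset(page, 0, DEMO_PAGE_SIZE);	/* fallback: ordinary cached stores */
#endif
}
```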

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[kirill.shutemov@linux.intel.com: Properly save/restore arguments]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/lib/clear_page_64.S |   79 ++++++++++++++++++++++++++++++++++--------
 1 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index 9d2f3c2..b302cff 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -73,30 +73,79 @@ ENDPROC(clear_page)
 			     .Lclear_page_end-clear_page,3b-2b
 	.previous
 
+#define SSE_UNROLL 128
+
 /*
  * Zero a page avoiding the caches
  * rdi	page
  */
 ENTRY(clear_page_nocache)
 	CFI_STARTPROC
-	xorl   %eax,%eax
-	movl   $4096/64,%ecx
+	pushq_cfi %rdi
+	call   kernel_fpu_begin
+	popq_cfi  %rdi
+	sub    $16,%rsp
+	CFI_ADJUST_CFA_OFFSET 16
+	movdqu %xmm0,(%rsp)
+	xorpd  %xmm0,%xmm0
+	movl   $4096/SSE_UNROLL,%ecx
 	.p2align 4
 .Lloop_nocache:
 	decl	%ecx
-#define PUT(x) movnti %rax,x*8(%rdi)
-	movnti %rax,(%rdi)
-	PUT(1)
-	PUT(2)
-	PUT(3)
-	PUT(4)
-	PUT(5)
-	PUT(6)
-	PUT(7)
-#undef PUT
-	leaq	64(%rdi),%rdi
+	.set x,0
+	.rept SSE_UNROLL/16
+	movntdq %xmm0,x(%rdi)
+	.set x,x+16
+	.endr
+	leaq	SSE_UNROLL(%rdi),%rdi
 	jnz	.Lloop_nocache
-	nop
-	ret
+	movdqu (%rsp),%xmm0
+	addq   $16,%rsp
+	CFI_ADJUST_CFA_OFFSET -16
+	jmp   kernel_fpu_end
 	CFI_ENDPROC
 ENDPROC(clear_page_nocache)
+
+#ifdef CONFIG_AS_AVX
+
+	.section .altinstr_replacement,"ax"
+1:	.byte 0xeb					/* jmp <disp8> */
+	.byte (clear_page_nocache_avx - clear_page_nocache) - (2f - 1b)
+	/* offset */
+2:
+	.previous
+	.section .altinstructions,"a"
+	altinstruction_entry clear_page_nocache,1b,X86_FEATURE_AVX,\
+	                     16, 2b-1b
+	.previous
+
+#define AVX_UNROLL 256 /* TUNE ME */
+
+ENTRY(clear_page_nocache_avx)
+	CFI_STARTPROC
+	pushq_cfi %rdi
+	call   kernel_fpu_begin
+	popq_cfi  %rdi
+	sub    $32,%rsp
+	CFI_ADJUST_CFA_OFFSET 32
+	vmovdqu %ymm0,(%rsp)
+	vxorpd  %ymm0,%ymm0,%ymm0
+	movl   $4096/AVX_UNROLL,%ecx
+	.p2align 4
+.Lloop_avx:
+	decl	%ecx
+	.set x,0
+	.rept AVX_UNROLL/32
+	vmovntdq %ymm0,x(%rdi)
+	.set x,x+32
+	.endr
+	leaq	AVX_UNROLL(%rdi),%rdi
+	jnz	.Lloop_avx
+	vmovdqu (%rsp),%ymm0
+	addq   $32,%rsp
+	CFI_ADJUST_CFA_OFFSET -32
+	jmp   kernel_fpu_end
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache_avx)
+
+#endif
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH v4 6/8] mm: make clear_huge_page cache clear only around the fault address
From: Kirill A. Shutemov @ 2012-08-20 13:52 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <1345470757-12005-1-git-send-email-kirill.shutemov@linux.intel.com>

From: Andi Kleen <ak@linux.intel.com>

Clearing a 2MB huge page will typically blow away several levels
of CPU caches. To avoid this, clear only the 4K area around the
fault address through the cache and use cache-avoiding clears
for the rest of the 2MB area.
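
The selection of the one cached subpage can be sketched in userspace C.
This is an illustration under assumptions: demo_clear_huge_page() and
struct demo_stats are hypothetical, both paths use memset() here, and the
counters only record which path each 4K subpage would have taken.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SHIFT	12
#define DEMO_PAGE_SIZE	(1UL << DEMO_PAGE_SHIFT)

/* Records which clearing path each subpage would have taken. */
struct demo_stats {
	unsigned int cached;	/* subpages cleared through the cache */
	unsigned int nocache;	/* subpages cleared with cache avoidance */
	int cached_index;	/* index of the last cached subpage, or -1 */
};

void demo_clear_huge_page(unsigned char *page, unsigned long haddr,
			  unsigned long fault_address,
			  unsigned int pages_per_huge_page,
			  struct demo_stats *st)
{
	/* Index of the 4K subpage that contains the faulting address. */
	int target = (int)((fault_address - haddr) >> DEMO_PAGE_SHIFT);
	unsigned int i;

	assert(st != NULL && fault_address >= haddr);
	for (i = 0; i < pages_per_huge_page; i++, page += DEMO_PAGE_SIZE) {
		memset(page, 0, DEMO_PAGE_SIZE);	/* either path zeroes the subpage */
		if ((int)i == target) {
			st->cached++;		/* would be clear_user_highpage() */
			st->cached_index = (int)i;
		} else {
			st->nocache++;		/* would be the _nocache variant */
		}
	}
}
```

For a fault at offset 3 * 4096 + 123 into an 8-subpage region, only
subpage 3 takes the cached path and the other seven take the
cache-avoiding one.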

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/memory.c |   37 +++++++++++++++++++++++++++++--------
 1 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index dfc179b..625ca33 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3969,18 +3969,32 @@ EXPORT_SYMBOL(might_fault);
 #endif
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
+
+#ifndef ARCH_HAS_USER_NOCACHE
+#define ARCH_HAS_USER_NOCACHE 0
+#endif
+
+#if ARCH_HAS_USER_NOCACHE == 0
+#define clear_user_highpage_nocache clear_user_highpage
+#endif
+
 static void clear_gigantic_page(struct page *page,
-				unsigned long addr,
-				unsigned int pages_per_huge_page)
+		unsigned long haddr, unsigned long fault_address,
+		unsigned int pages_per_huge_page)
 {
 	int i;
 	struct page *p = page;
+	unsigned long vaddr;
+	int target = (fault_address - haddr) >> PAGE_SHIFT;
 
 	might_sleep();
-	for (i = 0; i < pages_per_huge_page;
-	     i++, p = mem_map_next(p, page, i)) {
+	for (i = 0, vaddr = haddr; i < pages_per_huge_page;
+			i++, p = mem_map_next(p, page, i), vaddr += PAGE_SIZE) {
 		cond_resched();
-		clear_user_highpage(p, addr + i * PAGE_SIZE);
+		if (!ARCH_HAS_USER_NOCACHE  || i == target)
+			clear_user_highpage(p, vaddr);
+		else
+			clear_user_highpage_nocache(p, vaddr);
 	}
 }
 void clear_huge_page(struct page *page,
@@ -3988,16 +4002,23 @@ void clear_huge_page(struct page *page,
 		     unsigned int pages_per_huge_page)
 {
 	int i;
+	unsigned long vaddr;
+	int target = (fault_address - haddr) >> PAGE_SHIFT;
 
 	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
-		clear_gigantic_page(page, haddr, pages_per_huge_page);
+		clear_gigantic_page(page, haddr, fault_address,
+				pages_per_huge_page);
 		return;
 	}
 
 	might_sleep();
-	for (i = 0; i < pages_per_huge_page; i++) {
+	for (i = 0, vaddr = haddr; i < pages_per_huge_page;
+			i++, page++, vaddr += PAGE_SIZE) {
 		cond_resched();
-		clear_user_highpage(page + i, haddr + i * PAGE_SIZE);
+		if (!ARCH_HAS_USER_NOCACHE || i == target)
+			clear_user_highpage(page, vaddr);
+		else
+			clear_user_highpage_nocache(page, vaddr);
 	}
 }
 
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH v4 2/8] THP: Pass fault address to __do_huge_pmd_anonymous_page()
From: Kirill A. Shutemov @ 2012-08-20 13:52 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <1345470757-12005-1-git-send-email-kirill.shutemov@linux.intel.com>

From: Andi Kleen <ak@linux.intel.com>

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 70737ec..6f0825b611 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -633,7 +633,8 @@ static inline pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 
 static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 					struct vm_area_struct *vma,
-					unsigned long haddr, pmd_t *pmd,
+					unsigned long haddr,
+					unsigned long address, pmd_t *pmd,
 					struct page *page)
 {
 	pgtable_t pgtable;
@@ -720,8 +721,8 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			put_page(page);
 			goto out;
 		}
-		if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr, pmd,
-							  page))) {
+		if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr,
+						address, pmd, page))) {
 			mem_cgroup_uncharge_page(page);
 			put_page(page);
 			goto out;
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH v4 4/8] mm: pass fault address to clear_huge_page()
From: Kirill A. Shutemov @ 2012-08-20 13:52 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <1345470757-12005-1-git-send-email-kirill.shutemov@linux.intel.com>

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h |    2 +-
 mm/huge_memory.c   |    2 +-
 mm/hugetlb.c       |    3 ++-
 mm/memory.c        |    7 ++++---
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311be90..2858723 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1638,7 +1638,7 @@ extern void dump_page(struct page *page);
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
 extern void clear_huge_page(struct page *page,
-			    unsigned long addr,
+			    unsigned long haddr, unsigned long fault_address,
 			    unsigned int pages_per_huge_page);
 extern void copy_user_huge_page(struct page *dst, struct page *src,
 				unsigned long addr, struct vm_area_struct *vma,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6f0825b611..070bf89 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -644,7 +644,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 	if (unlikely(!pgtable))
 		return VM_FAULT_OOM;
 
-	clear_huge_page(page, haddr, HPAGE_PMD_NR);
+	clear_huge_page(page, haddr, address, HPAGE_PMD_NR);
 	__SetPageUptodate(page);
 
 	spin_lock(&mm->page_table_lock);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3c86d3d..5182192 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2718,7 +2718,8 @@ retry:
 				ret = VM_FAULT_SIGBUS;
 			goto out;
 		}
-		clear_huge_page(page, haddr, pages_per_huge_page(h));
+		clear_huge_page(page, haddr, fault_address,
+				pages_per_huge_page(h));
 		__SetPageUptodate(page);
 
 		if (vma->vm_flags & VM_MAYSHARE) {
diff --git a/mm/memory.c b/mm/memory.c
index 5736170..dfc179b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3984,19 +3984,20 @@ static void clear_gigantic_page(struct page *page,
 	}
 }
 void clear_huge_page(struct page *page,
-		     unsigned long addr, unsigned int pages_per_huge_page)
+		     unsigned long haddr, unsigned long fault_address,
+		     unsigned int pages_per_huge_page)
 {
 	int i;
 
 	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
-		clear_gigantic_page(page, addr, pages_per_huge_page);
+		clear_gigantic_page(page, haddr, pages_per_huge_page);
 		return;
 	}
 
 	might_sleep();
 	for (i = 0; i < pages_per_huge_page; i++) {
 		cond_resched();
-		clear_user_highpage(page + i, addr + i * PAGE_SIZE);
+		clear_user_highpage(page + i, haddr + i * PAGE_SIZE);
 	}
 }
 
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH v4 3/8] hugetlb: pass fault address to hugetlb_no_page()
From: Kirill A. Shutemov @ 2012-08-20 13:52 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <1345470757-12005-1-git-send-email-kirill.shutemov@linux.intel.com>

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/hugetlb.c |   38 +++++++++++++++++++-------------------
 1 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bc72712..3c86d3d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2672,7 +2672,8 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
 }
 
 static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
-			unsigned long address, pte_t *ptep, unsigned int flags)
+			unsigned long haddr, unsigned long fault_address,
+			pte_t *ptep, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
 	int ret = VM_FAULT_SIGBUS;
@@ -2696,7 +2697,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	}
 
 	mapping = vma->vm_file->f_mapping;
-	idx = vma_hugecache_offset(h, vma, address);
+	idx = vma_hugecache_offset(h, vma, haddr);
 
 	/*
 	 * Use page lock to guard against racing truncation
@@ -2708,7 +2709,7 @@ retry:
 		size = i_size_read(mapping->host) >> huge_page_shift(h);
 		if (idx >= size)
 			goto out;
-		page = alloc_huge_page(vma, address, 0);
+		page = alloc_huge_page(vma, haddr, 0);
 		if (IS_ERR(page)) {
 			ret = PTR_ERR(page);
 			if (ret == -ENOMEM)
@@ -2717,7 +2718,7 @@ retry:
 				ret = VM_FAULT_SIGBUS;
 			goto out;
 		}
-		clear_huge_page(page, address, pages_per_huge_page(h));
+		clear_huge_page(page, haddr, pages_per_huge_page(h));
 		__SetPageUptodate(page);
 
 		if (vma->vm_flags & VM_MAYSHARE) {
@@ -2763,7 +2764,7 @@ retry:
 	 * the spinlock.
 	 */
 	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
-		if (vma_needs_reservation(h, vma, address) < 0) {
+		if (vma_needs_reservation(h, vma, haddr) < 0) {
 			ret = VM_FAULT_OOM;
 			goto backout_unlocked;
 		}
@@ -2778,16 +2779,16 @@ retry:
 		goto backout;
 
 	if (anon_rmap)
-		hugepage_add_new_anon_rmap(page, vma, address);
+		hugepage_add_new_anon_rmap(page, vma, haddr);
 	else
 		page_dup_rmap(page);
 	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
 				&& (vma->vm_flags & VM_SHARED)));
-	set_huge_pte_at(mm, address, ptep, new_pte);
+	set_huge_pte_at(mm, haddr, ptep, new_pte);
 
 	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
 		/* Optimization, do the COW without a second fault */
-		ret = hugetlb_cow(mm, vma, address, ptep, new_pte, page);
+		ret = hugetlb_cow(mm, vma, haddr, ptep, new_pte, page);
 	}
 
 	spin_unlock(&mm->page_table_lock);
@@ -2813,21 +2814,20 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct page *pagecache_page = NULL;
 	static DEFINE_MUTEX(hugetlb_instantiation_mutex);
 	struct hstate *h = hstate_vma(vma);
+	unsigned long haddr = address & huge_page_mask(h);
 
-	address &= huge_page_mask(h);
-
-	ptep = huge_pte_offset(mm, address);
+	ptep = huge_pte_offset(mm, haddr);
 	if (ptep) {
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
-			migration_entry_wait(mm, (pmd_t *)ptep, address);
+			migration_entry_wait(mm, (pmd_t *)ptep, haddr);
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
 				VM_FAULT_SET_HINDEX(hstate_index(h));
 	}
 
-	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
+	ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
 	if (!ptep)
 		return VM_FAULT_OOM;
 
@@ -2839,7 +2839,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	mutex_lock(&hugetlb_instantiation_mutex);
 	entry = huge_ptep_get(ptep);
 	if (huge_pte_none(entry)) {
-		ret = hugetlb_no_page(mm, vma, address, ptep, flags);
+		ret = hugetlb_no_page(mm, vma, haddr, address, ptep, flags);
 		goto out_mutex;
 	}
 
@@ -2854,14 +2854,14 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * consumed.
 	 */
 	if ((flags & FAULT_FLAG_WRITE) && !pte_write(entry)) {
-		if (vma_needs_reservation(h, vma, address) < 0) {
+		if (vma_needs_reservation(h, vma, haddr) < 0) {
 			ret = VM_FAULT_OOM;
 			goto out_mutex;
 		}
 
 		if (!(vma->vm_flags & VM_MAYSHARE))
 			pagecache_page = hugetlbfs_pagecache_page(h,
-								vma, address);
+								vma, haddr);
 	}
 
 	/*
@@ -2884,16 +2884,16 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	if (flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry)) {
-			ret = hugetlb_cow(mm, vma, address, ptep, entry,
+			ret = hugetlb_cow(mm, vma, haddr, ptep, entry,
 							pagecache_page);
 			goto out_page_table_lock;
 		}
 		entry = pte_mkdirty(entry);
 	}
 	entry = pte_mkyoung(entry);
-	if (huge_ptep_set_access_flags(vma, address, ptep, entry,
+	if (huge_ptep_set_access_flags(vma, haddr, ptep, entry,
 						flags & FAULT_FLAG_WRITE))
-		update_mmu_cache(vma, address, ptep);
+		update_mmu_cache(vma, haddr, ptep);
 
 out_page_table_lock:
 	spin_unlock(&mm->page_table_lock);
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH v4 1/8] THP: Use real address for NUMA policy
From: Kirill A. Shutemov @ 2012-08-20 13:52 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <1345470757-12005-1-git-send-email-kirill.shutemov@linux.intel.com>

From: Andi Kleen <ak@linux.intel.com>

Use the fault address, not the rounded-down hpage address, for NUMA
policy purposes. In some circumstances this can give a more exact
NUMA policy.
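
For readers unfamiliar with the two addresses, here is a minimal sketch of
how they differ. It assumes 2MB huge pages (a 21-bit shift, as on x86-64);
the SKETCH_* macros and sketch_haddr() are illustrative names, not kernel
symbols, and the sketch does not model how a NUMA policy consumes the
address.

```c
#include <stdint.h>

/* Assumption: 2MB huge pages, as on x86-64. */
#define SKETCH_HPAGE_PMD_SHIFT 21
#define SKETCH_HPAGE_PMD_SIZE  ((uint64_t)1 << SKETCH_HPAGE_PMD_SHIFT)
#define SKETCH_HPAGE_PMD_MASK  (~(SKETCH_HPAGE_PMD_SIZE - 1))

/* Round a fault address down to the start of its huge page. */
static uint64_t sketch_haddr(uint64_t address)
{
	return address & SKETCH_HPAGE_PMD_MASK;
}

/* Offset of the fault inside the huge page; this is the information
 * that is lost when only the rounded-down address is passed on. */
static uint64_t sketch_offset_in_hpage(uint64_t address)
{
	return address & ~SKETCH_HPAGE_PMD_MASK;
}
```

Every fault inside the same huge page yields the same sketch_haddr() value,
so passing the unrounded address is the only way for the allocator to see
where in the 2MB range the access actually happened.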

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 57c4b93..70737ec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -681,11 +681,11 @@ static inline gfp_t alloc_hugepage_gfpmask(int defrag, gfp_t extra_gfp)
 
 static inline struct page *alloc_hugepage_vma(int defrag,
 					      struct vm_area_struct *vma,
-					      unsigned long haddr, int nd,
+					      unsigned long address, int nd,
 					      gfp_t extra_gfp)
 {
 	return alloc_pages_vma(alloc_hugepage_gfpmask(defrag, extra_gfp),
-			       HPAGE_PMD_ORDER, vma, haddr, nd);
+			       HPAGE_PMD_ORDER, vma, address, nd);
 }
 
 #ifndef CONFIG_NUMA
@@ -710,7 +710,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (unlikely(khugepaged_enter(vma)))
 			return VM_FAULT_OOM;
 		page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
-					  vma, haddr, numa_node_id(), 0);
+					  vma, address, numa_node_id(), 0);
 		if (unlikely(!page)) {
 			count_vm_event(THP_FAULT_FALLBACK);
 			goto out;
@@ -944,7 +944,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (transparent_hugepage_enabled(vma) &&
 	    !transparent_hugepage_debug_cow())
 		new_page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
-					      vma, haddr, numa_node_id(), 0);
+					      vma, address, numa_node_id(), 0);
 	else
 		new_page = NULL;
 
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH v4 0/8] Avoid cache trashing on clearing huge/gigantic page
From: Kirill A. Shutemov @ 2012-08-20 13:52 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-mips, linux-sh, Jan Beulich, H. Peter Anvin, sparclinux,
	Andrea Arcangeli, Andi Kleen, Robert Richter, x86, Hugh Dickins,
	Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
	KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
	Johannes Weiner, Andrew Morton, linuxppc-dev, Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Clearing a 2MB huge page will typically blow away several levels of CPU
caches.  To avoid this, only the 4K area around the fault address is
cleared through the cache; cache-avoiding clears are used for the rest
of the 2MB area.

This patchset implements a cache-avoiding version of clear_page only for
x86. If an architecture wants to provide a cache-avoiding version of
clear_page, it should define ARCH_HAS_USER_NOCACHE to 1 and implement
clear_page_nocache() and clear_user_highpage_nocache().
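
The idea can be sketched in plain userspace C as follows. This is only an
illustration under stated assumptions, not the code from the series: all
sketch_* names and the clear_stats counters are hypothetical, both helpers
use memset() as a stand-in (a real cache-avoiding clear would use
non-temporal stores), and the loop visits subpages in simple ascending
order, which may differ from the iteration order used in the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  4096UL
#define SKETCH_HPAGE_SIZE (2UL * 1024 * 1024)

/* Counts how many 4K subpages went through each path. */
struct clear_stats {
	size_t cached;
	size_t nocache;
};

/* Stand-in for a regular clear that leaves the page in the cache. */
static void sketch_clear_page(unsigned char *page, struct clear_stats *st)
{
	memset(page, 0, SKETCH_PAGE_SIZE);
	st->cached++;
}

/* Stand-in for a cache-avoiding clear; memset() is used here only so
 * the sketch stays portable. */
static void sketch_clear_page_nocache(unsigned char *page,
				      struct clear_stats *st)
{
	memset(page, 0, SKETCH_PAGE_SIZE);
	st->nocache++;
}

/* Clear a whole huge page, using the cached path only for the 4K
 * subpage that contains the fault offset. */
static void sketch_clear_huge_page(unsigned char *hpage, size_t fault_offset,
				   struct clear_stats *st)
{
	size_t nr_pages = SKETCH_HPAGE_SIZE / SKETCH_PAGE_SIZE;
	size_t target;

	assert(fault_offset < SKETCH_HPAGE_SIZE);
	target = fault_offset / SKETCH_PAGE_SIZE;

	for (size_t i = 0; i < nr_pages; i++) {
		unsigned char *page = hpage + i * SKETCH_PAGE_SIZE;

		if (i == target)
			sketch_clear_page(page, st);
		else
			sketch_clear_page_nocache(page, st);
	}
}
```

With a 2MB huge page and 4K subpages, one subpage takes the cached path and
the remaining 511 take the cache-avoiding path, which is why the fault
address has to be passed down to the clearing code.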

v4:
  - vm.clear_huge_page_nocache sysctl;
  - rework page iteration in clear_{huge,gigantic}_page according to
    Andrea Arcangeli's suggestion;
v3:
  - Rebased to current Linus' tree. kmap_atomic() build issue is fixed;
  - Pass fault address to clear_huge_page(). v2 had a problem with clearing
    for sizes other than HPAGE_SIZE;
  - x86: fix 32bit variant. Fallback version of clear_page_nocache() has
    been added for non-SSE2 systems;
  - x86: clear_page_nocache() moved to clear_page_{32,64}.S;
  - x86: use pushq_cfi/popq_cfi instead of push/pop;
v2:
  - No code change. Only commit messages are updated;
  - RFC mark is dropped;

Andi Kleen (5):
  THP: Use real address for NUMA policy
  THP: Pass fault address to __do_huge_pmd_anonymous_page()
  x86: Add clear_page_nocache
  mm: make clear_huge_page cache clear only around the fault address
  x86: switch the 64bit uncached page clear to SSE/AVX v2

Kirill A. Shutemov (3):
  hugetlb: pass fault address to hugetlb_no_page()
  mm: pass fault address to clear_huge_page()
  mm: implement vm.clear_huge_page_nocache sysctl

 Documentation/sysctl/vm.txt      |   13 ++++++
 arch/x86/include/asm/page.h      |    2 +
 arch/x86/include/asm/string_32.h |    5 ++
 arch/x86/include/asm/string_64.h |    5 ++
 arch/x86/lib/Makefile            |    3 +-
 arch/x86/lib/clear_page_32.S     |   72 +++++++++++++++++++++++++++++++++++
 arch/x86/lib/clear_page_64.S     |   78 ++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/fault.c              |    7 +++
 include/linux/mm.h               |    7 +++-
 kernel/sysctl.c                  |   12 ++++++
 mm/huge_memory.c                 |   17 ++++----
 mm/hugetlb.c                     |   39 ++++++++++---------
 mm/memory.c                      |   72 ++++++++++++++++++++++++++++++----
 13 files changed, 294 insertions(+), 38 deletions(-)
 create mode 100644 arch/x86/lib/clear_page_32.S

-- 
1.7.7.6

^ permalink raw reply

* [PATCH 7/8] ppc/pnv: using PCI core to do resource assignment
From: Gavin Shan @ 2012-08-20 13:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: weiyang, Gavin Shan
In-Reply-To: <1345470561-17785-1-git-send-email-shangw@linux.vnet.ibm.com>

Currently, the PCI probe flags "PCI_PROBE_ONLY | PCI_REASSIGN_ALL_RSRC"
are used on the powernv platform. That means the platform has to do the
PCI resource assignment by itself.

The patch changes the PCI probe flag to "PCI_REASSIGN_ALL_RSRC" so
that the PCI core will do the resource assignment. Also, the I/O
and MMIO minimal alignment for P2P bridges have been configured
while doing fixup for the PHBs.
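
As a rough illustration of the flag change, the following userspace sketch
models the probe flags as a bitmask. The sketch_* helpers and the numeric
flag values are assumptions made for this example only, and the single
"probe only means the core skips assignment" test is a simplification of
what the generic code does.

```c
/* Hypothetical flag values, chosen only for this sketch. */
#define SKETCH_REASSIGN_ALL_RSRC 0x1u
#define SKETCH_PROBE_ONLY        0x4u

static unsigned int sketch_pci_flags;

/* OR additional flags into the global flag word. */
static void sketch_pci_add_flags(unsigned int flags)
{
	sketch_pci_flags |= flags;
}

/* Non-zero if the given flag is currently set. */
static int sketch_pci_has_flag(unsigned int flag)
{
	return (sketch_pci_flags & flag) != 0;
}

/* Simplified model: with the probe-only flag set, the generic code
 * leaves resource assignment to the platform. */
static int sketch_core_assigns_resources(void)
{
	return !sketch_pci_has_flag(SKETCH_PROBE_ONLY);
}
```

In this model, adding both flags leaves the assignment to the platform code,
while adding only the reassign flag hands it to the core.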

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Reviewed-by: Richard Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   43 +++++------------------------
 1 files changed, 7 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c856f90..ead4eff 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1124,36 +1124,6 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
 static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { }
 #endif /* CONFIG_PCI_MSI */
 
-/* This is the starting point of our IODA specific resource
- * allocation process
- */
-static void __devinit pnv_pci_ioda_fixup_phb(struct pci_controller *hose)
-{
-	resource_size_t size, align;
-	struct pci_bus *child;
-
-	/* Associate PEs per functions */
-	pnv_ioda_setup_PEs(hose->bus);
-
-	/* Calculate all resources */
-	pnv_ioda_calc_bus(hose->bus, IORESOURCE_IO, &size, &align);
-	pnv_ioda_calc_bus(hose->bus, IORESOURCE_MEM, &size, &align);
-
-	/* Apply then to HW */
-	pnv_ioda_update_resources(hose->bus);
-
-	/* Setup DMA */
-	pnv_ioda_setup_dma(hose->private_data);
-
-	/* Configure PCI Express settings */
-	list_for_each_entry(child, &hose->bus->children, node) {
-		struct pci_dev *self = child->self;
-		if (!self)
-			continue;
-		pcie_bus_configure_settings(child, self->pcie_mpss);
-	}
-}
-
 /*
  * This function is supposed to be called on basis of PE from top
  * to bottom style. So the the I/O or MMIO segment assigned to
@@ -1473,16 +1443,17 @@ void __init pnv_pci_init_ioda1_phb(struct device_node *np)
 	/* Setup MSI support */
 	pnv_pci_init_ioda_msis(phb);
 
-	/* We set both PCI_PROBE_ONLY and PCI_REASSIGN_ALL_RSRC. This is an
-	 * odd combination which essentially means that we skip all resource
-	 * fixups and assignments in the generic code, and do it all
-	 * ourselves here
+	/*
+	 * We pass the PCI probe flag PCI_REASSIGN_ALL_RSRC here
+	 * to let the PCI core do resource assignment. It's supposed
+	 * that the PCI core will do correct I/O and MMIO alignment
+	 * for the P2P bridge bars so that each PCI bus (excluding
+	 * the child P2P bridges) can form individual PE.
 	 */
-	ppc_md.pcibios_fixup_phb = pnv_pci_ioda_fixup_phb;
 	ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
 	ppc_md.pcibios_enable_device_hook = pnv_pci_enable_device_hook;
 	ppc_md.pcibios_window_alignment = pnv_pci_window_alignment;
-	pci_add_flags(PCI_PROBE_ONLY | PCI_REASSIGN_ALL_RSRC);
+	pci_add_flags(PCI_REASSIGN_ALL_RSRC);
 
 	/* Reset IODA tables to a clean state */
 	rc = opal_pci_reset(phb_id, OPAL_PCI_IODA_TABLE_RESET, OPAL_ASSERT_RESET);
-- 
1.7.5.4

^ permalink raw reply related
