Possible shadow bug (was: Re: [PATCH] memcg: do not expose uninitialized mem_cgroup_per

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

* Possible shadow bug (was: Re: [PATCH] memcg: do not expose uninitialized mem_cgroup_per_node to world)
       [not found]                       ` <4DF0801F.9050908@redhat.com>
@ 2011-06-09 12:40                         ` Stefano Stabellini
  2011-06-09 15:01                           ` [Xen-devel] " Tim Deegan
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Stabellini @ 2011-06-09 12:40 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: xen-devel, Keir Fraser, Tim Deegan, Stefano Stabellini,
	containers@lists.linux-foundation.org, Li Zefan,
	linux-kernel@vger.kernel.org, Michal Hocko, linux-mm@kvack.org,
	Keir Fraser, akpm@linux-foundation.org, Hiroyuki Kamezawa,
	Paul Menage, KAMEZAWA Hiroyuki, balbir@linux.vnet.ibm.com

CC'ing xen-devel and Tim.

This is a comment from a previous email in the thread:

> It most easily reproduced only on xen hvm 32bit guest under heavy vcpus
> contention for real cpus resources (i.e. I had to overcommit cpus and
> run several cpu hog tasks on host to make guest crash on reboot cycle).
> And from last experiments, crash happens only on on hosts that doesn't
> have hap feature or if hap is disabled in hypervisor.

it makes me think that it is a shadow pagetables bug; see details below.
You can find more details on it following this thread on the lkml.




On Thu, 9 Jun 2011, Igor Mammedov wrote:
> On 06/08/2011 05:35 AM, KAMEZAWA Hiroyuki wrote:
> > On Tue, 07 Jun 2011 15:25:59 +0200
> > Igor Mammedov<imammedo@redhat.com>  wrote:
> >
> >> Sorry for late reply,
> >>
> >> On 06/03/2011 03:00 PM, Hiroyuki Kamezawa wrote:
> >>> 2011/6/3 Igor Mammedov<imammedo@redhat.com>:
> >>>> On 06/02/2011 01:10 AM, Hiroyuki Kamezawa wrote:
> >>>>>> pc = list_entry(list->prev, struct page_cgroup, lru);
> >>>>> Hmm, I disagree your patch is a fix for mainline. At least, a cgroup
> >>>>> before completion of
> >>>>> create() is not populated to userland and you never be able to rmdir()
> >>>>> it because you can't
> >>>>> find it.
> >>>>>
> >>>>>
> >>>>>    >26:   e8 7d 12 30 00          call   0x3012a8
> >>>>>    >2b:*  8b 73 08                mov    0x8(%ebx),%esi<-- trapping
> >>>>> instruction
> >>>>>    >2e:   8b 7c 24 24             mov    0x24(%esp),%edi
> >>>>>    >32:   8b 07                   mov    (%edi),%eax
> >>>>>
> >>>>> Hm, what is the call 0x3012a8 ?
> >>>>>
> >>>>                  pc = list_entry(list->prev, struct page_cgroup, lru);
> >>>>                  if (busy == pc) {
> >>>>                          list_move(&pc->lru, list);
> >>>>                          busy = 0;
> >>>>                          spin_unlock_irqrestore(&zone->lru_lock, flags);
> >>>>                          continue;
> >>>>                  }
> >>>>                  spin_unlock_irqrestore(&zone->lru_lock, flags);<---- is
> >>>>    call 0x3012a8
> >>>>                  ret = mem_cgroup_move_parent(pc, mem, GFP_KERNEL);
> >>>>
> >>>> and  mov 0x8(%ebx),%esi
> >>>> is dereferencing of 'pc' in inlined mem_cgroup_move_parent
> >>>>
> >>> Ah, thank you for input..then panicd at accessing pc->page and "pc"
> >>> was 0xfffffff4.
> >>> it means list->prev was NULL.
> >>>
> >> yes, that's the case.
> >>>> I've looked at vmcore once more and indeed there isn't any parallel task
> >>>> that touches cgroups code path.
> >>>> Will investigate if it is xen to blame for incorrect data in place.
> >>>>
> >>>> Thanks very much for your opinion.
> >>> What curious to me is that the fact "list->prev" is NULL.
> >>> I can see why you doubt the initialization code ....the list pointer never
> >>> contains NULL once it's used....
> >>> it smells like memory corruption or some to me. If you have vmcore,
> >>> what the problematic mem_cgroup_per_zone(node) contains ?
> >> it has all zeros except for last field:
> >>
> >> crash>  rd f3446a00 62
> >> f3446a00:  00000000 00000000 00000000 00000000   ................
> >> f3446a10:  00000000 00000000 00000000 00000000   ................
> >> f3446a20:  00000000 00000000 00000000 00000000   ................
> >> f3446a30:  00000000 00000000 00000000 00000000   ................
> >> f3446a40:  00000000 00000000 00000000 00000000   ................
> >> f3446a50:  00000000 00000000 00000000 00000000   ................
> >> f3446a60:  00000000 00000000 00000000 00000000   ................
> >> f3446a70:  00000000 00000000 f36ef800 f3446a7c   ..........n.|jD.
> >> f3446a80:  f3446a7c f3446a84 f3446a84 f3446a8c   |jD..jD..jD..jD.
> >> f3446a90:  f3446a8c f3446a94 f3446a94 f3446a9c   .jD..jD..jD..jD.
> >> f3446aa0:  f3446a9c 00000000 00000000 00000000   .jD.............
> >> f3446ab0:  00000000 00000000 00000000 00000000   ................
> >> f3446ac0:  00000000 00000000 00000000 00000000   ................
> >> f3446ad0:  00000000 00000000 00000000 00000000   ................
> >> f3446ae0:  00000000 00000000 00000000 00000000   ................
> >> f3446af0:  00000000 f36ef800
> >>
> >> crash>  struct mem_cgroup f36ef800
> >> struct mem_cgroup {
> >> ...
> >> info = {
> >>       nodeinfo = {0xf3446a00}
> >>     },
> >> ...
> >>
> >> It looks like a very targeted corruption of the first zone except of
> >> the last field, while the second zone and the rest are perfectly
> >> normal (i.e. have empty initialized lists).
> >>
> > Hmm, ok, thank you. Then, mem_cgroup_pre_zone[] was initialized once.
> > In this kind of case, I tend to check slab header of memory object f3446a00,
> > or check whether f3446a00 is an alive slab object or not.
> It looks like f3446a00 alive/allocated object
> 
> crash> kmem f3446a00
> CACHE    NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
> f7000c80 size-512                 512       2251      2616    327     4k
> SLAB      MEMORY    TOTAL  ALLOCATED  FREE
> f3da6540  f3446000      8          1     7
> FREE / [ALLOCATED]
>    [f3446a00]
> 
>    PAGE    PHYSICAL   MAPPING    INDEX CNT FLAGS
> c1fa58c0  33446000         0        70  1 2800080
> 
> 
> However I have a related crash that can lead to not initialized lists of 
> the first entry
> (i.e. to what we see at f3446a00), debug kernel sometimes will crash at
> alloc_mem_cgroup_per_zone_info:
> 
> XXX: pn: f208dc00, phy: 3208dc00
> XXX: pn: f2e85a00, phy: 32e85a00
> BUG: unable to handle kernel paging request at 9b74e240
> IP: [<c080b95f>] mem_cgroup_create0x+0xef/0x350
> *pdpt = 0000000033542001 *pde = 0000000000000000
> Oops: 0002 [#1] SMP
> ...
> 
> Pid: 1823, comm: libvirtd Tainted: G           ---------------- T
> (2.6.32.700565 #21) HVM domU
> EIP: 0060:[<c080b95f>] EFLAGS: 00210297 CPU: 3
> EIP is at mem_cgroup_create+0xef/0x350
> EAX: 9b74e240 EBX: f2e85a00 ECX: 00000001 EDX: 00000001
> ESI: a88c8840 EDI: a88c8840 EBP: f201deb4 ESP: f201de8c
>   DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Process libvirtd (pid: 1823, ti=f201c000 task=f3642ab0 task.ti=f201c000)
> Stack:
>   c09579b2 f2e85a00 32e85a00 f3455800 00000000 f2e85a00 f2c14ac0 c0a5a820
> <0>  fffffff4 f2c14ac0 f201def8 c049d3a7 00000000 00000000 00000000 000001ed
> <0>  f2c14ac8 f5fa4400 f24fe954 f3502000 f2c14e40 f24f5608 f3502010 f2c14ac0
> Call Trace:
>   [<c049d3a7>] cgroup_mkdir+0xf7/0x450
>   [<c05318e3>] vfs_mkdir+0x93/0xf0
>   [<c0533787>] ? lookup_hash+0x27/0x30
>   [<c053390e>] sys_mkdirat+0xde/0x100
>   [<c04b5d4d>] ? call_rcu_sched+0xd/0x10
>   [<c04b5d58>] ? call_rcu+0x8/0x10
>   [<c047ab9f>] ? __put_cred+0x2f/0x50
>   [<c0524ded>] ? sys_faccessat+0x14d/0x180
>   [<c0523fb7>] ? filp_close+0x47/0x70
>   [<c0533950>] sys_mkdir+0x20/0x30
>   [<c0409b5f>] sysenter_do_call+0x12/0x28
> 
> 
> static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
> {
> ...
>          memset(pn, 0, sizeof(*pn));
> 
>          for (zone = 0; zone<  MAX_NR_ZONES; zone++) {
>                  mz =&pn->zoneinfo[zone];
>                  for_each_lru(l)
>                          INIT_LIST_HEAD(&mz->lists[l]);<- crash here
>                  mz->usage_in_excess = 0;
>                  mz->on_tree = false;
>                  mz->mem = mem;
>          }
> ...
> 
> 
> crash>  dis 0xc080b93e 15
> 0xc080b93e<mem_cgroup_create+206>:     movl   $0x0,-0x18(%ebp)
> 0xc080b945<mem_cgroup_create+213>:     mov    %esi,-0x1c(%ebp)
> 0xc080b948<mem_cgroup_create+216>:     imul   $0x7c,-0x18(%ebp),%edi
> 0xc080b94c<mem_cgroup_create+220>:     xor    %ecx,%ecx
> 0xc080b94e<mem_cgroup_create+222>:     xor    %edx,%edx
> 0xc080b950<mem_cgroup_create+224>:     lea    (%edi,%edx,8),%esi
> 0xc080b953<mem_cgroup_create+227>:     add    $0x1,%ecx
> 0xc080b956<mem_cgroup_create+230>:     lea    (%ebx,%esi,1),%eax
> 0xc080b959<mem_cgroup_create+233>:     add    $0x1,%edx
> 0xc080b95c<mem_cgroup_create+236>:     cmp    $0x5,%ecx
> 0xc080b95f<mem_cgroup_create+239>:     mov    %eax,(%ebx,%esi,1)
> 0xc080b962<mem_cgroup_create+242>:     mov    %eax,0x4(%eax)
> 0xc080b965<mem_cgroup_create+245>:     jne    0xc080b950
> 0xc080b967<mem_cgroup_create+247>:     mov    -0x14(%ebp),%eax
> 0xc080b96a<mem_cgroup_create+250>:     movl   $0x0,0x6c(%eax)
> 
> EDI on the first iteration should be 0 however it is a88c8840 according to Oops
> dump and looking at -0x18(%ebp) in core shows 0 as it should be:
> 
> crash>  x/xw 0xf201deb4-0x18
> 0xf201de9c:     0x00000000
> 
> so it looks like EDI is incorrectly restored by Xen or at the moment when 0xc080b948
> was executed -0x18(%ebp) had that weird value.
> 
> It is possible that invalid EDI value and following
> 
> 0xc080b950<mem_cgroup_create+224>:     lea    (%edi,%edx,8),%esi
> 
> <https://bugzilla.redhat.com/show_bug.cgi?id=700565#c36>lead to some 
> accessible page and writes
> 
> 0xc080b95f<mem_cgroup_create+239>:     mov    %eax,(%ebx,%esi,1)
> 0xc080b962<mem_cgroup_create+242>:     mov    %eax,0x4(%eax)
> 
> silently go to that page. Than after init lists loop it uses correct pn offset from
> -0x14(%ebp) and initialises the rest fields of structure on the correct page.
> 
>                  mz->usage_in_excess = 0;
>                  mz->on_tree = false;
>                  mz->mem = mem;
> 
> 0xc080b967<mem_cgroup_create+247>:     mov    -0x14(%ebp),%eax<-
> 0xc080b96a<mem_cgroup_create+250>:     movl   $0x0,0x6c(%eax)
> 0xc080b971<mem_cgroup_create+257>:     movl   $0x0,0x70(%eax)
> 0xc080b978<mem_cgroup_create+264>:     movb   $0x0,0x74(%eax)
> 0xc080b97c<mem_cgroup_create+268>:     mov    -0x1c(%ebp),%edx
> 0xc080b97f<mem_cgroup_create+271>:     mov    %edx,0x78(%eax)
> 0xc080b982<mem_cgroup_create+274>:     add    $0x7c,%eax
> 0xc080b985<mem_cgroup_create+277>:     addl   $0x1,-0x18(%ebp)
> 0xc080b989<mem_cgroup_create+281>:     cmpl   $0x4,-0x18(%ebp)
> 0xc080b98d<mem_cgroup_create+285>:     mov    %eax,-0x14(%ebp)
> 0xc080b990<mem_cgroup_create+288>:     jne    0xc080b948
> 
> which could lead to the 0-ed list entries of the first zone
> and the originally reported Oops in mem_cgroup_force_empty.
> Afterwards it looks like:
> 
> 0xc080b985<mem_cgroup_create+277>:     addl   $0x1,-0x18(%ebp)
> 
> -0x18(%ebp) is read correctly and the rest of 3 mz entries are initialized as
> expected.
> 
> So question is why and how
> 0xc080b948<mem_cgroup_create+216>:     imul   $0x7c,-0x18(%ebp),%edi
> may be screwed up
> 
> PS:
> However, memory search for the went astray writes of the first entry
> i.e. sequesnce f3446a00 f3446a00 in a couple of vmcores didn't give
> any positive results.
> 
> 
> > Thanks,
> > -Kame
> >> PS:
> >> It most easily reproduced only on xen hvm 32bit guest under heavy
> >> vcpus contention for real cpus resources (i.e. I had to overcommit
> >> cpus and run several cpu hog tasks on host to make guest crash on
> >> reboot cycle).
> >> And from last experiments, crash happens only on on hosts that
> >> doesn't have hap feature or if hap is disabled in hypervisor.
> >>
> >>> Thanks,
> >>> -Kame
> >>
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xen-devel] Possible shadow bug (was: Re: [PATCH] memcg: do not expose uninitialized mem_cgroup_per_node to world)
  2011-06-09 12:40                         ` Possible shadow bug (was: Re: [PATCH] memcg: do not expose uninitialized mem_cgroup_per_node to world) Stefano Stabellini
@ 2011-06-09 15:01                           ` Tim Deegan
  2011-06-09 16:47                             ` [Xen-devel] Possible shadow bug Igor Mammedov
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Deegan @ 2011-06-09 15:01 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Igor Mammedov, xen-devel, Keir Fraser,
	containers@lists.linux-foundation.org, Li Zefan,
	linux-kernel@vger.kernel.org, Michal Hocko, linux-mm@kvack.org,
	Keir Fraser, akpm@linux-foundation.org, Hiroyuki Kamezawa,
	Paul Menage, KAMEZAWA Hiroyuki, balbir@linux.vnet.ibm.com

At 13:40 +0100 on 09 Jun (1307626812), Stefano Stabellini wrote:
> CC'ing xen-devel and Tim.
> 
> This is a comment from a previous email in the thread:
> 
> > It most easily reproduced only on xen hvm 32bit guest under heavy vcpus
> > contention for real cpus resources (i.e. I had to overcommit cpus and
> > run several cpu hog tasks on host to make guest crash on reboot cycle).
> > And from last experiments, crash happens only on on hosts that doesn't
> > have hap feature or if hap is disabled in hypervisor.
> 
> it makes me think that it is a shadow pagetables bug; see details below.
> You can find more details on it following this thread on the lkml.

Oh dear.  I'm having a look at the linux code now to try and understand
the behaviour.  In the meantime, what version of Xen was this on?  If
you're willing to try recompiling Xen with some small patches that
disable the "cleverer" parts of the shadow pagetable code that might
indicate something.  (Of course, it might just change the timing to
obscure a real linux bug too.)

The only time I've seen a corruption like this, with a mapping
transiently going to the wrong frame, it turned out to be caused by
32-bit pagetable-handling code writing a PAE PTE with a single 64-bit
write (which is not atomic on x86-32), and the TLB happening to see the
intermediate, half-written entry.  I doubt that there's any bug like
that in linux, though, or we'd surely have seen it before now.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xen-devel] Possible shadow bug
  2011-06-09 15:01                           ` [Xen-devel] " Tim Deegan
@ 2011-06-09 16:47                             ` Igor Mammedov
  2011-06-10 10:01                               ` Tim Deegan
  0 siblings, 1 reply; 9+ messages in thread
From: Igor Mammedov @ 2011-06-09 16:47 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Stefano Stabellini, xen-devel, Keir Fraser,
	containers@lists.linux-foundation.org, Li Zefan,
	linux-kernel@vger.kernel.org, Michal Hocko, linux-mm@kvack.org,
	Keir Fraser, akpm@linux-foundation.org, Hiroyuki Kamezawa,
	Paul Menage, KAMEZAWA Hiroyuki, balbir@linux.vnet.ibm.com

On 06/09/2011 05:01 PM, Tim Deegan wrote:
> At 13:40 +0100 on 09 Jun (1307626812), Stefano Stabellini wrote:
>> CC'ing xen-devel and Tim.
>>
>> This is a comment from a previous email in the thread:
>>
>>> It most easily reproduced only on xen hvm 32bit guest under heavy vcpus
>>> contention for real cpus resources (i.e. I had to overcommit cpus and
>>> run several cpu hog tasks on host to make guest crash on reboot cycle).
>>> And from last experiments, crash happens only on on hosts that doesn't
>>> have hap feature or if hap is disabled in hypervisor.
>> it makes me think that it is a shadow pagetables bug; see details below.
>> You can find more details on it following this thread on the lkml.
> Oh dear.  I'm having a look at the linux code now to try and understand
> the behaviour.  In the meantime, what version of Xen was this on?  If
It's rhel5.6 xen. I've tried to test on SLES 11 that has 4.0.1 xen, however
wasn't able to reproduce problem. (I'm not sure if hap was turned off in 
this
case). More detailed info can be found at RHBZ#700565

> you're willing to try recompiling Xen with some small patches that
> disable the "cleverer" parts of the shadow pagetable code that might
> indicate something.  (Of course, it might just change the timing to
> obscure a real linux bug too.)
>
Haven't got to this part yet. But looks like it's the only option left.

> The only time I've seen a corruption like this, with a mapping
> transiently going to the wrong frame, it turned out to be caused by
> 32-bit pagetable-handling code writing a PAE PTE with a single 64-bit
> write (which is not atomic on x86-32), and the TLB happening to see the
> intermediate, half-written entry.  I doubt that there's any bug like
> that in linux, though, or we'd surely have seen it before now.
>
> Cheers,
>
> Tim.
>


-- 
Thanks,
  Igor

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xen-devel] Possible shadow bug
  2011-06-09 16:47                             ` [Xen-devel] Possible shadow bug Igor Mammedov
@ 2011-06-10 10:01                               ` Tim Deegan
  2011-06-10 10:10                                 ` Tim Deegan
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Deegan @ 2011-06-10 10:01 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: xen-devel, Keir Fraser, Stefano Stabellini,
	containers@lists.linux-foundation.org, Li Zefan,
	linux-kernel@vger.kernel.org, Michal Hocko, linux-mm@kvack.org,
	Keir Fraser, akpm@linux-foundation.org, balbir@linux.vnet.ibm.com,
	Paul Menage, KAMEZAWA Hiroyuki, Hiroyuki Kamezawa

Hi, 

At 18:47 +0200 on 09 Jun (1307645229), Igor Mammedov wrote:
> It's rhel5.6 xen. I've tried to test on SLES 11 that has 4.0.1 xen, however
> wasn't able to reproduce problem. (I'm not sure if hap was turned
> off in this case). More detailed info can be found at RHBZ#700565

The best way to be sure whether HAP is in use is to connect to the
serial line, hit ^A^A^A to switch input to Xen, and hit 'q' to dump
per-domain state.  The printout for the guest domain should either say 
"paging assistance: shadow refcounts translate external"
or 
"paging assistance: hap refcounts translate external".

(If you don't have serial you can get the same info by running 
"xm debug-keys q" and then "xm dmesg" to read the output.)

> >you're willing to try recompiling Xen with some small patches that
> >disable the "cleverer" parts of the shadow pagetable code that might
> >indicate something.  (Of course, it might just change the timing to
> >obscure a real linux bug too.)
> >
> Haven't got to this part yet. But looks like it's the only option left.

Actually, looking at the disassembly you posted, it looks more like it
might be an emulator bug in Xen; if Xen finds itself emulating the IMUL
instruction and either gets the logic wrong or does the memory access
wrong, it could cause that failure.  And one reason that Xen emulates
instructions is if the memory operand is on a pagetable that's shadowed
(which might be a page that was recently a pagetable). 

ISTR that even though the RHEL xen reports a 3.0.x version it has quite
a lot of backports in it.  Does it have this patch?
http://hg.uk.xensource.com/xen-3.1-testing.hg/rev/e8fca4c42d05

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xen-devel] Possible shadow bug
  2011-06-10 10:01                               ` Tim Deegan
@ 2011-06-10 10:10                                 ` Tim Deegan
  2011-06-10 11:48                                   ` Pasi Kärkkäinen
  2011-06-10 13:55                                   ` Igor Mammedov
  0 siblings, 2 replies; 9+ messages in thread
From: Tim Deegan @ 2011-06-10 10:10 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: xen-devel, Keir Fraser, Stabellini,
	containers@lists.linux-foundation.org, Li Zefan,
	linux-kernel@vger.kernel.org, Stefano, Michal Hocko,
	linux-mm@kvack.org, Keir Fraser, KAMEZAWA,
	akpm@linux-foundation.org, Hiroyuki Kamezawa, Paul Menage,
	Hiroyuki, balbir@linux.vnet.ibm.com

At 11:01 +0100 on 10 Jun (1307703699), Tim Deegan wrote:
> Actually, looking at the disassembly you posted, it looks more like it
> might be an emulator bug in Xen; if Xen finds itself emulating the IMUL
> instruction and either gets the logic wrong or does the memory access
> wrong, it could cause that failure.  And one reason that Xen emulates
> instructions is if the memory operand is on a pagetable that's shadowed
> (which might be a page that was recently a pagetable). 
> 
> ISTR that even though the RHEL xen reports a 3.0.x version it has quite
> a lot of backports in it.  Does it have this patch?
> http://hg.uk.xensource.com/xen-3.1-testing.hg/rev/e8fca4c42d05

Oops, that URL doesn't work; I meant this:
http://xenbits.xen.org/xen-3.1-testing.hg/rev/e8fca4c42d05

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xen-devel] Possible shadow bug
  2011-06-10 10:10                                 ` Tim Deegan
@ 2011-06-10 11:48                                   ` Pasi Kärkkäinen
  2011-06-10 12:40                                     ` Tim Deegan
  2011-06-10 13:55                                   ` Igor Mammedov
  1 sibling, 1 reply; 9+ messages in thread
From: Pasi Kärkkäinen @ 2011-06-10 11:48 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Igor Mammedov, Stefano, Paul Menage, xen-devel, Keir Fraser,
	Stabellini, containers@lists.linux-foundation.org, Li Zefan,
	linux-kernel@vger.kernel.org, Michal Hocko, linux-mm@kvack.org,
	Keir Fraser, akpm@linux-foundation.org, balbir@linux.vnet.ibm.com,
	KAMEZAWA, Hiroyuki, Hiroyuki Kamezawa

On Fri, Jun 10, 2011 at 11:10:11AM +0100, Tim Deegan wrote:
> At 11:01 +0100 on 10 Jun (1307703699), Tim Deegan wrote:
> > Actually, looking at the disassembly you posted, it looks more like it
> > might be an emulator bug in Xen; if Xen finds itself emulating the IMUL
> > instruction and either gets the logic wrong or does the memory access
> > wrong, it could cause that failure.  And one reason that Xen emulates
> > instructions is if the memory operand is on a pagetable that's shadowed
> > (which might be a page that was recently a pagetable). 
> > 
> > ISTR that even though the RHEL xen reports a 3.0.x version it has quite
> > a lot of backports in it.  Does it have this patch?
> > http://hg.uk.xensource.com/xen-3.1-testing.hg/rev/e8fca4c42d05
> 
> Oops, that URL doesn't work; I meant this:
> http://xenbits.xen.org/xen-3.1-testing.hg/rev/e8fca4c42d05
> 

RHEL5 Xen (hypervisor) reports version as 3.1.2-xyz..

-- Pasi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xen-devel] Possible shadow bug
  2011-06-10 11:48                                   ` Pasi Kärkkäinen
@ 2011-06-10 12:40                                     ` Tim Deegan
  2011-06-10 15:38                                       ` Igor Mammedov
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Deegan @ 2011-06-10 12:40 UTC (permalink / raw)
  To: Pasi K?rkk?inen
  Cc: xen-devel, Keir Fraser, Stabellini,
	containers@lists.linux-foundation.org, Li Zefan,
	linux-kernel@vger.kernel.org, Michal Hocko, linux-mm@kvack.org,
	Keir Fraser, Igor Mammedov, Paul Menage, Hiroyuki Kamezawa,
	akpm@linux-foundation.org, Hiroyuki, balbir@linux.vnet.ibm.com

At 14:48 +0300 on 10 Jun (1307717301), Pasi K?rkk?inen wrote:
> On Fri, Jun 10, 2011 at 11:10:11AM +0100, Tim Deegan wrote:
> > At 11:01 +0100 on 10 Jun (1307703699), Tim Deegan wrote:
> > > ISTR that even though the RHEL xen reports a 3.0.x version it has quite
> > > a lot of backports in it.  Does it have this patch?
> > > http://hg.uk.xensource.com/xen-3.1-testing.hg/rev/e8fca4c42d05
> > 
> > Oops, that URL doesn't work; I meant this:
> > http://xenbits.xen.org/xen-3.1-testing.hg/rev/e8fca4c42d05
> > 
> 
> RHEL5 Xen (hypervisor) reports version as 3.1.2-xyz..

Based on a quick scrobble through the CentOS 5.6 SRPMs it looks like a
3.1.0 hypervisor with a bunch of extra patches, but not this one.  This
is very likely the cause of the crash in mem_cgroup_create(), and
probably the corruptions too.  That would explain why they didn't happen
on a 4.0.x SLES11 Xen, but not really why the original patch in this
thread made it go away.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xen-devel] Possible shadow bug
  2011-06-10 10:10                                 ` Tim Deegan
  2011-06-10 11:48                                   ` Pasi Kärkkäinen
@ 2011-06-10 13:55                                   ` Igor Mammedov
  1 sibling, 0 replies; 9+ messages in thread
From: Igor Mammedov @ 2011-06-10 13:55 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Stefano, Paul Menage, xen-devel, Keir Fraser, Stabellini,
	containers@lists.linux-foundation.org, Li Zefan,
	linux-kernel@vger.kernel.org, Michal Hocko, linux-mm@kvack.org,
	Keir Fraser, akpm@linux-foundation.org, balbir@linux.vnet.ibm.com,
	KAMEZAWA, Hiroyuki, Hiroyuki Kamezawa

On 06/10/2011 12:10 PM, Tim Deegan wrote:
> At 11:01 +0100 on 10 Jun (1307703699), Tim Deegan wrote:
>> Actually, looking at the disassembly you posted, it looks more like it
>> might be an emulator bug in Xen; if Xen finds itself emulating the IMUL
>> instruction and either gets the logic wrong or does the memory access
>> wrong, it could cause that failure.  And one reason that Xen emulates
>> instructions is if the memory operand is on a pagetable that's shadowed
>> (which might be a page that was recently a pagetable).
>>
>> ISTR that even though the RHEL xen reports a 3.0.x version it has quite
>> a lot of backports in it.  Does it have this patch?
>> http://hg.uk.xensource.com/xen-3.1-testing.hg/rev/e8fca4c42d05
> Oops, that URL doesn't work; I meant this:
> http://xenbits.xen.org/xen-3.1-testing.hg/rev/e8fca4c42d05
>
> Tim.
>
Tim, Thank you very much!
We were missing that cs and it solved problem.

-- 
Thanks,
  Igor

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xen-devel] Possible shadow bug
  2011-06-10 12:40                                     ` Tim Deegan
@ 2011-06-10 15:38                                       ` Igor Mammedov
  0 siblings, 0 replies; 9+ messages in thread
From: Igor Mammedov @ 2011-06-10 15:38 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Pasi K?rkk?inen, xen-devel, Keir Fraser, Stabellini,
	containers@lists.linux-foundation.org, Li Zefan,
	linux-kernel@vger.kernel.org, Michal Hocko, linux-mm@kvack.org,
	Keir Fraser, Paul Menage, Hiroyuki Kamezawa,
	akpm@linux-foundation.org, Hiroyuki, balbir@linux.vnet.ibm.com

On 06/10/2011 02:40 PM, Tim Deegan wrote:
> At 14:48 +0300 on 10 Jun (1307717301), Pasi K?rkk?inen wrote:
>> On Fri, Jun 10, 2011 at 11:10:11AM +0100, Tim Deegan wrote:
>>> At 11:01 +0100 on 10 Jun (1307703699), Tim Deegan wrote:
>>>> ISTR that even though the RHEL xen reports a 3.0.x version it has quite
>>>> a lot of backports in it.  Does it have this patch?
>>>> http://hg.uk.xensource.com/xen-3.1-testing.hg/rev/e8fca4c42d05
>>> Oops, that URL doesn't work; I meant this:
>>> http://xenbits.xen.org/xen-3.1-testing.hg/rev/e8fca4c42d05
>>>
>> RHEL5 Xen (hypervisor) reports version as 3.1.2-xyz..
> Based on a quick scrobble through the CentOS 5.6 SRPMs it looks like a
> 3.1.0 hypervisor with a bunch of extra patches, but not this one.  This
> is very likely the cause of the crash in mem_cgroup_create(), and
> probably the corruptions too.  That would explain why they didn't happen
> on a 4.0.x SLES11 Xen, but not really why the original patch in this
> thread made it go away.
>
Maybe it changes timing so that imul is executed with correct memory 
content?
Putting extra printk inside zone loop or flushing tlb before it also 
make problem
go away. Or may be problem just becomes invisible and memory is corrupted at
another place.

PS:
Well, never mind. I do not know what I'm talking about.
> Cheers,
>
> Tim.
>


-- 
Thanks,
  Igor

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-06-10 15:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1306925044-2828-1-git-send-email-imammedo@redhat.com>
     [not found] ` <20110601123913.GC4266@tiehlicka.suse.cz>
     [not found]   ` <4DE6399C.8070802@redhat.com>
     [not found]     ` <20110601134149.GD4266@tiehlicka.suse.cz>
     [not found]       ` <4DE64F0C.3050203@redhat.com>
     [not found]         ` <20110601152039.GG4266@tiehlicka.suse.cz>
     [not found]           ` <4DE66BEB.7040502@redhat.com>
     [not found]             ` <BANLkTimbqHPeUdue=_Z31KVdPwcXtbLpeg@mail.gmail.com>
     [not found]               ` <4DE8D50F.1090406@redhat.com>
     [not found]                 ` <BANLkTinMamg_qesEffGxKu3QkT=zyQ2MRQ@mail.gmail.com>
     [not found]                   ` <4DEE26E7.2060201@redhat.com>
     [not found]                     ` <20110608123527.479e6991.kamezawa.hiroyu@jp.fujitsu.com>
     [not found]                       ` <4DF0801F.9050908@redhat.com>
2011-06-09 12:40                         ` Possible shadow bug (was: Re: [PATCH] memcg: do not expose uninitialized mem_cgroup_per_node to world) Stefano Stabellini
2011-06-09 15:01                           ` [Xen-devel] " Tim Deegan
2011-06-09 16:47                             ` [Xen-devel] Possible shadow bug Igor Mammedov
2011-06-10 10:01                               ` Tim Deegan
2011-06-10 10:10                                 ` Tim Deegan
2011-06-10 11:48                                   ` Pasi Kärkkäinen
2011-06-10 12:40                                     ` Tim Deegan
2011-06-10 15:38                                       ` Igor Mammedov
2011-06-10 13:55                                   ` Igor Mammedov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).