* Crashing kernel with dom0/libxc gnttab/gntshr
@ 2013-07-30 10:50 Vincent Bernardoff
2013-07-30 10:59 ` Ian Campbell
0 siblings, 1 reply; 12+ messages in thread
From: Vincent Bernardoff @ 2013-07-30 10:50 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 388 bytes --]
Hi,
The attached program makes my kernel (3.9.9-1-ARCH, stock Archlinux
kernel) crash with the attached dmesg output.
The program just shares a page from dom0 to dom0, then maps the page,
then unshares it, and the unsharing makes the kernel crash. I ran
into this issue while implementing a native OCaml vchan driver.
I'm very much interested in advice/help.
Cheers,
Vincent
[-- Attachment #2: libxc_gntshr_bug2.c --]
[-- Type: text/x-csrc; name="libxc_gntshr_bug2.c", Size: 1044 bytes --]
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <xenctrl.h>
#include <sys/mman.h>
int main(int argc, char** argv)
{
    void* map_shr;
    void* map_tab;
    uint32_t ref;
    int ret;

    xc_gntshr *shr_h = xc_gntshr_open(NULL, 0);
    if (shr_h == NULL)
    {
        perror("xc_gntshr_open");
        exit(EXIT_FAILURE);
    }

    xc_gnttab *tab_h = xc_gnttab_open(NULL, 0);
    if (tab_h == NULL)
    {
        perror("xc_gnttab_open");
        exit(EXIT_FAILURE);
    }

    map_shr = xc_gntshr_share_pages(shr_h, 0, 1, &ref, 1);
    if (map_shr == NULL)
    {
        perror("xc_gntshr_share_pages");
        exit(EXIT_FAILURE);
    }

    map_tab = xc_gnttab_map_grant_ref(tab_h, 0, ref, PROT_READ|PROT_WRITE);
    if (map_tab == NULL)
    {
        perror("xc_gnttab_map_grant_ref");
        exit(EXIT_FAILURE);
    }

    /* Now we unshare the page */
    ret = xc_gntshr_munmap(shr_h, map_shr, 1);
    if (ret != 0)
    {
        perror("xc_gntshr_munmap");
        exit(EXIT_FAILURE);
    }

    /* At this point, the kernel should complain… */
    return 0;
}
[-- Attachment #3: dmesg.log --]
[-- Type: text/x-log, Size: 13090 bytes --]
[ 299.710029] FS: 00007fe69748f700(0000) GS:ffff88011ba40000(0000) knlGS:0000000000000000
[ 299.710029] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 299.710029] CR2: 00007fe696d78f30 CR3: 00000000c34fe000 CR4: 0000000000002660
[ 299.710029] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 299.710029] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 299.876698] Process a.out (pid: 922, threadinfo ffff8800cc3c6000, task ffff8800c34829e0)
[ 299.876698] Stack:
[ 299.876698] ffff8800cc2dc5b0 ffff8800cc3c7d88 ffff88000251bc60 ffff88000251b980
[ 299.876698] ffff88000251b960 ffff88000251b990 ffff8800c34829e0 ffff8800cc3c7dd8
[ 299.876698] ffffffffa03e847f ffff88000251b990 ffff880114d50a80 0000000000000000
[ 299.876698] Call Trace:
[ 299.876698] [<ffffffffa03e847f>] ? mn_release+0x4f/0x130 [xen_gntdev]
[ 299.876698] [<ffffffff8116b0c4>] ? __mmu_notifier_release+0x44/0xc0
[ 299.876698] [<ffffffff81153d09>] ? exit_mmap+0x149/0x170
[ 299.876698] [<ffffffff814d2a8a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[ 299.876698] [<ffffffff810b5c3a>] ? exit_robust_list+0x6a/0x130
[ 299.876698] [<ffffffff81055209>] ? mmput+0x59/0x120
[ 299.876698] [<ffffffff8105d97f>] ? do_exit+0x27f/0xab0
[ 299.876698] [<ffffffff81152b90>] ? do_munmap+0x2b0/0x3e0
[ 299.876698] [<ffffffff8105e22f>] ? do_group_exit+0x3f/0xa0
[ 299.876698] [<ffffffff8105e2a4>] ? sys_exit_group+0x14/0x20
[ 299.876698] [<ffffffff814da89d>] ? system_call_fastpath+0x1a/0x1f
[ 299.876698] Code: 00 00 00 d8 02 3c cc 00 88 ff ff ff ff ff ff ff ff ff ff 60 7d 3c cc 00 88 ff ff 30 e0 00 00 00 00 00 00 82 02 01 00 00 00 00 00 <70> 7d 3c cc 00 88 ff ff 2b e0 00 00 00 00 00 00 b0 c5 2d cc 00
[ 299.876698] RIP [<ffff8800cc3c7d60>] 0xffff8800cc3c7d5f
[ 299.876698] RSP <ffff8800cc3c7d70>
[ 299.964961] ---[ end trace 2cc41b9c64237359 ]---
[ 299.964962] Fixing recursive fault but reboot is needed!
[ 299.964963] BUG: scheduling while atomic: a.out/922/0x00000002
[ 299.964985] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_analog snd_hda_intel snd_hda_codec iTCO_wdt gpio_ich iTCO_vendor_support ppdev evdev dcdbas radeon mperf psmouse tg3 coretemp microcode serio_raw pcspkr snd_hwdep snd_pcm ttm snd_page_alloc snd_timer drm_kms_helper i2c_i801 snd x38_edac edac_core ptp pps_core lpc_ich libphy drm i2c_algo_bit i2c_core soundcore parport_pc parport button processor xenfs xen_privcmd xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn nfs lockd sunrpc fscache ext4 crc16 mbcache jbd2 hid_generic usbhid hid sr_mod cdrom sd_mod ahci libahci libata scsi_mod ehci_pci uhci_hcd ehci_hcd usbcore usb_common
[ 299.964987] Pid: 922, comm: a.out Tainted: G B D 3.9.9-1-ARCH #1
[ 299.964987] Call Trace:
[ 299.964991] [<ffffffff814cabcb>] __schedule_bug+0x4d/0x5b
[ 299.964994] [<ffffffff814d1ae6>] __schedule+0x936/0x940
[ 299.964997] [<ffffffff81059a29>] ? console_trylock+0x19/0x70
[ 299.964999] [<ffffffff814d2c86>] ? _raw_spin_unlock+0x36/0x40
[ 299.965002] [<ffffffff8105a3c6>] ? vprintk_emit+0x176/0x4c0
[ 299.965004] [<ffffffff814ca7ff>] ? printk+0x54/0x56
[ 299.965007] [<ffffffff814d1b19>] schedule+0x29/0x70
[ 299.965009] [<ffffffff8105e129>] do_exit+0xa29/0xab0
[ 299.965012] [<ffffffff8105b731>] ? kmsg_dump+0xc1/0xd0
[ 299.965015] [<ffffffff814d42c3>] oops_end+0xa3/0xe0
[ 299.965019] [<ffffffff81018deb>] die+0x4b/0x70
[ 299.965021] [<ffffffff814d3be0>] do_trap+0x60/0x170
[ 299.965024] [<ffffffff810163d5>] do_invalid_op+0x95/0xb0
[ 299.965027] [<ffffffff810085ec>] ? xen_batched_set_pte+0xdc/0x200
[ 299.965030] [<ffffffff814d2a8a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[ 299.965032] [<ffffffff814d2ca2>] ? _raw_spin_unlock_irqrestore+0x12/0x50
[ 299.965035] [<ffffffff814dbb1e>] invalid_op+0x1e/0x30
[ 299.965038] [<ffffffffa03e847f>] ? mn_release+0x4f/0x130 [xen_gntdev]
[ 299.965042] [<ffffffff8116b0c4>] ? __mmu_notifier_release+0x44/0xc0
[ 299.965045] [<ffffffff81153d09>] ? exit_mmap+0x149/0x170
[ 299.965047] [<ffffffff814d2a8a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[ 299.965050] [<ffffffff810b5c3a>] ? exit_robust_list+0x6a/0x130
[ 299.965055] [<ffffffff81055209>] ? mmput+0x59/0x120
[ 299.965057] [<ffffffff8105d97f>] ? do_exit+0x27f/0xab0
[ 299.965060] [<ffffffff81152b90>] ? do_munmap+0x2b0/0x3e0
[ 299.965062] [<ffffffff8105e22f>] ? do_group_exit+0x3f/0xa0
[ 299.965065] [<ffffffff8105e2a4>] ? sys_exit_group+0x14/0x20
[ 299.965067] [<ffffffff814da89d>] ? system_call_fastpath+0x1a/0x1f
[-- Attachment #4: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Crashing kernel with dom0/libxc gnttab/gntshr
2013-07-30 10:50 Crashing kernel with dom0/libxc gnttab/gntshr Vincent Bernardoff
@ 2013-07-30 10:59 ` Ian Campbell
2013-07-30 13:41 ` Vincent Bernardoff
0 siblings, 1 reply; 12+ messages in thread
From: Ian Campbell @ 2013-07-30 10:59 UTC (permalink / raw)
To: Vincent Bernardoff; +Cc: xen-devel
On Tue, 2013-07-30 at 11:50 +0100, Vincent Bernardoff wrote:
> Hi,
>
> The attached program makes my kernel (3.9.9-1-ARCH, stock Archlinux
> kernel) crash with the attached dmesg output.
The dmesg output seems to start halfway through a crash message, which
means it is missing the PC etc and may not be the first crash in any
case.
Please could you configure a serial console and try to capture the
first crash message in its entirety. Bonus points if you can avoid
linewrapping the dmesg too ;-)
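(For reference, a typical way to set up such a serial console looks roughly like the sketch below; the serial port, baud rate, and file paths are assumptions for this particular machine, not something stated in the thread.)

```shell
# Illustrative sketch only -- adjust port, speed, and paths to your setup.
#
# Xen hypervisor command line (in grub.cfg):
#   multiboot /xen.gz com1=115200,8n1 console=com1 loglvl=all guest_loglvl=all
#
# dom0 Linux command line, so dom0 output also goes via Xen's console:
#   module /vmlinuz ... console=hvc0
#
# Then capture the output on a second machine attached to the serial port:
#   screen /dev/ttyS0 115200
```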
> The program just shares a page from dom0 to dom0,
Not just from dom0 to dom0 but actually within the same process. I'm not
sure that matters but it is a bit unusual. Are you able to repro this
with two separate processes acting as front vs. backend?
The reason I ask is that it isn't clear whether the crash happens with
the process's front or back "hat" on; separating the two out would be useful.
> then map the page,
> then unshare the page, and the unsharing makes the kernel crash. I ran
> into this issue while implementing a native OCaml vchan driver.
>
> I'm very much interested in advices/help.
>
> Cheers,
>
> Vincent
* Re: Crashing kernel with dom0/libxc gnttab/gntshr
2013-07-30 10:59 ` Ian Campbell
@ 2013-07-30 13:41 ` Vincent Bernardoff
2013-07-30 15:50 ` Vincent Bernardoff
0 siblings, 1 reply; 12+ messages in thread
From: Vincent Bernardoff @ 2013-07-30 13:41 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 1133 bytes --]
On 30/07/2013 11:59, Ian Campbell wrote:
> On Tue, 2013-07-30 at 11:50 +0100, Vincent Bernardoff wrote:
>> >Hi,
>> >
>> >The attached program makes my kernel (3.9.9-1-ARCH, stock Archlinux
>> >kernel) crash with the attached dmesg output.
> The dmesg output seems to start halfway through a crash message, which
> means it is missing the PC etc and may not be the first crash in any
> case.
>
> Please could you configure a serial console and try and capture the
> first crash message in its entirety. Bonus points if you can avoid
> linewrapping the dmesg too ;-)
>
>> >The program just shares a page from dom0 to dom0,
> Not just from dom0 to dom0 but actually within the same process. I'm not
> sure that matters but it is a bit unusual. Are you able to repro this
> with two separate processes acting as front vs. backend?
>
> The reason I ask is that it isn't clear if the crash is the process with
> its front or back "hat" on, separating the two out would be useful.
>
Here is an updated version with a program that calls fork(), and a
better dmesg dump as well. The faulty process is the server (the sharer).
Vincent
[-- Attachment #2: libxc_gntshr_bug2.c --]
[-- Type: text/x-csrc; name="libxc_gntshr_bug2.c", Size: 2243 bytes --]
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <xenctrl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(int argc, char** argv)
{
    void* map_shr;
    void* map_tab;
    int ret;
    int status;
    int s_to_c[2];
    int c_to_s[2];

    /* Set up pipes for communication. */
    if (pipe(s_to_c) == -1)
    {
        perror("pipe");
        exit(EXIT_FAILURE);
    }
    if (pipe(c_to_s) == -1)
    {
        perror("pipe");
        exit(EXIT_FAILURE);
    }

    if (fork() != 0) /* Parent code */
    {
        uint32_t ref;
        char buf[1];
        xc_gntshr *shr_h = xc_gntshr_open(NULL, 0);
        printf("I'm server, with pid %d\n", getpid());
        if (shr_h == NULL)
        {
            perror("xc_gntshr_open");
            exit(EXIT_FAILURE);
        }
        map_shr = xc_gntshr_share_pages(shr_h, 0, 1, &ref, 1);
        if (map_shr == NULL)
        {
            perror("xc_gntshr_share_pages");
            exit(EXIT_FAILURE);
        }

        /* Send the gntref to the client. */
        write(s_to_c[1], &ref, sizeof(uint32_t));
        read(c_to_s[0], buf, 1);

        /* Now we unshare the page */
        ret = xc_gntshr_munmap(shr_h, map_shr, 1);
        if (ret != 0)
        {
            perror("xc_gntshr_munmap");
            exit(EXIT_FAILURE);
        }
        /* At this point, the kernel should complain… */

        /* Wait for the child to die. */
        wait(&status);
        printf("Child died with status %d\n", status);
        return 0;
    }
    else /* Child code */
    {
        uint32_t ref;
        xc_gnttab *tab_h = xc_gnttab_open(NULL, 0);
        printf("I'm client, with pid %d\n", getpid());
        if (tab_h == NULL)
        {
            perror("xc_gnttab_open");
            exit(EXIT_FAILURE);
        }

        /* Receive the ref from the server. */
        read(s_to_c[0], &ref, sizeof(uint32_t));

        /* Ready to map! */
        map_tab = xc_gnttab_map_grant_ref(tab_h, 0, ref, PROT_READ|PROT_WRITE);
        if (map_tab == NULL)
        {
            perror("xc_gnttab_map_grant_ref");
            exit(EXIT_FAILURE);
        }

        /* Send a message to the server to indicate that it can now unshare. */
        write(c_to_s[1], "\0", 1);
        return 0;
    }
}
[-- Attachment #3: dmesg.log --]
[-- Type: text/x-log, Size: 1641 bytes --]
[ 1461.873885] BUG: Bad page map in process a.out pte:12bfff127 pmd:cc6c0067
[ 1461.873891] page:ffffea0004afffc0 count:0 mapcount:-1 mapping: (null) index:0xffffffffffffffff
[ 1461.873893] page flags: 0x2fc000000000c04(referenced|reserved|private)
[ 1461.873898] addr:00007fe1c3f6c000 vm_flags:140400fb anon_vma: (null) mapping:ffff880114555be8 index:0
[ 1461.873899] vma->vm_ops->fault: (null)
[ 1461.873911] vma->vm_file->f_op->mmap: gntalloc_mmap+0x0/0x1d0 [xen_gntalloc]
[ 1461.873914] CPU: 1 PID: 1010 Comm: a.out Tainted: G B 3.10.3-1-ARCH #1
[ 1461.873916] Hardware name: Dell Inc. Precision WorkStation T3400 /0TP412, BIOS A05 05/09/2008
[ 1461.873917] ffff8800cc407450 ffff8800bea35cc0 ffffffff814bd2df ffff8800bea35d08
[ 1461.873920] ffffffff81146404 ffffea0004afffc0 ffff880114555be8 ffff8800cc6c0b60
[ 1461.873923] ffffea0004afffc0 00007fe1c3f6d000 ffff8800bea35e30 00007fe1c3f6c000
[ 1461.873925] Call Trace:
[ 1461.873932] [<ffffffff814bd2df>] dump_stack+0x19/0x1b
[ 1461.873937] [<ffffffff81146404>] print_bad_pte+0x1b4/0x270
[ 1461.873939] [<ffffffff811480c3>] unmap_single_vma+0x803/0x8d0
[ 1461.873944] [<ffffffff8112e5d0>] ? SyS_readahead+0xb0/0xb0
[ 1461.873948] [<ffffffff811492f9>] unmap_vmas+0x49/0x90
[ 1461.873951] [<ffffffff8114ed79>] unmap_region+0x99/0x110
[ 1461.873954] [<ffffffff8114f2f9>] ? vma_rb_erase+0x129/0x240
[ 1461.873956] [<ffffffff81150f9a>] do_munmap+0x23a/0x3e0
[ 1461.873958] [<ffffffff81151181>] vm_munmap+0x41/0x60
[ 1461.873961] [<ffffffff811520c2>] SyS_munmap+0x22/0x30
[ 1461.873964] [<ffffffff814ca75d>] system_call_fastpath+0x1a/0x1f
* Re: Crashing kernel with dom0/libxc gnttab/gntshr
2013-07-30 13:41 ` Vincent Bernardoff
@ 2013-07-30 15:50 ` Vincent Bernardoff
2013-07-30 15:55 ` Ian Campbell
2013-07-30 16:58 ` David Vrabel
0 siblings, 2 replies; 12+ messages in thread
From: Vincent Bernardoff @ 2013-07-30 15:50 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 518 bytes --]
I also have a bug using tools/libvchan/vchan-node1:
When killing the server node (sudo ./vchan-node1 server read 0
/local/domain/0/vchan) before the client node (sudo ./vchan-node1 client
write 0 /local/domain/0/vchan), the following dmesg error appears.
I'm using Xen-unstable (master branch) and the stock Archlinux
3.10.3-1-ARCH kernel.
Use the attached script (setup.sh) if you want to try reproducing this
with vchan-node1; vchan-node1 needs some xenstore keys to be written in
order to work correctly.
[-- Attachment #2: dmesg-vchan.log --]
[-- Type: text/x-log, Size: 5709 bytes --]
[ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167 pmd:b9b5c067
[ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
[ 902.729314] page flags: 0x2fc000000000c14(referenced|dirty|reserved|private)
[ 902.729319] addr:00007f02aab28000 vm_flags:140400fb anon_vma: (null) mapping:ffff88011472a908 index:0
[ 902.729320] vma->vm_ops->fault: (null)
[ 902.729332] vma->vm_file->f_op->mmap: gntalloc_mmap+0x0/0x1d0 [xen_gntalloc]
[ 902.729335] CPU: 1 PID: 2785 Comm: vchan-node1 Not tainted 3.10.3-1-ARCH #1
[ 902.729337] Hardware name: Dell Inc. Precision WorkStation T3400 /0TP412, BIOS A05 05/09/2008
[ 902.729338] ffff88009b4e45c0 ffff88009d537ae8 ffffffff814bd2df ffff88009d537b30
[ 902.729341] ffffffff81146404 ffffea0004afffc0 ffff88011472a908 ffff8800b9b5c940
[ 902.729344] ffffea0004afffc0 00007f02aab29000 ffff88009d537c58 00007f02aab28000
[ 902.729346] Call Trace:
[ 902.729352] [<ffffffff814bd2df>] dump_stack+0x19/0x1b
[ 902.729356] [<ffffffff81146404>] print_bad_pte+0x1b4/0x270
[ 902.729359] [<ffffffff811480c3>] unmap_single_vma+0x803/0x8d0
[ 902.729363] [<ffffffff8100a605>] ? __xen_pgd_unpin+0x105/0x290
[ 902.729365] [<ffffffff811492f9>] unmap_vmas+0x49/0x90
[ 902.729368] [<ffffffff81152168>] exit_mmap+0x98/0x170
[ 902.729370] [<ffffffff8100122b>] ? xen_hypercall_xen_version+0xb/0x20
[ 902.729374] [<ffffffff81052b59>] mmput+0x59/0x120
[ 902.729377] [<ffffffff8105b23f>] do_exit+0x27f/0xab0
[ 902.729380] [<ffffffff8100bfd9>] ? xen_irq_enable_direct_reloc+0x4/0x4
[ 902.729382] [<ffffffff814c2a6f>] ? _raw_spin_unlock_irq+0xf/0x50
[ 902.729386] [<ffffffff81085a49>] ? finish_task_switch+0x49/0xe0
[ 902.729388] [<ffffffff8105baef>] do_group_exit+0x3f/0xa0
[ 902.729391] [<ffffffff8106a35d>] get_signal_to_deliver+0x2ad/0x610
[ 902.729394] [<ffffffff81013498>] do_signal+0x48/0x8b0
[ 902.729397] [<ffffffff814c280a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[ 902.729399] [<ffffffff814c280a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[ 902.729401] [<ffffffff814c2a22>] ? _raw_spin_unlock_irqrestore+0x12/0x50
[ 902.729404] [<ffffffff8107b2d8>] ? finish_wait+0x58/0x70
[ 902.729408] [<ffffffffa03a26d9>] ? evtchn_read+0x229/0x240 [xen_evtchn]
[ 902.729410] [<ffffffff8107b400>] ? wake_up_bit+0x30/0x30
[ 902.729413] [<ffffffff81013d68>] do_notify_resume+0x68/0xa0
[ 902.729415] [<ffffffff814caa1a>] int_signal+0x12/0x17
[ 902.729417] Disabling lock debugging due to kernel taint
[ 902.729458] BUG: Bad page state in process vchan-node1 pfn:12bfff
[ 902.729460] page:ffffea0004afffc0 count:0 mapcount:-1 mapping: (null) index:0xffffffffffffffff
[ 902.729462] page flags: 0x2fc000000000c14(referenced|dirty|reserved|private)
[ 902.729465] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_analog iTCO_wdt iTCO_vendor_support gpio_ich dcdbas ppdev snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer evdev snd soundcore x38_edac edac_core mperf coretemp microcode radeon psmouse pcspkr ttm serio_raw drm_kms_helper tg3 drm i2c_i801 i2c_algo_bit i2c_core ptp lpc_ich pps_core libphy xenfs xen_privcmd processor button parport_pc parport xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn nfs lockd sunrpc fscache ext4 crc16 mbcache jbd2 hid_generic usbhid hid sr_mod cdrom sd_mod ahci libahci libata scsi_mod uhci_hcd ehci_pci ehci_hcd usbcore usb_common
[ 902.729501] CPU: 1 PID: 2785 Comm: vchan-node1 Tainted: G B 3.10.3-1-ARCH #1
[ 902.729503] Hardware name: Dell Inc. Precision WorkStation T3400 /0TP412, BIOS A05 05/09/2008
[ 902.729504] 00fc000000000000 ffff88009d537a78 ffffffff814bd2df ffff88009d537a90
[ 902.729506] ffffffff81128f6c 0000000000000000 ffff88009d537ad0 ffffffff81129140
[ 902.729508] ffffea0004afffc0 ffffea0004afffc0 02fc000000000c14 0000000000000000
[ 902.729511] Call Trace:
[ 902.729514] [<ffffffff814bd2df>] dump_stack+0x19/0x1b
[ 902.729517] [<ffffffff81128f6c>] bad_page.part.62+0x9c/0xf0
[ 902.729520] [<ffffffff81129140>] free_pages_prepare+0x180/0x1a0
[ 902.729522] [<ffffffff81129bb1>] free_hot_cold_page+0x31/0x150
[ 902.729525] [<ffffffff8112a12e>] free_hot_cold_page_list+0x5e/0xe0
[ 902.729528] [<ffffffff8112f428>] release_pages+0x1d8/0x210
[ 902.729530] [<ffffffff8115cddd>] free_pages_and_swap_cache+0xad/0xd0
[ 902.729533] [<ffffffff81146d0c>] tlb_flush_mmu.part.44+0x4c/0x90
[ 902.729535] [<ffffffff81146e95>] tlb_finish_mmu+0x55/0x60
[ 902.729537] [<ffffffff81152197>] exit_mmap+0xc7/0x170
[ 902.729540] [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
[ 902.729542] [<ffffffff81052b59>] mmput+0x59/0x120
[ 902.729545] [<ffffffff8105b23f>] do_exit+0x27f/0xab0
[ 902.729547] [<ffffffff8100bfd9>] ? xen_irq_enable_direct_reloc+0x4/0x4
[ 902.729550] [<ffffffff814c2a6f>] ? _raw_spin_unlock_irq+0xf/0x50
[ 902.729552] [<ffffffff81085a49>] ? finish_task_switch+0x49/0xe0
[ 902.729555] [<ffffffff8105baef>] do_group_exit+0x3f/0xa0
[ 902.729557] [<ffffffff8106a35d>] get_signal_to_deliver+0x2ad/0x610
[ 902.729560] [<ffffffff81013498>] do_signal+0x48/0x8b0
[ 902.729562] [<ffffffff814c280a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[ 902.729564] [<ffffffff814c280a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[ 902.729566] [<ffffffff814c2a22>] ? _raw_spin_unlock_irqrestore+0x12/0x50
[ 902.729569] [<ffffffff8107b2d8>] ? finish_wait+0x58/0x70
[ 902.729572] [<ffffffffa03a26d9>] ? evtchn_read+0x229/0x240 [xen_evtchn]
[ 902.729575] [<ffffffff8107b400>] ? wake_up_bit+0x30/0x30
[ 902.729577] [<ffffffff81013d68>] do_notify_resume+0x68/0xa0
[ 902.729579] [<ffffffff814caa1a>] int_signal+0x12/0x17
[-- Attachment #3: setup.sh --]
[-- Type: application/x-shellscript, Size: 149 bytes --]
* Re: Crashing kernel with dom0/libxc gnttab/gntshr
2013-07-30 15:50 ` Vincent Bernardoff
@ 2013-07-30 15:55 ` Ian Campbell
2013-07-30 16:58 ` David Vrabel
1 sibling, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2013-07-30 15:55 UTC (permalink / raw)
To: Vincent Bernardoff; +Cc: Daniel De Graaf, xen-devel
Adding Daniel, who maintains vchan and, I think, the kernel side of the
driver in question too, to the CC.
On Tue, 2013-07-30 at 16:50 +0100, Vincent Bernardoff wrote:
> I also have a bug using tools/libvchan/vchan-node1:
>
> When killing the server node (sudo ./vchan-node1 server read 0
> /local/domain/0/vchan) before the client node (sudo ./vchan-node1 client
> write 0 /local/domain/0/vchan), the following dmesg error appears.
>
> I'm using Xen unstable (master branch) and stock Archlinux 3.10.3-1-ARCH
> kernel.
>
> Use the following script (setup.sh) if you want to try reproducing it
> with vchan-node1, vchan-node1 indeed needs some xenstore keys to be
> written in order to work correctly.
* Re: Crashing kernel with dom0/libxc gnttab/gntshr
2013-07-30 15:50 ` Vincent Bernardoff
2013-07-30 15:55 ` Ian Campbell
@ 2013-07-30 16:58 ` David Vrabel
2013-07-30 21:03 ` Daniel De Graaf
1 sibling, 1 reply; 12+ messages in thread
From: David Vrabel @ 2013-07-30 16:58 UTC (permalink / raw)
To: Vincent Bernardoff; +Cc: Daniel De Graaf, Stefano Stabellini, xen-devel
On 30/07/13 16:50, Vincent Bernardoff wrote:
> I also have a bug using tools/libvchan/vchan-node1:
>
> When killing the server node (sudo ./vchan-node1 server read 0
> /local/domain/0/vchan) before the client node (sudo ./vchan-node1 client
> write 0 /local/domain/0/vchan), the following dmesg error appears.
Does this only happen if both client and server are in the same domain?
Have you tested it using two domains? Did it work?
> I'm using Xen unstable (master branch) and stock Archlinux 3.10.3-1-ARCH
> kernel.
>
> Use the following script (setup.sh) if you want to try reproducing it
> with vchan-node1, vchan-node1 indeed needs some xenstore keys to be
> written in order to work correctly.
[ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167 pmd:b9b5c067
[ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
This has looked up the page using the PTE it is trying to clear. Has
it found the correct page? Since the MFN is currently mapped into the
same domain, has the m2p_override stuff confused the lookup, so that it
is checking the grantee page and not the granter's?
David
* Re: Crashing kernel with dom0/libxc gnttab/gntshr
2013-07-30 16:58 ` David Vrabel
@ 2013-07-30 21:03 ` Daniel De Graaf
2013-08-02 13:50 ` Stefano Stabellini
0 siblings, 1 reply; 12+ messages in thread
From: Daniel De Graaf @ 2013-07-30 21:03 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Ian Campbell, Stefano Stabellini, Vincent Bernardoff
On 07/30/2013 12:58 PM, David Vrabel wrote:
[...]
>
> [ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167 pmd:b9b5c067
> [ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
>
> I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
> This has looked up the page using the PTE it is trying to clear. Has
> it found the correct page? Since the MFN is currently mapped into the
> same domain, has the m2p_override stuff confused the look up and it is
> checking the grantee page not the granter?
>
> David
I think something like this is happening: while reproducing this on my
test system, I found some linked-list corruption that I believe to be
the cause of this problem. On PV, the gnttab_map_refs function uses
m2p_add_override on the page, which threads page->lru onto an
m2p_overrides list. However, something else is using page->lru while
gntdev is using the page, as shown by the following debug patch:
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 3c8803f..198e57e 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -294,6 +294,11 @@ static int map_grant_pages(struct grant_map *map)
 	if (err)
 		return err;
 
+	printk("map page0 lru: %p prev=%p:%p next=%p:%p\n",
+		&map->pages[0]->lru,
+		map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
+		map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
+
 	for (i = 0; i < map->count; i++) {
 		if (map->map_ops[i].status)
 			err = -EINVAL;
@@ -320,6 +325,10 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
 		}
 	}
 
+	printk("unmap page0 lru: %p prev=%p:%p next=%p:%p\n",
+		&map->pages[0]->lru,
+		map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
+		map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
 	err = gnttab_unmap_refs(map->unmap_ops + offset,
 			use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
 			pages);
Output:
[ 88.610644] map page0 lru: ffffea0001dee160 prev=ffffffff82f2d510:ffffea0001dee160 next=ffffffff82f2d510:ffffea0001dee160
[ 88.611515] BUG: Bad page map in process a.out pte:8000000077b85167 pmd:2541a067
[ 88.611525] page:ffffea0001dee140 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
[ 88.611532] page flags: 0x1000000000000814(referenced|dirty|private)
[ 88.611541] addr:00007f1adaef3000 vm_flags:140400fb anon_vma: (null) mapping:ffff8800692974a0 index:0
[ 88.611547] vma->vm_ops->fault: (null)
[ 88.611555] vma->vm_file->f_op->mmap: gntalloc_mmap+0x0/0x1d0
[...backtrace cropped...]
[ 88.614301] unmap page0 lru: ffffea0001dee160 prev=ffff8800254c9d08:ffff88001ea0b120 next=ffff8800254c9d08:ffff88001ea0b938
The initial map is a linked list with only that one element, so the
address 0xffffffff82f2d510 is the m2p_overrides entry. This means the
page being found by zap_pte_range is not a valid struct page.
For reference, the struct page* being used by the gntalloc device was
0xffffea0000952740; so it's not a direct collision between the pages
used by the gntdev and gntalloc devices.
Not sure what the best fix is for this at the moment.
--
Daniel De Graaf
National Security Agency
* Re: Crashing kernel with dom0/libxc gnttab/gntshr
2013-07-30 21:03 ` Daniel De Graaf
@ 2013-08-02 13:50 ` Stefano Stabellini
2013-08-02 14:10 ` Ian Campbell
2013-08-02 16:49 ` Jeremy Fitzhardinge
0 siblings, 2 replies; 12+ messages in thread
From: Stefano Stabellini @ 2013-08-02 13:50 UTC (permalink / raw)
To: Daniel De Graaf
Cc: Jeremy Fitzhardinge, Ian Campbell, Stefano Stabellini, xen-devel,
David Vrabel, Vincent Bernardoff
On Tue, 30 Jul 2013, Daniel De Graaf wrote:
> On 07/30/2013 12:58 PM, David Vrabel wrote:
> [...]
> >
> > [ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167 pmd:b9b5c067
> > [ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
> >
> > I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
> > This has looked up the page using the PTE it is trying to clear. Has
> > it found the correct page? Since the MFN is currently mapped into the
> > same domain, has the m2p_override stuff confused the look up and it is
> > checking the grantee page not the granter?
> >
> > David
>
> I think something like this is happening, since while reproducing this
> on my test system, some linked list corruption was found that I believe
> to be the cause of this problem. The gnttab_map_refs function on PV uses
> m2p_add_override on the page, which threads page->lru to an
> m2p_overrides list. However, something else is using page->lru during
> the use of gntdev, as shown by the following debug patch:
I have never managed to prove that something else is trying to use
page->lru while the m2p_override is using it.
Jeremy, at the time the code was written, you were pretty confident
that page->lru couldn't be used by anybody else.
Why was that?
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 3c8803f..198e57e 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -294,6 +294,11 @@ static int map_grant_pages(struct grant_map *map)
>  	if (err)
>  		return err;
>  
> +	printk("map page0 lru: %p prev=%p:%p next=%p:%p\n",
> +		&map->pages[0]->lru,
> +		map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
> +		map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
> +
>  	for (i = 0; i < map->count; i++) {
>  		if (map->map_ops[i].status)
>  			err = -EINVAL;
> @@ -320,6 +325,10 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
>  		}
>  	}
>  
> +	printk("unmap page0 lru: %p prev=%p:%p next=%p:%p\n",
> +		&map->pages[0]->lru,
> +		map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
> +		map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
>  	err = gnttab_unmap_refs(map->unmap_ops + offset,
>  			use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
>  			pages);
>
> Output:
> [ 88.610644] map page0 lru: ffffea0001dee160 prev=ffffffff82f2d510:ffffea0001dee160 next=ffffffff82f2d510:ffffea0001dee160
> [ 88.611515] BUG: Bad page map in process a.out pte:8000000077b85167 pmd:2541a067
> [ 88.611525] page:ffffea0001dee140 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
> [ 88.611532] page flags: 0x1000000000000814(referenced|dirty|private)
> [ 88.611541] addr:00007f1adaef3000 vm_flags:140400fb anon_vma: (null) mapping:ffff8800692974a0 index:0
> [ 88.611547] vma->vm_ops->fault: (null)
> [ 88.611555] vma->vm_file->f_op->mmap: gntalloc_mmap+0x0/0x1d0
> [...backtrace cropped...]
> [ 88.614301] unmap page0 lru: ffffea0001dee160 prev=ffff8800254c9d08:ffff88001ea0b120 next=ffff8800254c9d08:ffff88001ea0b938
>
> The initial map is a linked list with only that element, so the address
> 0xffffffff82f2d510 is the m2p_overrides entry. This means the page being
> found by zap_pte_range is not a valid struct page.
>
> The struct page* being used by the gntalloc device was 0xffffea0000952740,
> for reference; it's not a direct collision between the page used by the
> gntdev and gntalloc devices.
>
> Not sure what the best fix is for this at the moment.
>
> --
> Daniel De Graaf
> National Security Agency
>
* Re: Crashing kernel with dom0/libxc gnttab/gntshr
2013-08-02 13:50 ` Stefano Stabellini
@ 2013-08-02 14:10 ` Ian Campbell
2013-08-02 16:49 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2013-08-02 14:10 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Daniel De Graaf, xen-devel, David Vrabel, Jeremy Fitzhardinge,
Vincent Bernardoff
On Fri, 2013-08-02 at 14:50 +0100, Stefano Stabellini wrote:
> On Tue, 30 Jul 2013, Daniel De Graaf wrote:
> > On 07/30/2013 12:58 PM, David Vrabel wrote:
> > [...]
> > >
> > > [ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167 pmd:b9b5c067
> > > [ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
> > >
> > > I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
> > > This has looked up the page using the PTE it is trying to clear. Has
> > > it found the correct page? Since the MFN is currently mapped into the
> > > same domain, has the m2p_override stuff confused the look up and it is
> > > checking the grantee page not the granter?
> > >
> > > David
> >
> > I think something like this is happening, since while reproducing this
> > on my test system, some linked list corruption was found that I believe
> > to be the cause of this problem. The gnttab_map_refs function on PV uses
> > m2p_add_override on the page, which threads page->lru to an
> > m2p_overrides list. However, something else is using page->lru during
> > the use of gntdev, as shown by the following debug patch:
>
> I have never managed to prove that something else is trying to use
> page->lru while the m2p_override is using it.
Isn't it very much dependent on the actual original owner of the page?
A lot of these fields are free for the code which actually called
alloc_page to use, but for a facility like the m2p_override, which can
consume pages from a variety of sources, you'd need to be careful about
what each of those callers was doing.
Ian.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Crashing kernel with dom0/libxc gnttab/gntshr
2013-08-02 13:50 ` Stefano Stabellini
2013-08-02 14:10 ` Ian Campbell
@ 2013-08-02 16:49 ` Jeremy Fitzhardinge
2013-08-02 17:02 ` Stefano Stabellini
1 sibling, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2013-08-02 16:49 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Daniel De Graaf, xen-devel, David Vrabel, Ian Campbell,
Vincent Bernardoff
On 08/02/2013 06:50 AM, Stefano Stabellini wrote:
> On Tue, 30 Jul 2013, Daniel De Graaf wrote:
>> On 07/30/2013 12:58 PM, David Vrabel wrote:
>> [...]
>>> [ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167
>>> pmd:b9b5c067
>>> [ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping:
>>> (null) index:0xffffffffffffffff
>>>
>>> I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
>>> This has looked up the page using the PTE it is trying to clear. Has
>>> it found the correct page? Since the MFN is currently mapped into the
>>> same domain, has the m2p_override stuff confused the look up and it is
>>> checking the grantee page not the granter?
>>>
>>> David
>> I think something like this is happening, since while reproducing this
>> on my test system, some linked list corruption was found that I believe
>> to be the cause of this problem. The gnttab_map_refs function on PV uses
>> m2p_add_override on the page, which threads page->lru to an
>> m2p_overrides list. However, something else is using page->lru during
>> the use of gntdev, as shown by the following debug patch:
> I have never managed to prove that something else is trying to use
> page->lru while the m2p_override is using it.
>
> Jeremy, at the time the code was written, you were pretty confident
> that page->lru couldn't be used by anybody else.
> Why was that?
Hm. Probably the reasoning was that page->lru was only used for pages
which are in the pagecache, mapped from files, and m2p pages are never
mapped from files. But maybe something else has decided to use lru for
non-mapped pages (transparent hugepage? page dedup?), or are m2p pages
getting into the pagecache somehow?
J
>
>
>
>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>> index 3c8803f..198e57e 100644
>> --- a/drivers/xen/gntdev.c
>> +++ b/drivers/xen/gntdev.c
>> @@ -294,6 +294,11 @@ static int map_grant_pages(struct grant_map *map)
>> if (err)
>> return err;
>> + printk("map page0 lru: %p prev=%p:%p next=%p:%p\n",
>> + &map->pages[0]->lru,
>> + map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
>> + map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
>> +
>> for (i = 0; i < map->count; i++) {
>> if (map->map_ops[i].status)
>> err = -EINVAL;
>> @@ -320,6 +325,10 @@ static int __unmap_grant_pages(struct grant_map *map, int
>> offset, int pages)
>> }
>> }
>> + printk("unmap page0 lru: %p prev=%p:%p next=%p:%p\n",
>> + &map->pages[0]->lru,
>> + map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
>> + map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
>> err = gnttab_unmap_refs(map->unmap_ops + offset,
>> use_ptemod ? map->kmap_ops + offset : NULL, map->pages
>> + offset,
>> pages);
>>
>> Output:
>> [ 88.610644] map page0 lru: ffffea0001dee160
>> prev=ffffffff82f2d510:ffffea0001dee160 next=ffffffff82f2d510:ffffea0001dee160
>> [ 88.611515] BUG: Bad page map in process a.out pte:8000000077b85167
>> pmd:2541a067
>> [ 88.611525] page:ffffea0001dee140 count:1 mapcount:-1 mapping:
>> (null) index:0xffffffffffffffff
>> [ 88.611532] page flags: 0x1000000000000814(referenced|dirty|private)
>> [ 88.611541] addr:00007f1adaef3000 vm_flags:140400fb anon_vma:
>> (null) mapping:ffff8800692974a0 index:0
>> [ 88.611547] vma->vm_ops->fault: (null)
>> [ 88.611555] vma->vm_file->f_op->mmap: gntalloc_mmap+0x0/0x1d0
>> [...backtrace cropped...]
>> [ 88.614301] unmap page0 lru: ffffea0001dee160
>> prev=ffff8800254c9d08:ffff88001ea0b120 next=ffff8800254c9d08:ffff88001ea0b938
>>
>> The initial map is a linked list with only that element, so the address
>> 0xffffffff82f2d510 is the m2p_overrides entry. This means the page being
>> found by zap_pte_range is not a valid struct page.
>>
>> The struct page* being used by the gntalloc device was 0xffffea0000952740,
>> for reference; it's not a direct collision between the page used by the
>> gntdev and gntalloc devices.
>>
>> Not sure what the best fix is for this at the moment.
>>
>> --
>> Daniel De Graaf
>> National Security Agency
>>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Crashing kernel with dom0/libxc gnttab/gntshr
2013-08-02 16:49 ` Jeremy Fitzhardinge
@ 2013-08-02 17:02 ` Stefano Stabellini
2013-08-03 10:06 ` Ian Campbell
0 siblings, 1 reply; 12+ messages in thread
From: Stefano Stabellini @ 2013-08-02 17:02 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Ian Campbell, Stefano Stabellini, xen-devel, David Vrabel,
Vincent Bernardoff, Daniel De Graaf
On Fri, 2 Aug 2013, Jeremy Fitzhardinge wrote:
> On 08/02/2013 06:50 AM, Stefano Stabellini wrote:
> > On Tue, 30 Jul 2013, Daniel De Graaf wrote:
> >> On 07/30/2013 12:58 PM, David Vrabel wrote:
> >> [...]
> >>> [ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167
> >>> pmd:b9b5c067
> >>> [ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping:
> >>> (null) index:0xffffffffffffffff
> >>>
> >>> I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
> >>> This has looked up the page using the PTE it is trying to clear. Has
> >>> it found the correct page? Since the MFN is currently mapped into the
> >>> same domain, has the m2p_override stuff confused the look up and it is
> >>> checking the grantee page not the granter?
> >>>
> >>> David
> >> I think something like this is happening, since while reproducing this
> >> on my test system, some linked list corruption was found that I believe
> >> to be the cause of this problem. The gnttab_map_refs function on PV uses
> >> m2p_add_override on the page, which threads page->lru to an
> >> m2p_overrides list. However, something else is using page->lru during
> >> the use of gntdev, as shown by the following debug patch:
> > I have never managed to prove that something else is trying to use
> > page->lru while the m2p_override is using it.
> >
> > Jeremy, at the time the code was written, you were pretty confident
> > that page->lru couldn't be used by anybody else.
> > Why was that?
>
> Hm. Probably the reasoning was that page->lru was only used for pages
> which are in the pagecache, mapped from files, and m2p pages are never
> mapped from files. But maybe something else has decided to use lru for
> non-mapped pages (transparent hugepage? page dedup?), or are m2p pages
> getting into the pagecache somehow?
>
I think it could be the latter.
For example we have recently changed QEMU not to use O_DIRECT on foreign
grants to work around a network bug in the kernel.
It might be possible that these pages end up in the pagecache after they
have already been added to the m2p.
> >
> >
> >
> >> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> >> index 3c8803f..198e57e 100644
> >> --- a/drivers/xen/gntdev.c
> >> +++ b/drivers/xen/gntdev.c
> >> @@ -294,6 +294,11 @@ static int map_grant_pages(struct grant_map *map)
> >> if (err)
> >> return err;
> >> + printk("map page0 lru: %p prev=%p:%p next=%p:%p\n",
> >> + &map->pages[0]->lru,
> >> + map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
> >> + map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
> >> +
> >> for (i = 0; i < map->count; i++) {
> >> if (map->map_ops[i].status)
> >> err = -EINVAL;
> >> @@ -320,6 +325,10 @@ static int __unmap_grant_pages(struct grant_map *map, int
> >> offset, int pages)
> >> }
> >> }
> >> + printk("unmap page0 lru: %p prev=%p:%p next=%p:%p\n",
> >> + &map->pages[0]->lru,
> >> + map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
> >> + map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
> >> err = gnttab_unmap_refs(map->unmap_ops + offset,
> >> use_ptemod ? map->kmap_ops + offset : NULL, map->pages
> >> + offset,
> >> pages);
> >>
> >> Output:
> >> [ 88.610644] map page0 lru: ffffea0001dee160
> >> prev=ffffffff82f2d510:ffffea0001dee160 next=ffffffff82f2d510:ffffea0001dee160
> >> [ 88.611515] BUG: Bad page map in process a.out pte:8000000077b85167
> >> pmd:2541a067
> >> [ 88.611525] page:ffffea0001dee140 count:1 mapcount:-1 mapping:
> >> (null) index:0xffffffffffffffff
> >> [ 88.611532] page flags: 0x1000000000000814(referenced|dirty|private)
> >> [ 88.611541] addr:00007f1adaef3000 vm_flags:140400fb anon_vma:
> >> (null) mapping:ffff8800692974a0 index:0
> >> [ 88.611547] vma->vm_ops->fault: (null)
> >> [ 88.611555] vma->vm_file->f_op->mmap: gntalloc_mmap+0x0/0x1d0
> >> [...backtrace cropped...]
> >> [ 88.614301] unmap page0 lru: ffffea0001dee160
> >> prev=ffff8800254c9d08:ffff88001ea0b120 next=ffff8800254c9d08:ffff88001ea0b938
> >>
> >> The initial map is a linked list with only that element, so the address
> >> 0xffffffff82f2d510 is the m2p_overrides entry. This means the page being
> >> found by zap_pte_range is not a valid struct page.
> >>
> >> The struct page* being used by the gntalloc device was 0xffffea0000952740,
> >> for reference; it's not a direct collision between the page used by the
> >> gntdev and gntalloc devices.
> >>
> >> Not sure what the best fix is for this at the moment.
> >>
> >> --
> >> Daniel De Graaf
> >> National Security Agency
> >>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Crashing kernel with dom0/libxc gnttab/gntshr
2013-08-02 17:02 ` Stefano Stabellini
@ 2013-08-03 10:06 ` Ian Campbell
0 siblings, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2013-08-03 10:06 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Jeremy Fitzhardinge, Vincent Bernardoff, xen-devel,
Daniel De Graaf, David Vrabel
On Fri, 2013-08-02 at 18:02 +0100, Stefano Stabellini wrote:
> On Fri, 2 Aug 2013, Jeremy Fitzhardinge wrote:
> > On 08/02/2013 06:50 AM, Stefano Stabellini wrote:
> > > Jeremy, at the time the code was written, you were pretty confident
> > > that page->lru couldn't be used by anybody else.
> > > Why was that?
> >
> > Hm. Probably the reasoning was that page->lru was only used for pages
> > which are in the pagecache, mapped from files, and m2p pages are never
> > mapped from files. But maybe something else has decided to use lru for
> > non-mapped pages (transparent hugepage? page dedup?), or are m2p pages
> > getting into the pagecache somehow?
> >
>
> I think it could be the latter.
> For example we have recently changed QEMU not to use O_DIRECT on foreign
> grants to work around a network bug in the kernel.
> It might be possible that these pages end up in the pagecache after they
> have been already added to the m2p.
Vincent's test programs (one posted at the root of this thread and
another, a multiprocess version, a few mails in) don't do any explicit
I/O on the shared pages at all; they literally don't touch them.
The test program is:
allocate
share
map
unmap
crash
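In libxc terms these steps correspond to the calls in the program posted
at the root of the thread (a condensed sketch, with error handling
elided; it has to run as root in a Xen dom0 and is expected to trigger
the kernel bug, so it cannot be exercised outside such a setup):

```c
#include <stdint.h>
#include <sys/mman.h>
#include <xenctrl.h>

int main(void)
{
    uint32_t ref;
    xc_gntshr *shr = xc_gntshr_open(NULL, 0);
    xc_gnttab *tab = xc_gnttab_open(NULL, 0);

    /* allocate + share: one page granted from dom0 to dom0 */
    void *shared = xc_gntshr_share_pages(shr, 0, 1, &ref, 1);

    /* map: the same domain maps its own grant */
    void *mapped = xc_gnttab_map_grant_ref(tab, 0, ref,
                                           PROT_READ | PROT_WRITE);

    /* unmap the sharer's side while the mapping is live -> crash */
    xc_gntshr_munmap(shr, shared, 1);

    (void)mapped;
    return 0;
}
```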
The second version moves the map/unmap/crash into a separate process
(achieved with fork). I suppose it might still be interesting to split
into two completely separate executables to check for weird cross talk
between share and map in related (i.e. parent-child) processes.
I hope the gntshr interface locks pages down so that we aren't worrying
about swapping etc, but this doesn't appear to be at all probabilistic
in any case.
Ian.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-08-03 10:06 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-30 10:50 Crashing kernel with dom0/libxc gnttab/gntshr Vincent Bernardoff
2013-07-30 10:59 ` Ian Campbell
2013-07-30 13:41 ` Vincent Bernardoff
2013-07-30 15:50 ` Vincent Bernardoff
2013-07-30 15:55 ` Ian Campbell
2013-07-30 16:58 ` David Vrabel
2013-07-30 21:03 ` Daniel De Graaf
2013-08-02 13:50 ` Stefano Stabellini
2013-08-02 14:10 ` Ian Campbell
2013-08-02 16:49 ` Jeremy Fitzhardinge
2013-08-02 17:02 ` Stefano Stabellini
2013-08-03 10:06 ` Ian Campbell