From: Marek Marczykowski-Górecki
Subject: Re: gntdev/gntalloc and fork? - crash in gntdev
Date: Thu, 28 May 2015 01:45:08 +0200
Message-ID: <20150527234508.GA14838@mail-itl>
References: <20150430144744.GF919@mail-itl>
In-Reply-To: <20150430144744.GF919@mail-itl>
To: xen-devel
Cc: Boris Ostrovsky, David Vrabel
List-Id: xen-devel@lists.xenproject.org

On Thu, Apr 30, 2015 at 04:47:44PM +0200, Marek Marczykowski-Górecki wrote:
> Hi,
>
> What is the proper way to handle shared pages (on either side, using
> gntdev or gntalloc) with regard to fork and a possible later exec? The
> child process does not need to access those pages in any way, but will
> map different one(s), using a newly opened FD to the gntdev/gntalloc
> device. Should it unmap them and close the FD to the device manually
> just after the fork? Or should a process using gntdev or gntalloc avoid
> fork altogether?
>
> I'm asking because I get a kernel oops[1] in the context of such a
> process. This process uses both gntdev and gntalloc. The PID reported
> there is a child, which maps additional pages (using a newly opened FD
> to /dev/xen/gnt*), but I'm not sure whether the crash happens before,
> after, or at this second mapping (actually a vchan connection), or maybe
> even at cleanup of this second mapping. The parent process keeps its
> mappings for the whole lifetime of its child.
> I don't have a 100% reliable way
> to reproduce this problem, but it happens quite often when I run such
> operations in a loop. Any ideas?
> The kernel is vanilla 3.19.3, running on Xen 4.4.2.
>
> The kernel message:
> [74376.073464] general protection fault: 0000 [#1] SMP
> [74376.073475] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
> [74376.073522] CPU: 1 PID: 9377 Comm: qrexec-agent Tainted: G O 3.19.3-4.pvops.qubes.x86_64 #1
> [74376.073528] task: ffff880002442e40 ti: ffff88000032c000 task.ti: ffff88000032c000
> [74376.073532] RIP: e030:[] [] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
> [74376.073543] RSP: e02b:ffff88000032fc08 EFLAGS: 00010292
> [74376.073546] RAX: 0000000000000000 RBX: dead000000100100 RCX: 00007fd8616ea000
> [74376.073550] RDX: 00007fd8616ea000 RSI: 00007fd8616e9000 RDI: dead000000100100
> [74376.073554] RBP: ffff88000032fc48 R08: 0000000000000000 R09: 0000000000000000
> [74376.073557] R10: ffffea000021bb00 R11: 0000000000000000 R12: 00007fd8616e9000
> [74376.073561] R13: 00007fd8616ea000 R14: ffff880012702e40 R15: ffff880012702e70
> [74376.073569] FS: 00007fd8616ca700(0000) GS:ffff880013c80000(0000) knlGS:0000000000000000
> [74376.073574] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [74376.073577] CR2: 00007fd8616e9458 CR3: 00000000e7af5000 CR4: 0000000000042660
> [74376.073582] Stack:
> [74376.073584] ffff8800188356c0 00000000000000d0 ffff88000032fc68 00000000c64ef797
> [74376.073590] 0000000000000220 dead000000100100 00007fd8616e9000 00007fd8616ea000
> [74376.073596] ffff88000032fc88 ffffffffa00953c6 ffff88000032fcc8 ffff880012702e70
> [74376.073603] Call Trace:
> [74376.073610] [] mn_invl_range_start+0x46/0x90 [xen_gntdev]
> [74376.073620] [] __mmu_notifier_invalidate_range_start+0x5b/0x90
> [74376.073627] [] do_wp_page+0x769/0x820
> [74376.074031] [] handle_mm_fault+0x7fc/0x10c0
> [74376.074031] [] ? radix_tree_lookup+0xd/0x10
> [74376.074031] [] __do_page_fault+0x1dc/0x5a0
> [74376.074031] [] ? mutex_lock+0x16/0x37
> [74376.074031] [] ? evtchn_ioctl+0x118/0x3c0 [xen_evtchn]
> [74376.074031] [] ? do_vfs_ioctl+0x2f8/0x4f0
> [74376.074031] [] ? do_munmap+0x29f/0x3b0
> [74376.074031] [] do_page_fault+0x31/0x70
> [74376.074031] [] page_fault+0x28/0x30
> [74376.074031] Code: e9 dd fd ff ff 31 c9 31 db e9 20 fe ff ff 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 ec 28 <48> 8b 47 10 48 85 c0 74 4e 4c 8b 00 49 39 d0 73 46 4c 8b 48 08
> [74376.074031] RIP [] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
> [74376.074031] RSP
> [74376.091682] ---[ end trace 2b21c5b714eb1071 ]---
> [74404.069009] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [qrexec-agent:9379]
> [74404.069009] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
> [74404.069009] CPU: 2 PID: 9379 Comm: qrexec-agent Tainted: G D O 3.19.3-4.pvops.qubes.x86_64 #1
> [74404.069009] task: ffff880010e24a00 ti: ffff880002470000 task.ti: ffff880002470000
> [74404.069009] RIP: e030:[] [] _raw_spin_lock+0x21/0x30
> [74404.069009] RSP: e02b:ffff880002473e18 EFLAGS: 00000297
> [74404.069009] RAX: 0000000000000040 RBX: ffff880002345c00 RCX: 0000000000018cf8
> [74404.069009] RDX: 0000000000000041 RSI: ffff880002345c00 RDI: ffff880012702e60
> [74404.069009] RBP: ffff880002473e18 R08: ffff880012702240 R09: 00000001802a0019
> [74404.069009] R10: ffffea000049c080 R11: ffffffffa00955bf R12: ffff880012702e70
> [74404.069009] R13: ffff880012702e40 R14: ffff8800132c6f20 R15: ffff880012b163c0
> [74404.069009] FS: 00007fd8616ca700(0000) GS:ffff880013d00000(0000) knlGS:0000000000000000
> [74404.069009] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [74404.069009] CR2: 00007fd8610be098 CR3: 000000000b971000 CR4: 0000000000042660
> [74404.069009] Stack:
> [74404.069009] ffff880002473e48 ffffffffa0095452 ffff880002473e48 ffff880002345c00
> [74404.069009] ffff880012702e70 0000000000000000 ffff880002473e78 ffffffff811e8c2e
> [74404.069009] ffff880002473e78 ffff880012702e40 ffff880012702e40 ffff880012d123c8
> [74404.069009] Call Trace:
> [74404.069009] [] mn_release+0x22/0x130 [xen_gntdev]
> [74404.069009] [] mmu_notifier_unregister+0x4e/0xe0
> [74404.069009] [] gntdev_release+0x60/0xa0 [xen_gntdev]
> [74404.069009] [] __fput+0xdf/0x1e0
> [74404.069009] [] ____fput+0xe/0x10
> [74404.069009] [] task_work_run+0xbf/0x100
> [74404.069009] [] do_notify_resume+0x97/0xb0
> [74404.069009] [] int_signal+0x12/0x17
> [74404.069009] Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 b8 00 01 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 75 04 5d c3 f3 90 0f b6 07 <38> d0 75 f7 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
>
>

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
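
For reference, one of the options raised in the question (the child unmaps the inherited pages and closes the inherited FD right after fork) could look roughly like the sketch below. It is only an illustration of the pattern and says nothing about whether it avoids the oops. Everything in it is an assumption: an anonymous MAP_SHARED page stands in for a grant mapping, /dev/null stands in for the /dev/xen/gntdev FD, and fork_with_child_cleanup() is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper. Stand-ins (assumptions): the anonymous shared page
 * plays the role of a grant mapping, /dev/null plays the role of the
 * gntdev FD. Returns 0 if the child cleaned up successfully and the
 * parent's mapping is still intact afterwards, -1 otherwise.
 */
int fork_with_child_cleanup(void)
{
    static const char msg[] = "parent data";
    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0)
        return -1;
    size_t len = (size_t)pagesz;
    assert(sizeof(msg) <= len);

    int dev_fd = open("/dev/null", O_RDWR);
    if (dev_fd < 0)
        return -1;

    char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) {
        close(dev_fd);
        return -1;
    }
    memcpy(page, msg, sizeof(msg));

    pid_t pid = fork();
    if (pid < 0) {
        munmap(page, len);
        close(dev_fd);
        return -1;
    }

    if (pid == 0) {
        /* Child: drop the inherited mapping and FD before anything else. */
        int failed = (munmap(page, len) != 0);
        failed |= (close(dev_fd) != 0);
        /* A real child would now open its own FD and map its own pages. */
        _exit(failed ? 1 : 0);
    }

    /* Parent: wait for the child, then check its own mapping is untouched. */
    int status = 0;
    int ok = (waitpid(pid, &status, 0) == pid) &&
             WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
             memcmp(page, msg, sizeof(msg)) == 0;

    munmap(page, len);
    close(dev_fd);
    return ok ? 0 : -1;
}
```

Calling fork_with_child_cleanup() from main() and checking for a 0 return is enough to exercise it. The parent keeps its mapping for the whole lifetime of the child, as in the scenario described above.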