From: David Vrabel
Subject: Re: xen-blkback unmap with network retransmission will cause a coredump
Date: Mon, 22 Sep 2014 11:11:30 +0100
Message-ID: <541FF5D2.8030002@citrix.com>
In-Reply-To: <541D5D8C.8020604@huawei.com>
To: "Chentao(Boby)", "konrad.wilk", Roger Pau Monné
Cc: meiwanlong@huawei.com, mu.muyang@huawei.com, Yanqiangjun, liuyongan@huawei.com, huangzhichao@huawei.com, xen-devel@lists.xenproject.org, dengguoqiang@huawei.com, zhangmin, wu.wubin@huawei.com

On 20/09/14 11:57, Chentao(Boby) wrote:
> Hi Konrad and Roger,
>
> When the xen-blkback module performs an unmap operation while the skb
> of a network retransmission is still using the mapped page, the host
> OS crashes.
>
> The crash stack is shown below:
> {do_page_fault+0x38e}
> {page_fault+0x28}
> {memcpy+0xb}
> {swiotlb_tbl_map_single+0x212}
> {swiotlb_map_page+0x17a}
> {tg3:tg3_start_xmit+0x656}
> {dev_hard_start_xmit+0x334}
> {sch_direct_xmit+0x1ae}

What dom0 (backend) kernel are you using? Which backend and what storage?

> I searched the web and found that Citrix engineers met this problem a
> long time ago. As I understand it, they solved it by modifying the
> kernel stack. Because this modification is very large, the Linux
> kernel community has not accepted it yet.
> I have a rough idea: in the dispatch_rw_block_io function, if the I/O
> is a write operation, use the grant copy hypercall instead of the grant
> map hypercall. I verified this modification and it solves the problem.

Switching to grant copy will reduce performance significantly in many
cases.

This was fixed for user-space backends by replacing the foreign mapping
with a mapping of a scratch page when the grant is unmapped. Something
similar should be done for kernel-only foreign mappings. This requires a
GNTOP_unmap_and_duplicate hypercall sub-op to allow efficient batching.

David
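A minimal user-space model of the scratch-page approach described above, offered only as an illustration. Everything here is hypothetical: `struct mapping`, `unmap_naive()`, `unmap_and_duplicate_batch()` and `late_read()` are made-up names, not Xen or Linux APIs, and no real grant-table hypercall is issued. A pointer slot stands in for a page-table entry, and `late_read()` stands in for the late `memcpy` in the crash stack (the skb still referencing the page after blkback has unmapped it).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical model of a page frame. */
struct page {
    unsigned char data[PAGE_SIZE];
};

/* Hypothetical model of one kernel mapping slot that a late reader
 * (e.g. an skb queued for retransmission) may still reference. */
struct mapping {
    struct page *target; /* NULL models "nothing mapped here any more" */
};

/* One zero-filled scratch page shared by every retired mapping. */
static struct page scratch_page;

/* Naive unmap: the slot is left empty, so a late reader faults. */
static void unmap_naive(struct mapping *m)
{
    m->target = NULL;
}

/* Scratch-page unmap (hypothetical): the foreign page is released, but
 * each slot is re-pointed at the scratch page instead of being left
 * empty. Handling many slots per call models the batching that the
 * proposed hypercall sub-op is meant to allow. */
static void unmap_and_duplicate_batch(struct mapping *maps, size_t count)
{
    for (size_t i = 0; i < count; i++)
        maps[i].target = &scratch_page;
}

/* Late reader. Returns 0 on success, or -1 where the real kernel
 * would take a page fault on the stale mapping. */
static int late_read(const struct mapping *m, unsigned char *dst, size_t len)
{
    assert(len <= PAGE_SIZE);
    if (m->target == NULL)
        return -1;
    memcpy(dst, m->target->data, len);
    return 0;
}
```

With the naive unmap, a late `late_read()` returns -1 (the modelled fault). After `unmap_and_duplicate_batch()`, the same read succeeds and copies the scratch page's zeroes. The mapping stays valid for as long as anything references it, and no data is ever copied at map time, which is the performance point made against grant copy.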