From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Vrabel <david.vrabel@citrix.com>
Subject: Re: xen-blkback unmap with network retransmission will
 cause a coredump
Date: Mon, 22 Sep 2014 11:11:30 +0100
Message-ID: <541FF5D2.8030002@citrix.com>
References: <541D5D8C.8020604@huawei.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta4.messagelabs.com ([85.158.143.247])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <david.vrabel@citrix.com>) id 1XW0an-0002oW-15
	for xen-devel@lists.xenproject.org; Mon, 22 Sep 2014 10:11:37 +0000
In-Reply-To: <541D5D8C.8020604@huawei.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "Chentao(Boby)" <boby.chen@huawei.com>, "konrad.wilk" <konrad.wilk@oracle.com>, =?ISO-8859-1?Q?Roger_Pau_Monn=E9?= <roger.pau@citrix.com>
Cc: meiwanlong@huawei.com, mu.muyang@huawei.com, Yanqiangjun <yanqiangjun@huawei.com>, liuyongan@huawei.com, huangzhichao@huawei.com, xen-devel@lists.xenproject.org, dengguoqiang@huawei.com, zhangmin <rudy.zhangmin@huawei.com>, wu.wubin@huawei.com
List-Id: xen-devel@lists.xenproject.org

On 20/09/14 11:57, Chentao(Boby) wrote:
> Hi konrad and roger,
> 
> When the xen-blkback module executes an unmap operation while the
> skb of a network retransmission still uses the mapped page, the
> host OS crashes.
> 
> The crash stack for this problem looks like this:
> <ffffffff8041133e>{do_page_fault+0x38e} 
> <ffffffff8040d9e8>{page_fault+0x28} <ffffffff80223cdb>{memcpy+0xb} 
> <ffffffff802325c2>{swiotlb_tbl_map_single+0x212} 
> <ffffffff8023274a>{swiotlb_map_page+0x17a} 
> <ffffffffa03468e6>{tg3:tg3_start_xmit+0x656} 
> <ffffffff80354d14>{dev_hard_start_xmit+0x334} 
> <ffffffff803721be>{sch_direct_xmit+0x1ae}

What dom0 (backend) kernel are you using?  Which backend and what storage?

> I searched the web and found that Citrix engineers hit this problem
> a long time ago. As I understand it, they solved it by modifying the
> kernel stack. Because that modification is very large, the Linux
> kernel community has not accepted it so far. I have an immature
> idea: in the dispatch_rw_block_io function, if the I/O is a write
> operation, use the grant copy hypercall instead of the grant map
> hypercall. I tested this modification and it solves the problem.

Switching to grant copy will reduce performance significantly in many cases.

For user space backends this was fixed by replacing the foreign mapping
with a mapping of a scratch page when the grant is unmapped.

Something similar should be done for kernel-only foreign mappings.  This
requires a GNTOP_unmap_and_duplicate hypercall sub-op to allow efficient
batching.
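
To illustrate the idea, here is a minimal user-space sketch in C. It is
only a model, not kernel or hypervisor code: struct slot, struct
unmap_op, plain_unmap(), unmap_and_replace_batch() and late_read() are
all hypothetical names made up for this example, and the real sub-op
would take grant handles and addresses rather than pointers. It shows
why a late reader (such as a retransmitted skb) is safe when the slot is
repointed at a scratch page, and how a whole batch can be handled in one
call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical model of one kernel mapping slot in the backend. */
struct slot {
    unsigned char *page;  /* current backing page, NULL if unmapped */
    int is_foreign;       /* non-zero while a guest page is mapped */
};

/* Zero-filled page owned by the backend; safe to read at any time. */
static unsigned char scratch_page[PAGE_SIZE];

/* One entry of a batched request (assumed layout, for illustration). */
struct unmap_op {
    struct slot *slot;
    int status;           /* 0 on success, -1 if nothing was mapped */
};

/* Plain unmap: the slot is left with no backing page at all. */
static void plain_unmap(struct slot *s)
{
    s->page = NULL;
    s->is_foreign = 0;
}

/*
 * Unmap and replace: drop each foreign mapping and point the slot at
 * the scratch page instead. The whole array is processed in one call,
 * which models batching many pages into a single request.
 * Returns the number of slots that were replaced.
 */
static int unmap_and_replace_batch(struct unmap_op *ops, size_t count)
{
    int done = 0;

    assert(ops != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        struct slot *s = ops[i].slot;

        if (s == NULL || !s->is_foreign) {
            ops[i].status = -1;
            continue;
        }
        s->page = scratch_page;
        s->is_foreign = 0;
        ops[i].status = 0;
        done++;
    }
    return done;
}

/*
 * A late reader, e.g. a retransmit copying from the page after the
 * block request has completed. Returns the number of bytes copied,
 * or -1 where the real system would have taken a page fault.
 */
static int late_read(const struct slot *s, unsigned char *dst, size_t len)
{
    if (s->page == NULL)
        return -1;
    if (len > PAGE_SIZE)
        len = PAGE_SIZE;
    memcpy(dst, s->page, len);
    return (int)len;
}
```

In this model, late_read() on a slot released with plain_unmap()
returns -1 (the fault seen in the trace above), while a slot released
through unmap_and_replace_batch() can still be read and yields zeros
from the scratch page.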

David