From mboxrd@z Thu Jan  1 00:00:00 1970
From: =?windows-1252?Q?Roger_Pau_Monn=E9?= <roger.pau@citrix.com>
Subject: Re: xen-blkback unmap with network retansmission will
	cause a coredump
Date: Tue, 23 Sep 2014 16:16:30 +0200
Message-ID: <542180BE.6000409@citrix.com>
References: <541D5D8C.8020604@huawei.com> <541FF362.4070404@citrix.com>
	<54217556.7020002@huawei.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <roger.pau@citrix.com>) id 1XWQuV-0007ii-S8
	for xen-devel@lists.xenproject.org; Tue, 23 Sep 2014 14:17:43 +0000
In-Reply-To: <54217556.7020002@huawei.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "Chentao(Boby)" <boby.chen@huawei.com>, "konrad.wilk" <konrad.wilk@oracle.com>
Cc: dengguoqiang@huawei.com, meiwanlong@huawei.com, mu.muyang@huawei.com, Yanqiangjun <yanqiangjun@huawei.com>, liuyongan@huawei.com, huangzhichao@huawei.com, xen-devel@lists.xenproject.org, zhangmin <rudy.zhangmin@huawei.com>, wu.wubin@huawei.com
List-Id: xen-devel@lists.xenproject.org

On 23/09/14 at 15:27, Chentao(Boby) wrote:
> On 2014/9/22 18:01, Roger Pau Monné wrote:
>> On 20/09/14 at 12:57, Chentao(Boby) wrote:
>>> Hi Konrad and Roger,
>>>
>>>     When the xen-blkback module executes an unmap operation while the skb
>>> of a network retransmission is still using the mapped page, the host OS
>>> crashes.
>>> The crash stack looks like this:
>>> <ffffffff8041133e>{do_page_fault+0x38e}
>>> <ffffffff8040d9e8>{page_fault+0x28}
>>> <ffffffff80223cdb>{memcpy+0xb}
>>> <ffffffff802325c2>{swiotlb_tbl_map_single+0x212}
>>> <ffffffff8023274a>{swiotlb_map_page+0x17a}
>>> <ffffffffa03468e6>{tg3:tg3_start_xmit+0x656}
>>> <ffffffff80354d14>{dev_hard_start_xmit+0x334}
>>> <ffffffff803721be>{sch_direct_xmit+0x1ae}
>>>
>>>     I searched the web and found that Citrix engineers hit this problem a
>>> long time ago. As I understand it, they solved it by modifying the kernel
>>> stack.
>>> Because that modification is very large, the Linux kernel community has
>>> not accepted it so far. I have a tentative idea: in the
>>> dispatch_rw_block_io function, if the I/O is a write operation, use the
>>> grant copy hypercall instead of the grant map hypercall. I have tested
>>> this modification and it solves the problem.
>>>
>>>     What is your opinion of my modification? I am looking forward to
>>> your reply. Any reply is appreciated.
>>
>> Hello,
>>
>> Yes, using grant-copy instead of grant-map is going to solve the
>> problem, but it also defeats the purpose of persistent grants. I'm
>> afraid it is going to introduce a noticeable performance penalty.
>>
> Roger, you are right. We measured a 20%+ performance penalty for 1M, 100%
> sequential writes at queue depth 128 when the workload runs on a ramdisk.
>
>> IMHO a better solution would be to use GNTTABOP_unmap_and_replace with
>> the scratch balloon page instead of GNTTABOP_unmap_grant_ref. See
>> arch/x86/xen/p2m.c m2p_remove_override for an example implementation of
>> this procedure.
>>
> You mean that if we replace GNTTABOP_unmap_grant_ref with
> GNTTABOP_unmap_and_replace in the xen-blkback module, that will solve the
> problem. Is my understanding right?

Well, it's not a straight replacement. You will need to issue a
multicall that bundles the grant ref replacement with an MMU operation
that updates the scratch page VA to point back to the scratch page MFN.
This is needed because the grant replace removes the MFN from the
scratch page VA.

You can find an example of how to do this in m2p_remove_override, in
the Linux kernel file arch/x86/xen/p2m.c.
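
To make the ordering concrete, here is a toy user-space C model of the two
steps. It is only a sketch and not kernel code: the page table is a small
array, slot indexes stand in for virtual addresses, and every toy_* name is
hypothetical. It assumes that the replace operation moves the mapping found
at new_addr onto host_addr and leaves new_addr unmapped, which is why the
scratch page VA has to be restored afterwards.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VA_SLOTS 8  /* toy address space: a slot index stands in for a VA */
#define MFN_NONE     0  /* 0 models a non-present PTE */

/* Toy page table (hypothetical): pte[va] is the MFN mapped there, or MFN_NONE. */
struct toy_pt {
    uint64_t pte[NUM_VA_SLOTS];
};

/* Models a plain GNTTABOP_unmap_grant_ref: the VA is left unmapped. */
void toy_unmap_grant_ref(struct toy_pt *pt, unsigned host_va)
{
    assert(host_va < NUM_VA_SLOTS);
    pt->pte[host_va] = MFN_NONE;
}

/*
 * Models GNTTABOP_unmap_and_replace (assumed semantics): host_va takes over
 * whatever new_va mapped, and new_va is cleared.
 */
void toy_unmap_and_replace(struct toy_pt *pt, unsigned host_va, unsigned new_va)
{
    assert(host_va < NUM_VA_SLOTS && new_va < NUM_VA_SLOTS);
    pt->pte[host_va] = pt->pte[new_va];
    pt->pte[new_va] = MFN_NONE;
}

/* Models the MMU update that points a VA at a given MFN. */
void toy_update_va_mapping(struct toy_pt *pt, unsigned va, uint64_t mfn)
{
    assert(va < NUM_VA_SLOTS);
    pt->pte[va] = mfn;
}

/*
 * The two operations that the real code would batch into one multicall:
 * replace the grant mapping with the scratch mapping, then restore the
 * scratch VA so it can be reused for the next unmap.
 */
void toy_unmap_with_scratch(struct toy_pt *pt, unsigned host_va,
                            unsigned scratch_va, uint64_t scratch_mfn)
{
    toy_unmap_and_replace(pt, host_va, scratch_va);
    toy_update_va_mapping(pt, scratch_va, scratch_mfn);
}

/* Returns nonzero if an access through va would not fault in this model. */
int toy_va_present(const struct toy_pt *pt, unsigned va)
{
    assert(va < NUM_VA_SLOTS);
    return pt->pte[va] != MFN_NONE;
}
```

In this model, after toy_unmap_with_scratch() a late access through the old
grant VA (for example from a retransmitted skb) lands on the scratch page
instead of faulting, whereas after toy_unmap_grant_ref() the same VA is not
present.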

Another option would be to introduce a new hypercall, as David
suggests, that does the replacement without redirecting <new_addr> to
the null entry. That way you should be able to avoid the multicall.

Roger.