All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bob Liu <bob.liu@oracle.com>
To: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xenproject.org,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Jenny Herbert <jennifer.herbert@citrix.com>
Subject: Re: [PATCHv5 12/14] xen-blkback: safely unmap grants in case they are still in use
Date: Mon, 09 Mar 2015 17:09:05 +0800	[thread overview]
Message-ID: <54FD6331.8040503@oracle.com> (raw)
In-Reply-To: <1422377057-19221-13-git-send-email-david.vrabel@citrix.com>

Hi David,

Recently I met an issue which is likely related with this patch. It
happened when running block benchmark on domU, the backend was an iSCSI
disk connected to dom0. I got below panic at put_page_testzero() on
dom0, at that time the ixgbe network card was freeing skb pages in
__skb_frag_unref() but the page->_count was already 0.
Do you think is it possiable that page was already freed by blkback?

Thanks,
-Bob
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:288!
invalid opcode: 0000 [#1] SMP
Modules linked in: dm_queue_length ib_iser rdma_cm ib_cm iw_cm ib_sa
ib_mad ib_core ib_addr iscsi_tcp xt_mac xt_nat nf_conntrack_netlink
xt_conntrack ipt_REJECT xt_TCPMSS xt_comment xt_connmark iptable_raw
xt_REDIRECT ext4 jbd2 xt_state xt_NFQUEUE iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_gre gre
nfnetlink_queue nfnetlink ip6table_filter ip6_tables ebtable_nat
ebtables softdog iptable_filter ip_tables xen_pciback xen_netback
xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd 8021q
garp bridge stp llc sunrpc bonding mlx4_en mlx4_core ipv6 ipmi_devintf
ipmi_si ipmi_msghandler vhost_net macvtap macvlan tun iTCO_wdt
iTCO_vendor_support coretemp freq_table mperf intel_powerclamp
ghash_clmulni_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core
shpchp ioatdma sg ext3 jbd mbcache dm_round_robin sd_mod crc_t10dif
aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci
megaraid_sas ixgbe hwmon dca dm_multipath dm_mirror dm_region_hash
dm_log dm_mod crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i
libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi
scsi_transport_iscsi [last unloaded: bonding]
CPU 0
Pid: 31309, comm: kworker/0:0 Tainted: G        W
3.8.13-48.el6uek.20150306.x86_64 #2 Oracle Corporation SUN FIRE X4170 M3
    /ASSY,MOTHERBOARD,1U
RIP: e030:[<ffffffff8113d481>]  [<ffffffff8113d481>] put_page+0x31/0x50
RSP: e02b:ffff880278e03d10  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8802692257b8 RCX: 00000000ffffffff
RDX: ffff88026ea4b2c0 RSI: 0000000000000200 RDI: ffffea00088670c0
RBP: ffff880278e03d10 R08: ffff88026a6e4500 R09: ffff880270a25098
R10: 0000000000000001 R11: ffff880278e03cf0 R12: 0000000000000006
R13: 00000000ffffff8e R14: ffff880270a25098 R15: ffff88026c95d9f0
FS:  00007fbb497cf700(0000) GS:ffff880278e00000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000007aaed0 CR3: 00000002703c2000 CR4: 0000000000042660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:0 (pid: 31309, threadinfo ffff88020c608000, task
ffff880200e24140)
Stack:
 ffff880278e03d30 ffffffff814c3855 ffff8802692257b8 ffff8802692257b8
 ffff880278e03d50 ffffffff814c38ee 0000000000000000 ffffc9001188faa0
 ffff880278e03d70 ffffffff814c3ff1 ffffc9001188faa0 ffff88026c95d8e0
Call Trace:
 <IRQ>
 [<ffffffff814c3855>] skb_release_data+0x75/0xf0
 [<ffffffff814c38ee>] __kfree_skb+0x1e/0xa0
 [<ffffffff814c3ff1>] consume_skb+0x31/0x70
 [<ffffffff814ce6ed>] dev_kfree_skb_any+0x3d/0x50
 [<ffffffffa01d0bdc>] ixgbe_clean_tx_irq+0xac/0x530 [ixgbe]
 [<ffffffffa01d10b3>] ixgbe_poll+0x53/0x1a0 [ixgbe]
 [<ffffffff814d3d05>] net_rx_action+0x105/0x2b0
 [<ffffffff81066587>] __do_softirq+0xd7/0x240
 [<ffffffff815a7c5c>] call_softirq+0x1c/0x30
 [<ffffffff810174b5>] do_softirq+0x65/0xa0
 [<ffffffff8106636d>] irq_exit+0xbd/0xe0
 [<ffffffff8133d3e5>] xen_evtchn_do_upcall+0x35/0x50
 [<ffffffff815a7cbe>] xen_do_hypervisor_callback+0x1e/0x30
 <EOI>
 [<ffffffff8100128a>] ? xen_hypercall_grant_table_op+0xa/0x20
 [<ffffffff8100128a>] ? xen_hypercall_grant_table_op+0xa/0x20
 [<ffffffff8133a4e6>] ? gnttab_unmap_refs+0x26/0x70
 [<ffffffff8133a5ba>] ? __gnttab_unmap_refs_async+0x8a/0xb0
 [<ffffffff8133a672>] ? gnttab_unmap_work+0x22/0x30
 [<ffffffff8107bf10>] ? process_one_work+0x180/0x420
 [<ffffffff8107df4e>] ? worker_thread+0x12e/0x390
 [<ffffffff8107de20>] ? manage_workers+0x180/0x180
 [<ffffffff8108329e>] ? kthread+0xce/0xe0
 [<ffffffff810039ee>] ? xen_end_context_switch+0x1e/0x30
 [<ffffffff810831d0>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff815a682c>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff810831d0>] ? kthread_freezable_should_stop+0x70/0x70
Code: 66 66 90 66 f7 07 00 c0 75 25 8b 47 1c 85 c0 74 1a f0 ff 4f 1c 0f
94 c0 84 c0 75 06 c9 c3 0f 1f 40 00 e8 43 fd ff ff c9 66 90 c3 <0f> 0b
eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 e8 5b fd ff ff c9
RIP  [<ffffffff8113d481>] put_page+0x31/0x50
 RSP <ffff880278e03d10>
---[ end trace 9f93fe018444fc09 ]---
Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

On 01/28/2015 12:44 AM, David Vrabel wrote:
> From: Jennifer Herbert <jennifer.herbert@citrix.com>
> 
> Use gnttab_unmap_refs_async() to wait until the mapped pages are no
> longer in use before unmapping them.
> 
> This allows blkback to use network storage which may retain refs to
> pages in queued skbs after the block I/O has completed.
> 
> Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> Acked-by: Jens Axboe <axboe@kernel.de>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  drivers/block/xen-blkback/blkback.c |  169 ++++++++++++++++++++++++-----------
>  drivers/block/xen-blkback/common.h  |    3 +
>  2 files changed, 122 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 908e630..2a04d34 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -47,6 +47,7 @@
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/hypercall.h>
>  #include <xen/balloon.h>
> +#include <xen/grant_table.h>
>  #include "common.h"
>  
>  /*
> @@ -262,6 +263,17 @@ static void put_persistent_gnt(struct xen_blkif *blkif,
>  	atomic_dec(&blkif->persistent_gnt_in_use);
>  }
>  
> +static void free_persistent_gnts_unmap_callback(int result,
> +						struct gntab_unmap_queue_data *data)
> +{
> +	struct completion *c = data->data;
> +
> +	/* BUG_ON used to reproduce existing behaviour,
> +	   but is this the best way to deal with this? */
> +	BUG_ON(result);
> +	complete(c);
> +}
> +
>  static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>                                   unsigned int num)
>  {
> @@ -269,8 +281,17 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>  	struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  	struct persistent_gnt *persistent_gnt;
>  	struct rb_node *n;
> -	int ret = 0;
>  	int segs_to_unmap = 0;
> +	struct gntab_unmap_queue_data unmap_data;
> +	struct completion unmap_completion;
> +
> +	init_completion(&unmap_completion);
> +
> +	unmap_data.data = &unmap_completion;
> +	unmap_data.done = &free_persistent_gnts_unmap_callback;
> +	unmap_data.pages = pages;
> +	unmap_data.unmap_ops = unmap;
> +	unmap_data.kunmap_ops = NULL;
>  
>  	foreach_grant_safe(persistent_gnt, n, root, node) {
>  		BUG_ON(persistent_gnt->handle ==
> @@ -285,9 +306,11 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>  
>  		if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST ||
>  			!rb_next(&persistent_gnt->node)) {
> -			ret = gnttab_unmap_refs(unmap, NULL, pages,
> -				segs_to_unmap);
> -			BUG_ON(ret);
> +
> +			unmap_data.count = segs_to_unmap;
> +			gnttab_unmap_refs_async(&unmap_data);
> +			wait_for_completion(&unmap_completion);
> +
>  			put_free_pages(blkif, pages, segs_to_unmap);
>  			segs_to_unmap = 0;
>  		}
> @@ -653,18 +676,14 @@ void xen_blkbk_free_caches(struct xen_blkif *blkif)
>  	shrink_free_pagepool(blkif, 0 /* All */);
>  }
>  
> -/*
> - * Unmap the grant references, and also remove the M2P over-rides
> - * used in the 'pending_req'.
> - */
> -static void xen_blkbk_unmap(struct xen_blkif *blkif,
> -                            struct grant_page *pages[],
> -                            int num)
> +static unsigned int xen_blkbk_unmap_prepare(
> +	struct xen_blkif *blkif,
> +	struct grant_page **pages,
> +	unsigned int num,
> +	struct gnttab_unmap_grant_ref *unmap_ops,
> +	struct page **unmap_pages)
>  {
> -	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> -	struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  	unsigned int i, invcount = 0;
> -	int ret;
>  
>  	for (i = 0; i < num; i++) {
>  		if (pages[i]->persistent_gnt != NULL) {
> @@ -674,21 +693,95 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
>  		if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
>  			continue;
>  		unmap_pages[invcount] = pages[i]->page;
> -		gnttab_set_unmap_op(&unmap[invcount], vaddr(pages[i]->page),
> +		gnttab_set_unmap_op(&unmap_ops[invcount], vaddr(pages[i]->page),
>  				    GNTMAP_host_map, pages[i]->handle);
>  		pages[i]->handle = BLKBACK_INVALID_HANDLE;
> -		if (++invcount == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
> -			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages,
> -			                        invcount);
> +		invcount++;
> +       }
> +
> +       return invcount;
> +}
> +
> +static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_queue_data *data)
> +{
> +	struct pending_req* pending_req = (struct pending_req*) (data->data);
> +	struct xen_blkif *blkif = pending_req->blkif;
> +
> +	/* BUG_ON used to reproduce existing behaviour,
> +	   but is this the best way to deal with this? */
> +	BUG_ON(result);
> +
> +	put_free_pages(blkif, data->pages, data->count);
> +	make_response(blkif, pending_req->id,
> +		      pending_req->operation, pending_req->status);
> +	free_req(blkif, pending_req);
> +	/*
> +	 * Make sure the request is freed before releasing blkif,
> +	 * or there could be a race between free_req and the
> +	 * cleanup done in xen_blkif_free during shutdown.
> +	 *
> +	 * NB: The fact that we might try to wake up pending_free_wq
> +	 * before drain_complete (in case there's a drain going on)
> +	 * it's not a problem with our current implementation
> +	 * because we can assure there's no thread waiting on
> +	 * pending_free_wq if there's a drain going on, but it has
> +	 * to be taken into account if the current model is changed.
> +	 */
> +	if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
> +		complete(&blkif->drain_complete);
> +	}
> +	xen_blkif_put(blkif);
> +}
> +
> +static void xen_blkbk_unmap_and_respond(struct pending_req *req)
> +{
> +	struct gntab_unmap_queue_data* work = &req->gnttab_unmap_data;
> +	struct xen_blkif *blkif = req->blkif;
> +	struct grant_page **pages = req->segments;
> +	unsigned int invcount;
> +
> +	invcount = xen_blkbk_unmap_prepare(blkif, pages, req->nr_pages,
> +					   req->unmap, req->unmap_pages);
> +
> +	work->data = req;
> +	work->done = xen_blkbk_unmap_and_respond_callback;
> +	work->unmap_ops = req->unmap;
> +	work->kunmap_ops = NULL;
> +	work->pages = req->unmap_pages;
> +	work->count = invcount;
> +
> +	gnttab_unmap_refs_async(&req->gnttab_unmap_data);
> +}
> +
> +
> +/*
> + * Unmap the grant references.
> + *
> + * This could accumulate ops up to the batch size to reduce the number
> + * of hypercalls, but since this is only used in error paths there's
> + * no real need.
> + */
> +static void xen_blkbk_unmap(struct xen_blkif *blkif,
> +                            struct grant_page *pages[],
> +                            int num)
> +{
> +	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> +	struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> +	unsigned int invcount = 0;
> +	int ret;
> +
> +	while (num) {
> +		unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
> +		
> +		invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
> +						   unmap, unmap_pages);
> +		if (invcount) {
> +			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>  			BUG_ON(ret);
>  			put_free_pages(blkif, unmap_pages, invcount);
> -			invcount = 0;
>  		}
> -	}
> -	if (invcount) {
> -		ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
> -		BUG_ON(ret);
> -		put_free_pages(blkif, unmap_pages, invcount);
> +		pages += batch;
> +		num -= batch;
>  	}
>  }
>  
> @@ -982,32 +1075,8 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
>  	 * the grant references associated with 'request' and provide
>  	 * the proper response on the ring.
>  	 */
> -	if (atomic_dec_and_test(&pending_req->pendcnt)) {
> -		struct xen_blkif *blkif = pending_req->blkif;
> -
> -		xen_blkbk_unmap(blkif,
> -		                pending_req->segments,
> -		                pending_req->nr_pages);
> -		make_response(blkif, pending_req->id,
> -			      pending_req->operation, pending_req->status);
> -		free_req(blkif, pending_req);
> -		/*
> -		 * Make sure the request is freed before releasing blkif,
> -		 * or there could be a race between free_req and the
> -		 * cleanup done in xen_blkif_free during shutdown.
> -		 *
> -		 * NB: The fact that we might try to wake up pending_free_wq
> -		 * before drain_complete (in case there's a drain going on)
> -		 * it's not a problem with our current implementation
> -		 * because we can assure there's no thread waiting on
> -		 * pending_free_wq if there's a drain going on, but it has
> -		 * to be taken into account if the current model is changed.
> -		 */
> -		if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
> -			complete(&blkif->drain_complete);
> -		}
> -		xen_blkif_put(blkif);
> -	}
> +	if (atomic_dec_and_test(&pending_req->pendcnt))
> +		xen_blkbk_unmap_and_respond(pending_req);
>  }
>  
>  /*
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index f65b807..cc90a84 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -350,6 +350,9 @@ struct pending_req {
>  	struct grant_page	*indirect_pages[MAX_INDIRECT_PAGES];
>  	struct seg_buf		seg[MAX_INDIRECT_SEGMENTS];
>  	struct bio		*biolist[MAX_INDIRECT_SEGMENTS];
> +	struct gnttab_unmap_grant_ref unmap[MAX_INDIRECT_SEGMENTS];
> +	struct page                   *unmap_pages[MAX_INDIRECT_SEGMENTS];
> +	struct gntab_unmap_queue_data gnttab_unmap_data;
>  };
>  

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2015-03-09  9:09 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-27 16:44 [PATCHv5 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
2015-01-27 16:44 ` [PATCHv5 01/14] mm: provide a find_special_page vma operation David Vrabel
2015-01-27 16:44 ` [PATCHv5 02/14] mm: add 'foreign' alias for the 'pinned' page flag David Vrabel
2015-01-27 16:44 ` [PATCHv5 03/14] xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() David Vrabel
2015-01-28 12:33   ` Stefano Stabellini
2015-01-27 16:44 ` [PATCHv5 04/14] xen: remove scratch frames for ballooned pages and m2p override David Vrabel
2015-01-28 12:35   ` Stefano Stabellini
2015-01-27 16:44 ` [PATCHv5 05/14] x86/xen: require ballooned pages for grant maps David Vrabel
2015-01-27 16:44 ` [PATCHv5 06/14] xen/grant-table: add helpers for allocating pages David Vrabel
2015-01-27 16:44 ` [PATCHv5 07/14] xen: mark grant mapped pages as foreign David Vrabel
2015-01-27 16:44 ` [PATCHv5 08/14] xen-netback: use foreign page information from the pages themselves David Vrabel
2015-01-27 16:44 ` [PATCHv5 09/14] xen/grant-table: add a mechanism to safely unmap pages that are in use David Vrabel
2015-01-27 16:44 ` [PATCHv5 10/14] xen/gntdev: convert priv->lock to a mutex David Vrabel
2015-01-27 16:44 ` [PATCHv5 11/14] xen/gntdev: safely unmap grants in case they are still in use David Vrabel
2015-01-27 16:44 ` [PATCHv5 12/14] xen-blkback: " David Vrabel
2015-03-09  9:09   ` Bob Liu [this message]
2015-03-09  9:30     ` David Vrabel
2015-03-09 10:51       ` Bob Liu
2015-03-09 11:02         ` David Vrabel
2015-03-09 11:22           ` Roger Pau Monné
2015-03-09 12:36             ` Bob Liu
2015-03-12  0:08           ` Bob Liu
2015-03-12  3:30             ` Roger Pau Monné
2015-03-12 18:25               ` David Vrabel
2015-03-13  0:43                 ` Bob Liu
2015-01-27 16:44 ` [PATCHv5 13/14] xen/gntdev: mark userspace PTEs as special on x86 PV guests David Vrabel
2015-01-28 12:37   ` Stefano Stabellini
2015-01-27 16:44 ` [PATCHv5 14/14] xen/gntdev: provide find_special_page VMA operation David Vrabel
2015-01-28 14:15 ` [PATCHv5 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
2015-01-30 21:35 ` Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54FD6331.8040503@oracle.com \
    --to=bob.liu@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=david.vrabel@citrix.com \
    --cc=jennifer.herbert@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.