From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: Create a iSCSI DomU with disks in another DomU running on the same Dom0
Date: Fri, 11 Jan 2013 10:06:55 -0500
Message-ID: <20130111150655.GB15353@phenom.dumpdata.com>
References: <50D41DF3.306@citrix.com> <20121221140320.GD25526@phenom.dumpdata.com> <50D47678.2050903@citrix.com> <20121221173513.GB27893@phenom.dumpdata.com> <50E430B0.3070605@citrix.com> <20130102213621.GA15765@phenom.dumpdata.com> <50EDC3C1.3010100@citrix.com>
In-Reply-To: <50EDC3C1.3010100@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Roger Pau Monné
Cc: xen-devel
List-Id: xen-devel@lists.xenproject.org

On Wed, Jan 09, 2013 at 08:23:45PM +0100, Roger Pau Monné wrote:
> On 02/01/13 22:36, Konrad Rzeszutek Wilk wrote:
> >>> I think we are just swizzling the PFNs with a different MFN when you
> >>> do the domU -> domX, using two ring protocols. Weird though, as the
> >>> m2p code has checks WARN_ON(PagePrivate(..)) to catch this sort of
> >>> thing.
> >>>
> >>> What happens if the dom0/domU are all 3.8 with the persistent grant
> >>> patches?
> >>
> >> Sorry for the delay, the same error happens when Dom0/DomU is using a
> >> persistent grants enabled kernel, although I had to backport the
> >> persistent grants patch to 3.2, because I was unable to get iSCSI
> >> Enterprise Target dkms working with 3.8. I'm also seeing these messages
> >> in the DomU that's running the iSCSI target:
> >>
> >> [ 511.338845] net_ratelimit: 36 callbacks suppressed
> >> [ 511.338851] net eth0: rx->offset: 0, size: 4294967295
> >
> > -1 ?! I saw this somewhere with 9000 MTUs.
> >
> >> [ 512.288282] net eth0: rx->offset: 0, size: 4294967295
> >> [ 512.525639] net eth0: rx->offset: 0, size: 4294967295
> >> [ 512.800729] net eth0: rx->offset: 0, size: 4294967295
> >
> > But wow. It is just all over.
> >
> > Could you instrument the M2P code to print out the PFN/MFN
> > values as they are being altered (along with __builtin_func(1) to
> > get an idea who is doing it). Perhaps that will shed light on whether
> > my theory (that we are overwriting the MFNs) is truly happening.
> > It does not help that it ends up using multicalls - so it might be
> > that they are being done both in batches - so there are multiple
> > MFN updates. Perhaps the multicalls have two or more changes to the
> > same MFN?
>
> A little more info, I've found that we are passing FOREIGN_FRAMES in
> the skb fragments on netback. When we try to perform the grant copy
> operation using a foreign mfn as source, we hit the error. Here is
> the stack trace of the addition of the bogus skb:
>
> [ 107.094109] Pid: 64, comm: kworker/u:5 Not tainted 3.7.0-rc3+ #8
> [ 107.094114] Call Trace:
> [ 107.094126] [] xen_netbk_queue_tx_skb+0x16b/0x1aa
> [ 107.094135] [] xenvif_start_xmit+0x7b/0x9e
> [ 107.094143] [] dev_hard_start_xmit+0x25e/0x3db
> [ 107.094151] [] sch_direct_xmit+0x6e/0x150
> [ 107.094159] [] dev_queue_xmit+0x16a/0x360
> [ 107.094168] [] br_dev_queue_push_xmit+0x5c/0x62
> [ 107.094175] [] br_deliver+0x35/0x3f
> [ 107.094182] [] br_dev_xmit+0xd7/0xef
> [ 107.094189] [] dev_hard_start_xmit+0x25e/0x3db
> [ 107.094197] [] ? __alloc_skb+0x8d/0x187
> [ 107.094204] [] dev_queue_xmit+0x2a5/0x360
> [ 107.094212] [] ip_finish_output2+0x25c/0x2b7
> [ 107.094219] [] ip_finish_output+0x76/0x7b
> [ 107.094226] [] ip_output+0x3a/0x3c
> [ 107.094235] [] dst_output+0xf/0x11
> [ 107.094242] [] ip_local_out+0x5c/0x5e
> [ 107.094249] [] ip_queue_xmit+0x2ce/0x2fc
> [ 107.094256] [] tcp_transmit_skb+0x746/0x787
> [ 107.094264] [] tcp_write_xmit+0x837/0x949
> [ 107.094273] [] ? virt_to_head_page+0x9/0x2c
> [ 107.094281] [] ? ksize+0x1a/0x24
> [ 107.094288] [] ? __alloc_skb+0xa1/0x187
> [ 107.094295] [] __tcp_push_pending_frames+0x2c/0x59
> [ 107.094302] [] tcp_push+0x87/0x89
> [ 107.094309] [] tcp_sendpage+0x448/0x480
> [ 107.094317] [] inet_sendpage+0xa0/0xb5
> [ 107.094327] [] iscsi_sw_tcp_pdu_xmit+0xa2/0x236
> [ 107.094335] [] iscsi_tcp_task_xmit+0x34/0x236
> [ 107.094345] [] ? __spin_time_accum+0x17/0x2e
> [ 107.094352] [] ? __xen_spin_lock+0xb7/0xcd
> [ 107.094360] [] iscsi_xmit_task+0x52/0x94
> [ 107.094367] [] iscsi_xmitworker+0x1c2/0x2b9
> [ 107.094375] [] ? iscsi_prep_scsi_cmd_pdu+0x604/0x604
> [ 107.094384] [] process_one_work+0x20b/0x2f9
> [ 107.094391] [] worker_thread+0x16b/0x272
> [ 107.094398] [] ? process_one_work+0x2f9/0x2f9
> [ 107.094406] [] kthread+0xb0/0xb8
> [ 107.094414] [] ? kthread_freezable_should_stop+0x5b/0x5b
> [ 107.094422] [] ret_from_fork+0x7c/0xb0
> [ 107.094430] [] ? kthread_freezable_should_stop+0x5b/0x5b
>
> I will try to find out who is setting those skb frags. Do you have any
> idea Konrad?

m2p_add_override.
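
A side note on the "-1 ?!" remark about the rx->offset log lines quoted above: 4294967295 is 2^32 - 1, which is how -1 appears when a 32-bit quantity is printed as unsigned. The sketch below only illustrates that arithmetic. It assumes the logged size is a 32-bit field, and the helper names as_u32 and as_s32 are made up for this example; they are not part of netfront, netback, or any Xen code.

```python
# Hypothetical helpers, for illustration only (not Xen or kernel code).
# Assumption: the "size" value in the log is a 32-bit field.

def as_u32(value: int) -> int:
    """Reinterpret an integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def as_s32(value: int) -> int:
    """Reinterpret the low 32 bits of an integer as a signed value."""
    value &= 0xFFFFFFFF
    # If the sign bit is set, subtract 2^32 to get the negative value.
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    # -1 viewed as unsigned 32-bit gives the number seen in the log.
    print(as_u32(-1))
    # The logged number viewed as signed 32-bit gives -1 back.
    print(as_s32(4294967295))
```

Read this way, the repeated "size: 4294967295" lines are consistent with a negative status value landing in the size field rather than a real 4 GiB length, though the thread does not confirm where the value comes from.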