blktap2 and CONFIG_XEN_BLKBACK

xen-devel.lists.xenproject.org archive mirror

* blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
@ 2010-07-14 23:59 Kaushik Kumar Ram
  2010-07-15 18:19 ` Shriram Rajagopalan
  0 siblings, 1 reply; 11+ messages in thread
From: Kaushik Kumar Ram @ 2010-07-14 23:59 UTC (permalink / raw)
  To: xen-devel

Is it necessary to use blkback_pagemap with blktap2? Since the use of blkback_pagemap is configurable I tried without it and my system crashed (crash dump attached below). Or is it a bug?

I am using about a month old xen-unstable.hg with linux-2.6.18-xen.hg (both 64 bit). 

Thanks.
-Kaushik

(XEN) mm.c:889:d0 Error getting mfn 80765 (pfn 3fba6) from L1 entry 8000000080765027 for l1e_owner=0, pg_owner=0
(XEN) mm.c:5046:d0 ptwr_emulate: could not get_page_from_l1e()
Unable to handle kernel paging request at ffff8800388f6688 RIP: 
 [<ffffffff803dc7d6>] blktap_map_uaddr_fn+0xa6/0xc0
PGD 1140067 PUD 1141067 PMD 1306067 PTE 80100000388f6065
Oops: 0003 [1] SMP 
CPU 0 
Modules linked in: e1000e sd_mod ata_piix libata thermal fan
Pid: 4183, comm: blkback.1.sda1 Not tainted 2.6.18.8-xen0 #40
RIP: e030:[<ffffffff803dc7d6>]  [<ffffffff803dc7d6>] blktap_map_uaddr_fn+0xa6/0xc0
RSP: e02b:ffff880039d01840  EFLAGS: 00010297
RAX: 8000000080765027 RBX: ffff8800388f6688 RCX: ffff880039d01908
RDX: 00002b218a8d1000 RSI: ffff880001fb15d0 RDI: ffff8800388f6688
RBP: ffff880039d01850 R08: 00000000000388f6 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000000002c8 R12: ffff8800388f6688
R13: 00002b218a8d1000 R14: 00002b218a8d2000 R15: ffff88003890e2a0
FS:  00002af9674c06e0(0000) GS:ffffffff8058c000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process blkback.1.sda1 (pid: 4183, threadinfo ffff880039d00000, task ffff88003e8cf080)
Stack:  ffff8800388f6688 ffff880001fb15d0 ffff880039d018f0 ffffffff80270033
 0000001a00000039 ffff880039d01908 ffffffff803dc730 ffff88003a714080
 ffff8800389802b0 00002b218a8d2000 00002b218a8d2000 ffff88003c03b430
Call Trace:
 [<ffffffff80270033>] apply_to_page_range+0x4e3/0x590
 [<ffffffff803dc730>] blktap_map_uaddr_fn+0x0/0xc0
 [<ffffffff803dac01>] blktap_map_uaddr+0x21/0x30
 [<ffffffff803db70c>] blktap_device_do_request+0x67c/0xfe0
 [<ffffffff8023f36c>] __mod_timer+0xbc/0xe0
 [<ffffffff802088b0>] __switch_to+0x370/0x5b0
 [<ffffffff8023f1dc>] lock_timer_base+0x2c/0x60
 [<ffffffff8023f9c6>] del_timer+0x56/0x70
 [<ffffffff80344715>] __generic_unplug_device+0x25/0x30
 [<ffffffff803459d0>] generic_unplug_device+0x20/0x60
 [<ffffffff803d3196>] unplug_queue+0x26/0x50
 [<ffffffff803d3dea>] blkif_schedule+0x55a/0x690
 [<ffffffff803d3890>] blkif_schedule+0x0/0x690
 [<ffffffff8024b12a>] kthread+0xda/0x110
 [<ffffffff8020a428>] child_rip+0xa/0x12
 [<ffffffff8024b050>] kthread+0x0/0x110
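
The raw values in the dump above can be decoded by hand. What follows is a minimal sketch, assuming the architectural x86-64 page-table entry layout for 4 KiB pages; the helper names decode_pte and decode_fault_code are illustrative only and do not come from the kernel or Xen sources.

```python
# Architectural x86-64 page-table entry bits (4 KiB pages).
PTE_FLAGS = {
    0: "PRESENT",
    1: "RW",
    2: "USER",
    5: "ACCESSED",
    6: "DIRTY",
    63: "NX",
}
FRAME_MASK = 0x000F_FFFF_FFFF_F000  # bits 12..51 hold the frame number


def decode_pte(value):
    """Return (frame_number, set_of_flag_names) for a raw 64-bit PTE."""
    frame = (value & FRAME_MASK) >> 12
    flags = {name for bit, name in PTE_FLAGS.items() if value & (1 << bit)}
    return frame, flags


def decode_fault_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(code & 1),  # 0 would mean not-present
        "write": bool(code & 2),
        "user_mode": bool(code & 4),
    }


new_l1e = decode_pte(0x8000000080765027)  # L1 entry named in the (XEN) line
pt_page = decode_pte(0x80100000388F6065)  # PTE printed for the faulting address
fault = decode_fault_code(0x0003)         # "Oops: 0003"

print(hex(new_l1e[0]), sorted(new_l1e[1]))
print(hex(pt_page[0]), sorted(pt_page[1]))
print(fault)
```

Read this way, the rejected L1 entry points at frame 0x80765 (the mfn in the Xen message) with write access requested, while the PTE covering the faulting address has no RW bit and the error code describes a kernel-mode write to a present page. That looks consistent with a write to a read-only page-table page being handed back to the kernel after Xen declined to emulate it, although the thread itself does not spell this out.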


* Re: blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
  2010-07-14 23:59 blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP Kaushik Kumar Ram
@ 2010-07-15 18:19 ` Shriram Rajagopalan
  2010-07-15 19:02   ` Kaushik Kumar Ram
  0 siblings, 1 reply; 11+ messages in thread
From: Shriram Rajagopalan @ 2010-07-15 18:19 UTC (permalink / raw)
  To: Kaushik Kumar Ram; +Cc: xen-devel



IIRC, during my early experiments with blkback and blktap2, I hit a similar
error. Tracing through the code, I gathered that the pagemap stuff is used
to manage page grants to the blktap2 kernel driver. So the #else (i.e.
!BLKBK_PAGEMAP) code is not going to work.
I suggest you look at blkback_pagemap.c and blktap2/device.c,
or something like that, to get a better picture.

On Wed, Jul 14, 2010 at 4:59 PM, Kaushik Kumar Ram <kaushik@rice.edu> wrote:

> Is it necessary to use blkback_pagemap with blktap2? Since the use of
> blkback_pagemap is configurable I tried without it and my system crashed
> (crash dump attached below). Or is it a bug?
>
> I am using about a month old xen-unstable.hg with linux-2.6.18-xen.hg (both
> 64 bit).
>
> Thanks.
> -Kaushik
>



-- 
perception is but an offspring of its own self


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* Re: blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
  2010-07-15 18:19 ` Shriram Rajagopalan
@ 2010-07-15 19:02   ` Kaushik Kumar Ram
  2010-07-15 22:06     ` Shriram Rajagopalan
  2010-08-06 12:33     ` Jan Beulich
  0 siblings, 2 replies; 11+ messages in thread
From: Kaushik Kumar Ram @ 2010-07-15 19:02 UTC (permalink / raw)
  To: Shriram Rajagopalan; +Cc: xen-devel


On Jul 15, 2010, at 1:19 PM, Shriram Rajagopalan wrote:

> IIRC during my early experiments with blkback & blktap2, I hit a similar error.. tracing through the code, I gathered that the pagemap stuff is used to manage page grants to blktap2 kernel driver . So, the #else (ie !BLKBK_PAGEMAP) code is not going to work.
> I suggest, you try to look at the blkback_pagemap.c and the blktap2/device.c or something like that to get a better picture. 

Thanks Shriram. I have been looking at the code over the past few days. Since I am not familiar with the Linux block I/O layers, it's taking a lot of time!

It seems that, on enabling CONFIG_BLKBACK_PAGEMAP, the grant mechanism is used to map guest pages into user space too. This means the guest pages are mapped twice using the grant mechanism: first into dom0 kernel space (in blkback/blkback.c) and then into the tapdisk process's address space (in blktap2/device.c). This is the new implementation of blkback.

On disabling CONFIG_BLKBACK_PAGEMAP, the code falls back on the old implementation. Here, the guest pages are mapped into user space by directly manipulating the page tables without going through the grant mechanism. (Things seem slightly different when XENFEAT_auto_translated_physmap is set, but I will ignore that for now.) First, does the old way still work? The problem seems to arise when the page table entry is set in blktap_map_uaddr_fn() (in blktap2/device.c). That write leads to a page fault which Xen does not handle (the log shows ptwr_emulate failing in get_page_from_l1e()), and this results in a panic. Why is the page table entry set directly without using a hypercall here?

Any further explanation will be much appreciated.

Thanks.
-Kaushik 
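
The contrast between the two paths can be sketched with a toy model. This is an assumption-laden illustration, not Xen's interface: ToyHypervisor, direct_pte_write and grant_map are hypothetical names, domain id 1 for the guest is only guessed from the blkback.1.sda1 thread name, and grant reference 8 is invented. It models a single rule: a domain may install a PTE only for a frame it owns, unless the owner has issued a grant covering that frame.

```python
class ToyHypervisor:
    """Toy model of frame ownership and grants (hypothetical, not Xen's API)."""

    def __init__(self):
        self.owner = {}   # mfn -> id of the domain owning the frame
        self.grants = {}  # (granting domid, gref) -> mfn

    def direct_pte_write(self, domid, mfn):
        # Stand-in for the ownership check behind the failed PTE update.
        if self.owner.get(mfn) != domid:
            raise PermissionError(f"d{domid} does not own mfn {mfn:#x}")
        return mfn

    def grant_map(self, granter, gref):
        # The grant reference names the frame; the mapper need not own it.
        if (granter, gref) not in self.grants:
            raise PermissionError(f"no grant {gref} from d{granter}")
        return self.grants[(granter, gref)]


GUEST, DOM0 = 1, 0                 # assumed domain ids
xen = ToyHypervisor()
xen.owner[0x80765] = GUEST         # frame belongs to the guest
xen.grants[(GUEST, 8)] = 0x80765   # guest grants it to the backend (gref invented)

kernel_mapping = xen.grant_map(GUEST, 8)  # first mapping: dom0 kernel (blkback)
user_mapping = xen.grant_map(GUEST, 8)    # second mapping: tapdisk user space

try:
    xen.direct_pte_write(DOM0, 0x80765)   # the fallback path
    direct_ok = True
except PermissionError as err:
    direct_ok = False
    print("rejected:", err)
```

In this model both grant-based mappings succeed because the grant names the frame, while the direct write is refused because dom0 does not own it, which is roughly the shape of the failure reported in the Xen log.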


* Re: blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
  2010-07-15 19:02   ` Kaushik Kumar Ram
@ 2010-07-15 22:06     ` Shriram Rajagopalan
  2010-07-16 20:16       ` Daniel Stodden
  2010-08-06 12:33     ` Jan Beulich
  1 sibling, 1 reply; 11+ messages in thread
From: Shriram Rajagopalan @ 2010-07-15 22:06 UTC (permalink / raw)
  To: Kaushik Kumar Ram; +Cc: xen-devel



On Thu, Jul 15, 2010 at 12:02 PM, Kaushik Kumar Ram <kaushik@rice.edu> wrote:

>
> On Jul 15, 2010, at 1:19 PM, Shriram Rajagopalan wrote:
>
> > IIRC during my early experiments with blkback & blktap2, I hit a similar
> error.. tracing through the code, I gathered that the pagemap stuff is used
> to manage page grants to blktap2 kernel driver . So, the #else (ie
> !BLKBK_PAGEMAP) code is not going to work.
> > I suggest, you try to look at the blkback_pagemap.c and the
> blktap2/device.c or something like that to get a better picture.
>
> Thanks Shriram. I have been looking at the code over the past few days.
> Since I am not familiar with the Linux block I/O layers, it's taking a lot of
> time!
>
> It seems like on enabling CONFIG_BLKBACK_PAGEMAP the grant mechanism is
> used to map guest pages into user space too. This means the guest pages are
> mapped twice using the grant mechanism, first into dom0 kernel space (in
> blkback/blkback.c) and then into tapdisk process's address space (in
> blktap2/device.c).  This is the new implementation of blkback.
>

yep..




> On disabling CONFIG_BLKBACK_PAGEMAP, the code falls back on the old
> implementation. Here, the guest pages are mapped into user space by directly
> manipulating the page tables without going through the grant mechanism.
> (Things seem slightly different when XENFEAT_auto_translated_physmap is set
> but I will ignore that for now).

IIRC, XENFEAT_auto_translated_physmap is kinda deprecated. It was
used in Xen 3.1 or so, I guess. (Basically, it makes pfn = mfn, instead of
the current style: p2m and m2p tables.)
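
A rough sketch of the difference as seen from the guest kernel, under the simplification stated above (pfn = mfn when auto-translation is on). The class names are hypothetical, and the single pfn/mfn pair is borrowed from the Xen log line in the crash report purely as sample data.

```python
class ParavirtPhysmap:
    """Guest keeps explicit tables in both directions (p2m and m2p style)."""

    def __init__(self, p2m):
        self.p2m = dict(p2m)                                    # pfn -> mfn
        self.m2p = {mfn: pfn for pfn, mfn in self.p2m.items()}  # mfn -> pfn

    def pfn_to_mfn(self, pfn):
        return self.p2m[pfn]

    def mfn_to_pfn(self, mfn):
        return self.m2p[mfn]


class AutoTranslatedPhysmap:
    """Guest treats frame numbers as identical; translation happens elsewhere."""

    def pfn_to_mfn(self, pfn):
        return pfn

    def mfn_to_pfn(self, mfn):
        return mfn


pv = ParavirtPhysmap({0x3FBA6: 0x80765})  # sample pair taken from the log line
auto = AutoTranslatedPhysmap()

print(hex(pv.pfn_to_mfn(0x3FBA6)))
print(hex(auto.pfn_to_mfn(0x3FBA6)))
```

Code that writes machine frame numbers into page tables by hand only makes sense in the first model, which may be why the two configurations take different paths in the driver.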



> First, does the old way still work?

AFAIK, nope. I am not sure if some other config needs to be set to get that
old code to work. It looks like dead code to me. I cannot figure out the
"backward compatibility" angle to it either.

> The problem seems to arise when the page table entry is set in
> blktap_map_uaddr_fn() (in blktap2/device.c), which leads to a page fault
> that Xen does not handle correctly, and this results in a panic. Why is the
> page table entry set directly without using a hypercall here?
>
> Any further explanation will be much appreciated.
>
> Thanks.
> -Kaushik
>


-- 
perception is but an offspring of its own self


* Re: blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
  2010-07-15 22:06     ` Shriram Rajagopalan
@ 2010-07-16 20:16       ` Daniel Stodden
  2010-07-17  0:53         ` Jeremy Fitzhardinge
  2010-07-19 13:36         ` Ian Campbell
  0 siblings, 2 replies; 11+ messages in thread
From: Daniel Stodden @ 2010-07-16 20:16 UTC (permalink / raw)
  To: Shriram Rajagopalan; +Cc: Kaushik Kumar Ram, xen-devel@lists.xensource.com

On Thu, 2010-07-15 at 18:06 -0400, Shriram Rajagopalan wrote:
> 
> 
> On Thu, Jul 15, 2010 at 12:02 PM, Kaushik Kumar Ram <kaushik@rice.edu>
> wrote:
>         
>         On Jul 15, 2010, at 1:19 PM, Shriram Rajagopalan wrote:
>         
>         > IIRC during my early experiments with blkback & blktap2, I
>         hit a similar error.. tracing through the code, I gathered
>         that the pagemap stuff is used to manage page grants to
>         blktap2 kernel driver . So, the #else (ie !BLKBK_PAGEMAP) code
>         is not going to work.
>         > I suggest, you try to look at the blkback_pagemap.c and the
>         blktap2/device.c or something like that to get a better
>         picture.
>         
>         
>         Thanks Shriram. I have been looking at the code over the past
>         few days. Since I am not familiar with the Linux block I/O
>         layers, it's taking a lot of time!
>         
>         It seems like on enabling CONFIG_BLKBACK_PAGEMAP the grant
>         mechanism is used to map guest pages into user space too. This
>         means the guest pages are mapped twice using the grant
>         mechanism, first into dom0 kernel space (in blkback/blkback.c)
>         and then into tapdisk process's address space (in
>         blktap2/device.c).  This is the new implementation of blkback.
>         
> yep..
> 

Yes, it's pretty mandatory. It's needed to map foreign frames which have
been mapped by blkback back to their grants. I guess the Kconfigs should
reflect that. Didn't expect that it's just set to optional anywhere. 

The reason for the duplicate mapping is that userspace has to re-queue
those frames at the physical device layer, and -- iirc -- the problem
was that queuing pages twice, once on the blktap2 bdev and once on the
underlying disk, will deadlock.

So the second grant map basically creates an alias under a second pfn.
One page, appearing locally as two separate frames. Not exactly beautiful,
but effective.
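
The role of the pagemap described above can be sketched as a lookup table from a locally mapped page back to the grant it came from. This is a toy illustration only: ToyPagemap, GrantEntry and plan_user_mapping are hypothetical names and do not reflect the actual interface in blkback_pagemap.c, and the page keys, domain id and grant reference are invented.

```python
from collections import namedtuple

# Hypothetical record of where a foreign page came from.
GrantEntry = namedtuple("GrantEntry", ["domid", "gref"])


class ToyPagemap:
    """Remembers, per locally mapped page, the grant that produced it."""

    def __init__(self):
        self._by_page = {}

    def set(self, page, domid, gref):
        # Would be recorded when the backend grant-maps a guest page.
        self._by_page[page] = GrantEntry(domid, gref)

    def clear(self, page):
        # Would be dropped when the backend unmaps the page.
        self._by_page.pop(page, None)

    def lookup(self, page):
        return self._by_page.get(page)


def plan_user_mapping(pagemap, page):
    """Decide how a page in a request could be exposed to user space."""
    entry = pagemap.lookup(page)
    if entry is None:
        return ("local", page)                 # ordinary dom0 page
    return ("grant", entry.domid, entry.gref)  # foreign: map the grant again


pagemap = ToyPagemap()
pagemap.set("page-A", domid=1, gref=8)         # invented sample values

print(plan_user_mapping(pagemap, "page-A"))
print(plan_user_mapping(pagemap, "page-B"))
```

Without such a table the second stage only sees a page that it cannot trace back to a grant, which fits the statement that the option is effectively mandatory.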

>         On disabling CONFIG_BLKBACK_PAGEMAP, the code falls back on
>         the old implementation. Here, the guest pages are mapped into
>         user space by directly manipulating the page tables without
>         going through the grant mechanism. (Things seem slightly
>         different when XENFEAT_auto_translated_physmap is set but I
>         will ignore that for now). 
> IIRC, that XENFEAT_auto_translated_physmap is kinda deprecated..  it
> was used in xen 3.1 or so  I guess.. (basically, it makes pfn = mfn,
> instead of the current style : p2m & m2p tables)

Yes. That code has been there forever and then got carried over from
blktap1 to blktap2, basically as-is. Even to pvops, where it's probably
broken. Empirical proof that nobody is using blktap2 with
autotranslation, at least not on recent kernels.

I guess it's going to stay there until autotranslation either gets more
en vogue again or evaporates altogether.

Cheers,
Daniel


* Re: blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
  2010-07-16 20:16       ` Daniel Stodden
@ 2010-07-17  0:53         ` Jeremy Fitzhardinge
  2010-07-17  0:56           ` Daniel Stodden
  2010-07-19 13:36         ` Ian Campbell
  1 sibling, 1 reply; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2010-07-17  0:53 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: xen-devel@lists.xensource.com, Kaushik Kumar Ram

On 07/16/2010 01:16 PM, Daniel Stodden wrote:
> Yes. That code has been there forever and then got carried over from
> blktap1 to blktap2, basically as-is. Even to pvops, where it's probably
> broken. Empirical proof that nobody is using blktap2 with
> autotranslation, at least not on recent kernels.
>
> I guess it's going to stay there until autotranslation either gets more
> en vogue again or evaporates altogether.
>   

auto_translate_physmap will come back if people want to use EPT/NPT with
PV guests.

    J


* Re: blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
  2010-07-17  0:53         ` Jeremy Fitzhardinge
@ 2010-07-17  0:56           ` Daniel Stodden
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Stodden @ 2010-07-17  0:56 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel@lists.xensource.com, Kaushik Kumar Ram

On Fri, 2010-07-16 at 20:53 -0400, Jeremy Fitzhardinge wrote:
> On 07/16/2010 01:16 PM, Daniel Stodden wrote:
> > Yes. That code has been there forever and then got carried over from
> > blktap1 to blktap2, basically as-is. Even to pvops, where it's probably
> > broken. Empirical proof that nobody is using blktap2 with
> > autotranslation, at least not on recent kernels.
> >
> > I guess it's going to stay there until autotranslation either gets more
> > en vogue again or evaporates altogether.
> >   
> 
> auto_translate_physmap will come back if people want to use EPT/NPT with
> PV guests.

Ah. I always thought that's going to be rather the replacement. Rather
than being synonymous in the kernel sources. Now that you say it, it
might make sense. :)

Daniel


* Re: blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
  2010-07-16 20:16       ` Daniel Stodden
  2010-07-17  0:53         ` Jeremy Fitzhardinge
@ 2010-07-19 13:36         ` Ian Campbell
  2010-07-19 16:53           ` Daniel Stodden
  1 sibling, 1 reply; 11+ messages in thread
From: Ian Campbell @ 2010-07-19 13:36 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: xen-devel, Kaushik Kumar Ram

On Fri, 2010-07-16 at 21:16 +0100, Daniel Stodden wrote:
> 
> The reason for the duplicate mapping is that userspace has to re-queue
> those frames at the physical device layer, and -- iirc -- the problem
> was that queuing pages twice, once on the blktap2 bdev and once on the
> underlying disk, will deadlock.

I was wondering what the duplicate mappings were for just last week.

So is this need to play tricks with the p2m to avoid a deadlock the only
dependency blktap2 has on Xen? IOW if we could find another way around
the deadlock would a) blktap2 be usable on native and/or b) would all
the Xen specific bits (grant mappings etc) be confined to blkback only?

I guess the difference between blktap and e.g. device mapper is that in
the latter case the requeuing is done in the kernel and in the former the
page goes via userspace and hence the association with the original I/O
is lost?

Ian.


* Re: blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
  2010-07-19 13:36         ` Ian Campbell
@ 2010-07-19 16:53           ` Daniel Stodden
  2010-07-19 17:27             ` Jake Wires
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Stodden @ 2010-07-19 16:53 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel@lists.xensource.com, Jake Wires, Kaushik Kumar Ram

On Mon, 2010-07-19 at 09:36 -0400, Ian Campbell wrote:
> On Fri, 2010-07-16 at 21:16 +0100, Daniel Stodden wrote:
> > 
> > The reason for the duplicate mapping is that userspace has to re-queue
> > those frames at the physical device layer, and -- iirc -- the problem
> > was that queuing pages twice, once on the blktap2 bdev and once on the
> > underlying disk, will deadlock.
> 
> I was wondering what the duplicate mappings were for just last week.
> 
> So is this need to play tricks with the p2m to avoid a deadlock the only
> dependency blktap2 has on Xen? IOW if we could find another way around
> the deadlock would a) blktap2 be usable on native and/or b) would all
> the Xen specific bits (grant mappings etc) be confined to blkback only?

[cc Jake. Did most of the mapping code, and still the one who knows best
what prevents that path from getting simpler.]

Both the xen and native datapaths are presently inlined in the same disk
type. The solution to that would be an ops struct to separate the
handling. But that's certainly not a hard problem.

Apart from that, I believe native was more of a problem than blkback.

Only from memory: consider non-foreign r/w in dom0. There's going to
be a page lock preceding queuing on the tapdev, and a second lock
attempt on the path from tapdisk to the physical device, because what
userland is sending down the native I/O path is sold as normal user
memory.

So it's probably rather a tribute to zero-copy than anything else. The
problem might evaporate if the physical I/O were bounced off anon
memory. That might be one possible alternative.

Note that the blkback path is different, because it directly goes for
the disk queue, not through the filemap. I'd expect that to just work. 

> I guess the difference between blktap and e.g. device mapper is that in
> the latter case the requeuing is done in the kernel and in the former the
> page goes via userspace and hence the association with the original I/O
> is lost?

Yep.

I think another difference was that dm nodes only do request
translation, then just pass them on to the physical layer. So dm nodes
are rather thin compared to a tapdev. But that might not matter here.

Daniel


* RE: blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
  2010-07-19 16:53           ` Daniel Stodden
@ 2010-07-19 17:27             ` Jake Wires
  0 siblings, 0 replies; 11+ messages in thread
From: Jake Wires @ 2010-07-19 17:27 UTC (permalink / raw)
  To: Daniel Stodden, Ian Campbell
  Cc: xen-devel@lists.xensource.com, Kaushik Kumar Ram



> -----Original Message-----
> From: Daniel Stodden
> Sent: Monday, July 19, 2010 9:53 AM
> To: Ian Campbell
> Cc: Shriram Rajagopalan; Kaushik Kumar Ram; xen-devel@lists.xensource.com;
> Jake Wires
> Subject: Re: [Xen-devel] blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
> 
> On Mon, 2010-07-19 at 09:36 -0400, Ian Campbell wrote:
> > On Fri, 2010-07-16 at 21:16 +0100, Daniel Stodden wrote:
> > >
> > > The reason for the duplicate mapping is that userspace has to re-queue
> > > those frames at the physical device layer, and -- iirc -- the problem
> > > was that queuing pages twice, once on the blktap2 bdev and once on the
> > > underlying disk, will deadlock.
> >
> > I was wondering what the duplicate mappings were for just last week.
> >
> > So is this need to play tricks with the p2m to avoid a deadlock the only
> > dependency blktap2 has on Xen? IOW if we could find another way around
> > the deadlock would a) blktap2 be usable on native and/or b) would all
> > the Xen specific bits (grant mappings etc) be confined to blkback only?
> 
> [cc Jake. Did most of the mapping code, and still the one who knows best
> what prevents that path from getting simpler.]
> 
> Both the xen and native datapaths are presently inlined in the same disk
> type. The solution to that would be an ops struct to separate the
> handling. But that's certainly not a hard problem.
> 
> Apart from that, I believe native was more of a problem than blkback.
> 
> Only from memory: consider non-foreign r/w in dom0. There's going to
> be a page lock preceding queuing on the tapdev, and a second lock
> attempt on the path from tapdisk to the physical device, because what
> userland is sending down the native I/O path is sold as normal user
> memory.
> 
> So it's probably rather a tribute to zero-copy than anything else. The
> problem might evaporate if the physical I/O were bounced off anon
> memory. That might be one possible alternative.

Daniel is correct -- in the non-xen case, the blktap mapping is used merely
to give us a new (unlocked) page struct that tapdisk can send back down
through the IO stack.  we could do away with this in the non-xen case if we
give up zero-copy.  blktap would still need to use xen to map foreign pages
to tapdisk.
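
The locking argument can be illustrated with a toy model in which each page struct carries one non-reentrant lock. This is only a sketch under that assumption: Page and try_queue are hypothetical names, a Python threading.Lock stands in for the kernel page lock, and a non-blocking acquire is used so that the would-be deadlock shows up as a failed attempt rather than a hang.

```python
import threading


class Page:
    """Toy stand-in for a page struct: a payload plus its own lock."""

    def __init__(self, data):
        self.data = data
        self.lock = threading.Lock()


def try_queue(page):
    """Take the page lock for I/O; report failure instead of blocking."""
    return page.lock.acquire(blocking=False)


original = Page(bytearray(b"guest data"))

first = try_queue(original)    # queued on the tap device: lock taken
second = try_queue(original)   # same page struct sent down again: would wait

# Alias: a second page struct over the same memory, with its own lock.
alias = Page(original.data)
alias_ok = try_queue(alias)

# Bounce buffer: a fresh page holding a copy, trading zero-copy for simplicity.
bounce = Page(bytearray(original.data))
bounce_ok = try_queue(bounce)

print(first, second, alias_ok, bounce_ok)
```

The alias keeps zero-copy because both page structs refer to the same bytes, while the bounce buffer avoids the extra mapping at the cost of a copy, which is the trade-off described above.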

> Note that the blkback path is different, because it directly goes for
> the disk queue, not through the filemap. I'd expect that to just work.
> 
> 
> > I guess the difference between blktap and e.g. device mapper is that in
> > the latter case the requeuing is done in the kernel and in the former the
> > page goes via userspace and hence the association with the original I/O
> > is lost?
> 
> Yep.
> 
> I think another difference was that dm nodes only do request
> translation, then just pass them on to the physical layer. So dm nodes
> are rather thin compared to a tapdev. But that might not matter here.
> 
> Daniel
> 


* Re: blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
  2010-07-15 19:02   ` Kaushik Kumar Ram
  2010-07-15 22:06     ` Shriram Rajagopalan
@ 2010-08-06 12:33     ` Jan Beulich
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2010-08-06 12:33 UTC (permalink / raw)
  To: Shriram Rajagopalan, Kaushik Kumar Ram; +Cc: xen-devel

>>> On 15.07.10 at 21:02, Kaushik Kumar Ram <kaushik@rice.edu> wrote:
> It seems like on enabling CONFIG_BLKBACK_PAGEMAP the grant mechanism is used 
> to map guest pages into user space too. This means the guest pages are mapped 
> twice using the grant mechanism, first into dom0 kernel space (in 
> blkback/blkback.c) and then into tapdisk process's address space (in 
> blktap2/device.c).  This is the new implementation of blkback.
> 
> On disabling CONFIG_BLKBACK_PAGEMAP, the code falls back on the old 
> implementation. Here, the guest pages are mapped into user space by directly 
> manipulating the page tables without going through the grant mechanism. 
> (Things seem slightly different when XENFEAT_auto_translated_physmap is set 
> but I will ignore that for now). First, does the old way still work? The 
> problem seems to arise when the page table entry is set in 
> blktap_map_uaddr_fn() (in blktap2/device.c), which leads to a page fault 
> that Xen does not handle correctly, and this results in a panic. Why is the 
> page table entry set directly without using a hypercall here? 
> 
> Any further explanation will be much appreciated.

How could you have disabled XEN_BLKBACK_PAGEMAP in the first
place? It's a prompt-less option after all (for the very reason that
it's not optional).

Jan 
