* Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
@ 2013-04-18 15:05 Sylvain Munaut
  2013-04-18 19:35 ` Wido den Hollander
  2013-04-19  6:45 ` Pasi Kärkkäinen
  0 siblings, 2 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-04-18 15:05 UTC (permalink / raw)
  To: ceph-devel, xen-devel

Hi,

I've been working on a blktap driver that allows access to Ceph RBD
block devices without relying on the RBD kernel driver, and it has
finally reached a point where it works and is testable.

Some of the advantages are:
 - Easier to update to a newer RBD version
 - Allows functionality only available in the userspace RBD library
(write cache, layering, ...)
 - Fewer issues when you have OSDs as domUs on the same dom0
 - Contains crashes to user space :p (they shouldn't happen, but ...)

It's still an early prototype, but feel free to give it a shot and
send feedback.

You can find the code here: https://github.com/smunaut/blktap/tree/rbd
 (rbd branch).

Currently the username, pool name and image name are hardcoded ...
(look for FIXME in the code). I'll get to that next, once I've figured
out the best format for the arguments.

Cheers,

    Sylvain


* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-18 15:05 Xen blktap driver for Ceph RBD : Anybody wants to test ? :p Sylvain Munaut
@ 2013-04-18 19:35 ` Wido den Hollander
  2013-04-19 14:37   ` Sylvain Munaut
  2013-04-19  6:45 ` Pasi Kärkkäinen
  1 sibling, 1 reply; 99+ messages in thread
From: Wido den Hollander @ 2013-04-18 19:35 UTC (permalink / raw)
  To: Sylvain Munaut; +Cc: ceph-devel

Hi,

On 04/18/2013 05:05 PM, Sylvain Munaut wrote:
> Hi,
>
> I've been working on getting a working blktap driver allowing to
> access ceph RBD block devices without relying on the RBD kernel driver
> and it finally got to a point where, it works and is testable.
>

This is really cool! I already pointed a couple of people at Citrix to
this, since I'd really love to see it go into XenServer at some point
and expand the CloudStack RBD support from KVM to XenServer as well.

> Some of the advantages are:
>   - Easier to update to newer RBD version
>   - Allows functionality only available in the userspace RBD library
> (write cache, layering, ...)
>   - Less issue when you have OSD as domU on the same dom0
>   - Contains crash to user space :p (they shouldn't happen, but ...)
>
> It's still an early prototype, but if you want to give it a shot and
> give feedback.
>
> You can find the code there https://github.com/smunaut/blktap/tree/rbd
>   (rbd branch).
>
> Currently the username, poolname and image name are hardcoded ...
> (look for FIXME in the code). I'll get to that next, once I figured
> the best format for arguments.
>

My Xen is kind of rusty, the last time I used it was about 3 years ago,
but can't you do something similar to what Qemu does? Just pass all the
arguments colon-separated?

> Cheers,
>
>      Sylvain
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-18 15:05 Xen blktap driver for Ceph RBD : Anybody wants to test ? :p Sylvain Munaut
  2013-04-18 19:35 ` Wido den Hollander
@ 2013-04-19  6:45 ` Pasi Kärkkäinen
  2013-04-19 14:41   ` [Xen-devel] " Sylvain Munaut
                     ` (2 more replies)
  1 sibling, 3 replies; 99+ messages in thread
From: Pasi Kärkkäinen @ 2013-04-19  6:45 UTC (permalink / raw)
  To: Sylvain Munaut; +Cc: ceph-devel, xen-devel

On Thu, Apr 18, 2013 at 05:05:29PM +0200, Sylvain Munaut wrote:
> Hi,
>

Hi,
 
> I've been working on getting a working blktap driver allowing to
> access ceph RBD block devices without relying on the RBD kernel driver
> and it finally got to a point where, it works and is testable.
> 

Great! Ceph distributed block storage is cool.

> Some of the advantages are:
>  - Easier to update to newer RBD version
>  - Allows functionality only available in the userspace RBD library
> (write cache, layering, ...)
>  - Less issue when you have OSD as domU on the same dom0
>  - Contains crash to user space :p (they shouldn't happen, but ...)
> 
> It's still an early prototype, but if you want to give it a shot and
> give feedback.
> 
> You can find the code there https://github.com/smunaut/blktap/tree/rbd
>  (rbd branch).
> 
> Currently the username, poolname and image name are hardcoded ...
> (look for FIXME in the code). I'll get to that next, once I figured
> the best format for arguments.
> 

If you have time to write up some lines about steps required to test this,
that'd be nice, it'll help people to test this stuff.

Thanks,

-- Pasi


* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-18 19:35 ` Wido den Hollander
@ 2013-04-19 14:37   ` Sylvain Munaut
  2013-04-19 14:40     ` Bernard Grymonpon
  0 siblings, 1 reply; 99+ messages in thread
From: Sylvain Munaut @ 2013-04-19 14:37 UTC (permalink / raw)
  To: Wido den Hollander; +Cc: ceph-devel

Hi,

> My Xen is kind of rusty, last time I used it was about 3 years ago, but
> can't you do something similar like with Qemu? Just submit all the arguments
> semi-column separated?

Yes, probably; I just haven't gotten to it yet. I wanted to check first
whether this approach solves the issues I had with the RBD kernel driver.

I'll try to stick as closely as possible to the qemu syntax.

Cheers,

   Sylvain


* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-19 14:37   ` Sylvain Munaut
@ 2013-04-19 14:40     ` Bernard Grymonpon
  2013-04-23 10:02       ` Sylvain Munaut
  0 siblings, 1 reply; 99+ messages in thread
From: Bernard Grymonpon @ 2013-04-19 14:40 UTC (permalink / raw)
  To: ceph-devel

We can test this, but just a couple of lines of input might be needed to get us going with this without digging through all the code.

Rgds,
Bernard
Openminds BVBA

On 19 Apr 2013, at 16:37, Sylvain Munaut <s.munaut@whatever-company.com> wrote:

> Hi,
> 
>> My Xen is kind of rusty, last time I used it was about 3 years ago, but
>> can't you do something similar like with Qemu? Just submit all the arguments
>> semi-column separated?
> 
> Yes probably, I just didn't get to it. I wanted to check first if this
> approach was solving the issues I had with RBD kernel driver.
> 
> I'll try to stick as closely to the qemu syntax.
> 
> Cheers,
> 
>   Sylvain

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-19  6:45 ` Pasi Kärkkäinen
@ 2013-04-19 14:41   ` Sylvain Munaut
  2013-11-29 11:05     ` James Harper
  2013-04-19 14:41   ` Sylvain Munaut
  2013-08-01  2:12   ` James Harper
  2 siblings, 1 reply; 99+ messages in thread
From: Sylvain Munaut @ 2013-04-19 14:41 UTC (permalink / raw)
  To: Pasi Kärkkäinen; +Cc: ceph-devel, xen-devel

> If you have time to write up some lines about steps required to test this,
> that'd be nice, it'll help people to test this stuff.

To quickly test, I compiled the package and just replaced the tapdisk
binary from my "normal" blktap install with the newly compiled one.

Then you need to set up an RBD image named 'test' in the default 'rbd'
pool. You also need to set up a proper ceph.conf and keyring file on
the client (since librbd will use those for the parameters). The
keyring must contain the 'client.admin' key.

Then in the config file, use something like
"tap2:tapdisk:rbd:xxx,xvda1,w" (the 'xxx' part is currently ignored).


Cheers,

    Sylvain



* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-19 14:40     ` Bernard Grymonpon
@ 2013-04-23 10:02       ` Sylvain Munaut
  2013-04-23 14:56         ` Bernard Grymonpon
  2013-04-23 16:38         ` Nick Couchman
  0 siblings, 2 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-04-23 10:02 UTC (permalink / raw)
  To: Bernard Grymonpon, Pasi Kärkkäinen, Wido den Hollander
  Cc: ceph-devel

Hi,

> We can test this, but just a couple of lines of input might be needed to get us going with this without digging through all the code.

Ok, so I added proper argument parsing (using the same format as the
qemu rbd driver) now, so it's easier to test.

First off, you need a working blktap setup for your distribution.
So for example, you should be able to use
"tap2:tapdisk:aio:/path/to/image.raw"  as a vbd.

Starting from there, you need to:

 - Download and compile the rbd branch of git://github.com/smunaut/blktap.git
   You will need the ceph development files/packages for this (for
Debian: librados-dev librbd-dev)

$ git clone git://github.com/smunaut/blktap.git
$ cd blktap
$ git checkout -b rbd origin/rbd
$ ./autogen.sh
$ ./configure
$ make

 - Replace the installed tapdisk binary with the new one

$ sudo cp ./drivers/.libs/tapdisk  /usr/bin/tapdisk

 - Add 'rbd' as a supported format in 'xm'
   For some reason 'xm' checks the image format itself before handing
off to tap-ctl ...

   Edit /usr/lib/xen-4.1/lib/python/xen/xend/server/BlktapController.py
and add 'rbd' to the blktap2_disk_types list at the top.
   (the location of the file will vary depending on Xen version and distribution)

 - Set up a proper /etc/ceph/ceph.conf containing at least the mon addresses
   Also make sure you have a /etc/ceph/keyring with the user key if
you use cephx
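
A quick sanity check of the client config can be sketched as below. This is
only an illustration: it assumes ceph.conf is plain INI that Python's
configparser can read, and that monitors are listed either as 'mon host' in
[global] or as 'mon addr' in per-monitor sections. has_mon_addresses() and
the sample addresses are hypothetical, not part of Ceph or blktap.

```python
import configparser


def has_mon_addresses(conf_text):
    """Return True if the ceph.conf text names at least one monitor address."""
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",))
    parser.read_string(conf_text)
    for section in parser.sections():
        for key, value in parser.items(section):
            # Option names may be written with spaces or underscores.
            normalized = key.replace(" ", "_")
            if normalized in ("mon_host", "mon_addr") and value.strip():
                return True
    return False


# Hypothetical minimal client config, for illustration only.
SAMPLE = """
[global]
keyring = /etc/ceph/keyring

[mon.a]
mon addr = 192.168.0.1:6789
"""

print(has_mon_addresses(SAMPLE))
```

The check only looks at the file contents; it does not verify that the
monitors are reachable or that the keyring holds a usable key.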

Once that's done, you should be able to attach a disk to a running VM using:

$ xm block-attach test_vm tap2:tapdisk:rbd:rbd/test xvda2 w

"rbd/test" above is the pool_name/image_name
You can add the same options as with the qemu driver.
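
To make the argument format concrete, here is a minimal sketch of how a
qemu-style spec such as "rbd:rbd/test" could be split up. It assumes the
usual qemu form rbd:pool/image[@snapshot][:option=value...] and ignores
backslash escaping; parse_rbd_spec() is a hypothetical helper, and the
parsing actually done in the rbd branch may differ.

```python
def parse_rbd_spec(spec):
    """Split 'rbd:pool/image[@snap][:key=value...]' into its parts."""
    fields = spec.split(":")
    if len(fields) < 2 or fields[0] != "rbd":
        raise ValueError("not an rbd spec: %r" % (spec,))
    # The first field after 'rbd' names the pool, image and optional snapshot.
    target, _, snapshot = fields[1].partition("@")
    pool, sep, image = target.partition("/")
    if not sep or not pool or not image:
        raise ValueError("expected pool_name/image_name in %r" % (spec,))
    # Remaining fields are option=value pairs.
    options = {}
    for field in fields[2:]:
        key, _, value = field.partition("=")
        options[key] = value
    return {"pool": pool, "image": image,
            "snapshot": snapshot or None, "options": options}


print(parse_rbd_spec("rbd:rbd/test:id=admin:conf=/etc/ceph/ceph.conf"))
```

With the block-attach line above, the part after "tap2:tapdisk:" is what
such a parser would receive.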


Cheers,

    Sylvain


* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-23 10:02       ` Sylvain Munaut
@ 2013-04-23 14:56         ` Bernard Grymonpon
  2013-04-23 15:06           ` Sylvain Munaut
  2013-04-23 16:38         ` Nick Couchman
  1 sibling, 1 reply; 99+ messages in thread
From: Bernard Grymonpon @ 2013-04-23 14:56 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org; +Cc: Sylvain Munaut

Hi,

I've quickly tested this, and it works correctly on a small setup:
- cluster: 3 machines, 9 osds as cluster, a very old ceph release (was a test setup I had lying around)
- client host dom0: debian wheezy 3.2 kernel, xen 4.1 from debian, blktap dkms from debian
- client dom0: a simple quick debootstrap, and a low amount of memory to bypass buffers
- all wired together through a simple gig network, nothing fancy
- no tuning at all at any level
- compiled as indicated below

Some quick stupid benches below:

Bonnie (ext3)

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
rbdtest        300M   274  98 17452   4  8081   1   727  99 94701   9 649.7   8
Latency             68696us    3419ms    1712ms   12597us   14505us     823ms
Version  1.96       ------Sequential Create------ --------Random Create--------
rbdtest             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 20866  40 +++++ +++ 25937  34 18965  36 +++++ +++ 19078  25
Latency             17850us     661us     670us   15308us      23us      73us
1.96,1.96,rbdtest,1,1366725798,300M,,274,98,17452,4,8081,1,727,99,94701,9,649.7,8,16,,,,,20866,40,+++++,+++,25937,34,18965,36,+++++,+++,19078,25,68696us,3419ms,1712ms,12597us,14505us,823ms,17850us,661us,670us,15308us,23us,73us

Iozone (snippet, this is about the performance I get at the other sizes also, except read performance seems to drop after a while)

                                                            random  random    bkwd   record   stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
           65536      64   40598    9072    44273    83811   45508   35023   68808  3334003   130052    51170    14997   43470    81707
           65536     128   37153   13131    86507    92858   84661   41665   16636  3588446   116051    27582    39906   35435    57426
           65536     256   24175   45743    18685    97440  144667   39326   27777  3442738   200095    47460    21743   79367   107165
           65536     512   42953   28775    95780    54974  215821   36543   32774  3413877   343260    40613    20087  980947  2697292
           65536    1024   43360   47328  1709920  5051713 5042816   38018   88782  3489856  5024565    47151    45020 5036533  5114787
           65536    2048   13168   39873  4982578  5113550 5100456   28686 4993258  3109063  5024565    14277    50709 4971224  5085452
           65536    4096   12283   19615   299031  4403132 4444640   30234 4400735  2835581  4998524    53654    48808 4359487  4431741
           65536    8192   51552   12918  3921516  3938033 3951790   51662 3914925  2413309  3919335    58271    41831 3910525  3942947
           65536   16384   46308   10670  3955601  3976199 3959418   32890 3945608    19116  3829833    67583    45037 3916152  3874477

Thx for the work you've put in this! Seems to work nicely at first glance, but I'll do some more tests later on (with a more recent ceph cluster)... 

Rgds,
Bernard
Openminds 

On 23 Apr 2013, at 12:02, Sylvain Munaut <s.munaut@whatever-company.com> wrote:

> Hi,
> 
>> We can test this, but just a couple of lines of input might be needed to get us going with this without digging through all the code.
> 
> Ok, so I added proper argument parsing (using the same format as the
> qemu rbd driver) now, so it's easier to test.
> 
> First off, you need a working blktap setup for your distribution.
> So for example, you should be able to use
> "tap2:tapdisk:aio:/path/to/image.raw"  as a vbd.
> 
> Starting for there you need to :
> 
> - Download and compile the rbd branch of git://github.com/smunaut/blktap.git
>   You will need the ceph development files/packages for this ( for
> debian librados-dev librbd-dev )
> 
> $ git clone git://github.com/smunaut/blktap.git
> $ cd blktap
> $ git checkout -b rbd origin/rbd
> $ ./autogen.sh
> $ ./configure
> $ make
> 
> - Replace the installed tapdisk binary with the new one
> 
> $ sudo cp ./drivers/.libs/tapdisk  /usr/bin/tapdisk
> 
> - Add 'rbd' as supported format in 'xm'
>   For some reason 'xm' checks the image format itself before handing
> off to the tap-ctl ...
> 
>   Edit /usr/lib/xen-4.1/lib/python/xen/xend/server/BlktapController.py
> and add 'rbd' in the blktap2_disk_types list at the top.
>   (location of file will vary depending on xen version and distribution)
> 
> - Setup a proper /etc/ceph/ceph.conf containing at least the mon addresses
>   Also make sure you have a /etc/ceph/keyring with the user key if
> you use cephx
> 
> Once that's done, you should be able to attach disk to a running VM using :
> 
> $ xm block-attach test_vm tap2:tapdisk:rbd:rbd/test xvda2 w
> 
> "rbd/test" above is the pool_name/image_name
> You can add the same options as with the qemu driver.
> 
> 
> Cheers,
> 
>    Sylvain

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-23 14:56         ` Bernard Grymonpon
@ 2013-04-23 15:06           ` Sylvain Munaut
  2013-04-23 19:13             ` Bernard Grymonpon
  0 siblings, 1 reply; 99+ messages in thread
From: Sylvain Munaut @ 2013-04-23 15:06 UTC (permalink / raw)
  To: Bernard Grymonpon; +Cc: ceph-devel@vger.kernel.org

Hi,

> - client dom0: a simple quick debootstrap, and a low amount of memory to bypass buffers

I assume you meant domU ?
You ran those tests in a VM right ?

> Thx for the work you've put in this! Seems to work nicely at first glance, but I'll do some more tests later on (with a more recent ceph cluster)...

Thanks for testing. When you do more tests, it'd be interesting to
compare the kernel driver with the blktap driver.

I've actually identified a bottleneck caused by the Xen IO ring
splitting large requests into small 44k chunks, which tends to lower
RBD performance a lot ... I'm still investigating possible solutions
to that.
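
The 44k figure is consistent with a ring request carrying at most 11
segments of one 4 KiB page each (11 x 4096 = 45056 bytes). The sketch
below assumes that limit is the one in play here and shows how a single
large request breaks up; split_request() is a hypothetical helper, not
code from Xen or blktap.

```python
SEGMENTS_PER_REQUEST = 11   # assumed per-request segment limit of the ring
SEGMENT_SIZE = 4096         # one 4 KiB page per segment
MAX_REQUEST = SEGMENTS_PER_REQUEST * SEGMENT_SIZE  # 45056 bytes = 44 KiB


def split_request(offset, length, max_size=MAX_REQUEST):
    """Yield (offset, length) chunks no larger than max_size."""
    end = offset + length
    while offset < end:
        size = min(max_size, end - offset)
        yield (offset, size)
        offset += size


# A single 4 MiB request turns into many small ring requests.
chunks = list(split_request(0, 4 * 1024 * 1024))
print(len(chunks))  # 94
```

Each of those chunks would reach the RBD layer as a separate request
unless something recombines them first.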

Cheers,

    Sylvain


* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-23 10:02       ` Sylvain Munaut
  2013-04-23 14:56         ` Bernard Grymonpon
@ 2013-04-23 16:38         ` Nick Couchman
  2013-04-23 18:51           ` Sylvain Munaut
  1 sibling, 1 reply; 99+ messages in thread
From: Nick Couchman @ 2013-04-23 16:38 UTC (permalink / raw)
  To: Wido den Hollander, pasik, Bernard Grymonpon, Sylvain Munaut; +Cc: ceph-devel

(Apologies in advance, this is somewhat off-topic from ceph-devel...)
> 
> First off, you need a working blktap setup for your distribution.
> So for example, you should be able to use
> "tap2:tapdisk:aio:/path/to/image.raw"  as a vbd.

So, this works perfectly fine *before* replacing my /usr/sbin/tapdisk2 binary with the one built from the git repo.

> 
> Starting for there you need to :
> 
>  - Download and compile the rbd branch of git://github.com/smunaut/blktap.git
>    You will need the ceph development files/packages for this ( for
> debian librados-dev librbd-dev )
> 
> $ git clone git://github.com/smunaut/blktap.git
> $ cd blktap
> $ git checkout -b rbd origin/rbd
> $ ./autogen.sh
> $ ./configure
> $ make

This goes just fine, no issues here.

> 
>  - Replace the installed tapdisk binary with the new one
> 
> $ sudo cp ./drivers/.libs/tapdisk  /usr/bin/tapdisk

My distro (openSuSE 12.1) has /usr/sbin/tapdisk for original tapdisk v1, and /usr/sbin/tapdisk2 for version 2 stuff.  I'm replacing /usr/sbin/tapdisk2.

> 
>  - Add 'rbd' as supported format in 'xm'
>    For some reason 'xm' checks the image format itself before handing
> off to the tap-ctl ...
> 
>    Edit /usr/lib/xen-4.1/lib/python/xen/xend/server/BlktapController.py
> and add 'rbd' in the blktap2_disk_types list at the top.
>    (location of file will vary depending on xen version and distribution)

Fixed this.

> 
>  - Setup a proper /etc/ceph/ceph.conf containing at least the mon addresses
>    Also make sure you have a /etc/ceph/keyring with the user key if
> you use cephx
> 
> Once that's done, you should be able to attach disk to a running VM using :
> 
> $ xm block-attach test_vm tap2:tapdisk:rbd:rbd/test xvda2 w
> 

After replacing the /usr/sbin/tapdisk2 binary with the newly-built one from the git repo, I'm unable to attach RBD *or* RAW images to a VM.  Here's the error I get upon attempting to start the VM:

Error: ('create', '-aaio:/tmp/small.img') failed (1280  )

This is attempting to just use "tap2:aio" for the image format ("tap2:tapdisk:aio" fails identically, as does "tap2:rbd:rbd/test").  In my /var/log/messages file, I see the following:
Apr 23 10:34:51 se004922 kernel: [ 6328.409708] blktap_control_allocate_tap: allocated tap ffff8807a4898800
Apr 23 10:34:51 se004922 tapdisk[32464]: tapdisk-control: init, 10 x 4k buffers
Apr 23 10:34:51 se004922 tapdisk[32464]: I/O queue driver: lio
Apr 23 10:34:51 se004922 tapdisk[32464]: tapdisk-log: started, level 0
Apr 23 10:34:51 se004922 tapdisk[32464]: nbd: Set up local unix domain socket on path '/var/run/blktap-control/nbdclient32464'
Apr 23 10:34:53 se004922 tapdisk[32464]: ERROR: errno -5 at tapdisk_control_read_message: failure reading message at offset 280/536
Apr 23 10:34:53 se004922 tap-ctl: tap-err:tap_ctl_read_message: failure reading message
Apr 23 10:34:53 se004922 tap-ctl: tap-err:tap_ctl_send_and_receive: failed to receive 'unknown' message

That "nbd" portion is particularly puzzling - why would it try to start nbd when I've specified a simple tap2:aio image?

-Nick



--------

This e-mail may contain confidential and privileged material for the sole use of the intended recipient.  If this email is not intended for you, or you are not responsible for the delivery of this message to the intended recipient, please note that this message may contain SEAKR Engineering (SEAKR) Privileged/Proprietary Information.  In such a case, you are strictly prohibited from downloading, photocopying, distributing or otherwise using this message, its contents or attachments in any way.  If you have received this message in error, please notify us immediately by replying to this e-mail and delete the message from your mailbox.  Information contained in this message that does not relate to the business of SEAKR is neither endorsed by nor attributable to SEAKR.


* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-23 16:38         ` Nick Couchman
@ 2013-04-23 18:51           ` Sylvain Munaut
  2013-04-23 20:09             ` Nick Couchman
  0 siblings, 1 reply; 99+ messages in thread
From: Sylvain Munaut @ 2013-04-23 18:51 UTC (permalink / raw)
  To: Nick Couchman; +Cc: Wido den Hollander, pasik, Bernard Grymonpon, ceph-devel

Hi,

> My distro (openSuSE 12.1) has /usr/sbin/tapdisk for original tapdisk v1, and /usr/sbin/tapdisk2 for version 2 stuff.  I'm replacing /usr/sbin/tapdisk2.

Mm, do you know where I could find the source for those tapdisk binaries ?


> This is attempting to just use "tap2:aio" for the image format ("tap2:tapdisk:aio" fails identically, as does "tap2:rbd:rbd/test").  In my /var/log/messages file, I see the following:
> Apr 23 10:34:51 se004922 kernel: [ 6328.409708] blktap_control_allocate_tap: allocated tap ffff8807a4898800
> Apr 23 10:34:51 se004922 tapdisk[32464]: tapdisk-control: init, 10 x 4k buffers
> Apr 23 10:34:51 se004922 tapdisk[32464]: I/O queue driver: lio
> Apr 23 10:34:51 se004922 tapdisk[32464]: tapdisk-log: started, level 0
> Apr 23 10:34:51 se004922 tapdisk[32464]: nbd: Set up local unix domain socket on path '/var/run/blktap-control/nbdclient32464'
> Apr 23 10:34:53 se004922 tapdisk[32464]: ERROR: errno -5 at tapdisk_control_read_message: failure reading message at offset 280/536
> Apr 23 10:34:53 se004922 tap-ctl: tap-err:tap_ctl_read_message: failure reading message
> Apr 23 10:34:53 se004922 tap-ctl: tap-err:tap_ctl_send_and_receive: failed to receive 'unknown' message
>
> That "nbd" portion is particularly puzzling - why would it try to start nbd when I've specified a simple tap2:aio image?

The tree I used seems to init some of the NBD logic in all cases (nbd
is integrated in the core rather than purely in the driver).

My guess is that the tree I used is close enough to the Debian source
for it to work with just switching tapdisk, but different from the
openSUSE package.
Can you try to ./configure --prefix=/usr  and then make install ?
This will replace all the binaries including tap-ctl. If they're in a
distinct package, you can uninstall the blktap-utils package from your
distro beforehand.

Cheers,

    Sylvain


* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-23 15:06           ` Sylvain Munaut
@ 2013-04-23 19:13             ` Bernard Grymonpon
  0 siblings, 0 replies; 99+ messages in thread
From: Bernard Grymonpon @ 2013-04-23 19:13 UTC (permalink / raw)
  To: Sylvain Munaut; +Cc: ceph-devel@vger.kernel.org



On 23 Apr 2013, at 17:06, Sylvain Munaut <s.munaut@whatever-company.com> wrote:

> Hi,
> 
>> - client dom0: a simple quick debootstrap, and a low amount of memory to bypass buffers
> 
> I assume you meant domU ?
> You ran those tests in a VM right ?

Right! DomU indeed, not dom0. The benchmarks are done on /dev/xvdb, being the rbd device. I was able to block-attach/detach them also on the dom0, but that is trivial once it works.

> 
>> Thx for the work you've put in this! Seems to work nicely at first glance, but I'll do some more tests later on (with a more recent ceph cluster)...
> 
> Thanks for testing. When you do more tests, it'd be interesting to
> compare the kernel driver with the blktap driver.

I'll be upgrading my testcluster first to the latest stable ceph release, and then run some endurance-tests against the setup. But that won't happen before mid next week.

> 
> I've actually identified a bottleneck caused the Xen IO ring splitting
> large requests into small 44k chunks, which tends to lower RBD
> performance a lot ... I'l still investigating possible solutions to
> that.

Feel free to bug me once it is fixed, if you want some testing.

Rgds,
Bernard
Openminds BVBA

> 
> Cheers,
> 
>    Sylvain

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-23 18:51           ` Sylvain Munaut
@ 2013-04-23 20:09             ` Nick Couchman
  2013-04-26 13:07               ` Sylvain Munaut
  0 siblings, 1 reply; 99+ messages in thread
From: Nick Couchman @ 2013-04-23 20:09 UTC (permalink / raw)
  To: Sylvain Munaut; +Cc: Wido den Hollander, pasik, Bernard Grymonpon, ceph-devel

>>> On 2013/04/23 at 12:51, Sylvain Munaut <s.munaut@whatever-company.com> wrote: 
> Hi,
> 
>> My distro (openSuSE 12.1) has /usr/sbin/tapdisk for original tapdisk v1, and 
> /usr/sbin/tapdisk2 for version 2 stuff.  I'm replacing /usr/sbin/tapdisk2.
> 
> Mm, do you know where I could find the source for those tapdisk binaries ?

It looks to me like openSuSE is just using the blktap stuff included with the Xen source code.  If I grab the xen 4.1.4 tarball from xen.org and unpack it, go to tools, there are two directories "blktap" and "blktap2" under that directory that generate the tapdisk and tapdisk2 binaries, along with several other tapdisk-related binaries.  openSuSE uses this with a few minor patches.

> 
> 
>> This is attempting to just use "tap2:aio" for the image format 
> ("tap2:tapdisk:aio" fails identically, as does "tap2:rbd:rbd/test").  In my 
> /var/log/messages file, I see the following:
>> Apr 23 10:34:51 se004922 kernel: [ 6328.409708] blktap_control_allocate_tap: 
> allocated tap ffff8807a4898800
>> Apr 23 10:34:51 se004922 tapdisk[32464]: tapdisk-control: init, 10 x 4k 
> buffers
>> Apr 23 10:34:51 se004922 tapdisk[32464]: I/O queue driver: lio
>> Apr 23 10:34:51 se004922 tapdisk[32464]: tapdisk-log: started, level 0
>> Apr 23 10:34:51 se004922 tapdisk[32464]: nbd: Set up local unix domain 
> socket on path '/var/run/blktap-control/nbdclient32464'
>> Apr 23 10:34:53 se004922 tapdisk[32464]: ERROR: errno -5 at 
> tapdisk_control_read_message: failure reading message at offset 280/536
>> Apr 23 10:34:53 se004922 tap-ctl: tap-err:tap_ctl_read_message: failure 
> reading message
>> Apr 23 10:34:53 se004922 tap-ctl: tap-err:tap_ctl_send_and_receive: failed to 
> receive 'unknown' message
>>
>> That "nbd" portion is particularly puzzling - why would it try to start nbd 
> when I've specified a simple tap2:aio image?
> 
> The tree I used seem to init some of the NBD logic in all cases (nbd
> is integrated in the core rather than purely in the driver).

Okay, makes sense.

> 
> My guess is that the tree I used is close enough to the debian source
> for it to work with just switching tapdisk, but different from
> opensuse package.
> Can you try to ./configure --prefix=/usr  and then make install ?
> This will replace all the binaries including tap-ctl. If they're in a
> distinct package, you can pre-uninstall the blktap-utils from your
> distro.
> '
> 

I tried doing the --prefix=/usr, but did not do a make install...I'll have to give that a shot and see what happens.  I need to get a test system set up here, since the one I was working on is someone else's workstation and I don't want to completely destroy it :-).

-Nick




* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-23 20:09             ` Nick Couchman
@ 2013-04-26 13:07               ` Sylvain Munaut
  2013-04-26 15:51                 ` Sage Weil
  0 siblings, 1 reply; 99+ messages in thread
From: Sylvain Munaut @ 2013-04-26 13:07 UTC (permalink / raw)
  To: Nick Couchman, Wido den Hollander, pasik, Bernard Grymonpon; +Cc: ceph-devel

Hi,


I just wanted to mention that I implemented a simple request merging
strategy to counteract the request splitting done by the Xen block
interface (blkif) protocol.

The results are pretty good. When comparing to using the rbd kernel
module, I can now get 2-4x better write performance and 2x read
performance.
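
As an illustration of the general idea only (this is not the code in the
rbd branch), contiguous requests can be coalesced roughly as follows.
The sketch assumes all requests in a batch are of the same kind (all
reads or all writes) and that reordering them by offset is acceptable;
merge_requests() is a hypothetical helper.

```python
def merge_requests(requests):
    """Coalesce (offset, length) requests that are contiguous on disk."""
    merged = []
    for offset, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # Starts exactly where the previous request ends: extend it.
            prev_offset, prev_length = merged[-1]
            merged[-1] = (prev_offset, prev_length + length)
        else:
            merged.append((offset, length))
    return merged


# Three back-to-back chunks plus one unrelated request.
batch = [(0, 45056), (45056, 45056), (90112, 4096), (200000, 4096)]
print(merge_requests(batch))
```

A real driver also has to complete every original request once the
merged one finishes, which this sketch leaves out.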


Cheers,

    Sylvain


* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-26 13:07               ` Sylvain Munaut
@ 2013-04-26 15:51                 ` Sage Weil
  2013-04-26 17:10                   ` Sylvain Munaut
  0 siblings, 1 reply; 99+ messages in thread
From: Sage Weil @ 2013-04-26 15:51 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Nick Couchman, Wido den Hollander, pasik, Bernard Grymonpon,
	ceph-devel

On Fri, 26 Apr 2013, Sylvain Munaut wrote:
> Hi,
> 
> 
> I just wanted to mention that I implemented a simple request merging
> strategy to counteract the request splitting done by the Xen blkif
> protocol.
> 
> The results are pretty good. When comparing to using the rbd kernel
> module, I can now get 2-4x better write performance and 2x read
> performance.

Is this in the blktap layer or in librbd?  FWIW, when rbd cache = true, 
the writes will get merged by the cache and written out in large extents 
on flush.

sage

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-26 15:51                 ` Sage Weil
@ 2013-04-26 17:10                   ` Sylvain Munaut
  0 siblings, 0 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-04-26 17:10 UTC (permalink / raw)
  To: Sage Weil
  Cc: Nick Couchman, Wido den Hollander, pasik, Bernard Grymonpon,
	ceph-devel

Hi,

> Is this in the blktap layer or in librbd?  FWIW, when rbd cache = true,
> the writes will get merged by the cache and written out in large extents
> on flush.

In the blktap layer.

I don't have the cache enabled because FLUSH requests from the VM are
not forwarded down to that layer; when you ack a request, the VM
considers that it has been written to disk.
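
(Illustration only: a toy Python model of why a write-back cache is
unsafe when flushes never reach it. The WriteBackCache class and its
methods are hypothetical and have nothing to do with the real librbd
cache implementation; they only model "ack now, persist later".)

```python
class WriteBackCache:
    """Toy write-back cache: acks writes at once, persists on flush()."""

    def __init__(self):
        self.disk = {}   # sector -> data that reached stable storage
        self.dirty = {}  # sector -> data acked but not yet persisted

    def write(self, sector, data):
        self.dirty[sector] = data
        return "ack"     # the guest now believes the data is on disk

    def flush(self):
        self.disk.update(self.dirty)
        self.dirty.clear()

    def crash(self):
        self.dirty.clear()  # anything not flushed is lost

    def read(self, sector):
        return self.dirty.get(sector, self.disk.get(sector))


cache = WriteBackCache()
cache.write(0, b"journal commit")  # acked, but flush() is never called
cache.crash()
print(cache.read(0))  # None: the acked write is gone
```

If the guest's FLUSH were forwarded as a call to flush(), the data
would survive the crash; without that, acking from the cache breaks the
guarantee the guest relies on.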

Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
@ 2013-06-21  7:21 Nathan O'Sullivan
  2013-06-21 11:21 ` Sylvain Munaut
  0 siblings, 1 reply; 99+ messages in thread
From: Nathan O'Sullivan @ 2013-06-21  7:21 UTC (permalink / raw)
  To: ceph-devel

I've been testing this on Ubuntu 12.04.02 64-bit with kernel 3.2.0-48 
and ceph 0.61.4

With rbd cache disabled, it works well enough in initial testing.

However when rbd cache is enabled with:
[client]
rbd_cache = true

the tapdisk process crashes if I do this in the domU:
dd if=/dev/xvda bs=1M > /dev/null


I grabbed the tapdisk stacktrace with gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f1677186700 (LWP 6507)]
0x00007f167d21857c in free () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f167d21857c in free () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f167daab84b in ceph::buffer::raw_posix_aligned::~raw_posix_aligned() () from /usr/lib/librados.so.2
#2  0x00007f167daa6f6e in ceph::buffer::ptr::release() () from /usr/lib/librados.so.2
#3  0x00007f167d5711c7 in std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear() () from /usr/lib/librbd.so.1
#4  0x00007f167d5a8ffe in ObjectCacher::trim(long, long) () from /usr/lib/librbd.so.1
#5  0x00007f167d5b7e60 in ObjectCacher::_readx(ObjectCacher::OSDRead*, ObjectCacher::ObjectSet*, Context*, bool) () from /usr/lib/librbd.so.1
#6  0x00007f167d5bd620 in ObjectCacher::C_RetryRead::finish(int) () from /usr/lib/librbd.so.1
#7  0x00007f167d57281a in Context::complete(int) () from /usr/lib/librbd.so.1
#8  0x00007f167d5b8f65 in finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int) () from /usr/lib/librbd.so.1
#9  0x00007f167d5ae705 in ObjectCacher::bh_read_finish(long, sobject_t, long, unsigned long, ceph::buffer::list&, int, bool) () from /usr/lib/librbd.so.1
#10 0x00007f167d5bc32f in ObjectCacher::C_ReadFinish::finish(int) () from /usr/lib/librbd.so.1
#11 0x00007f167d57281a in Context::complete(int) () from /usr/lib/librbd.so.1
#12 0x00007f167d5a31f5 in librbd::C_Request::finish(int) () from /usr/lib/librbd.so.1
#13 0x00007f167d5a1c14 in librbd::context_cb(void*, void*) () from /usr/lib/librbd.so.1
#14 0x00007f167d90f56d in librados::C_AioComplete::finish(int) () from /usr/lib/librados.so.2
#15 0x00007f167d97bb00 in Finisher::finisher_thread_entry() () from /usr/lib/librados.so.2
#16 0x00007f167c812e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#17 0x00007f167d288ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x0000000000000000 in ?? ()


Regards
Nathan

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-06-21  7:21 [Xen-devel] " Nathan O'Sullivan
@ 2013-06-21 11:21 ` Sylvain Munaut
  2013-07-01  9:57   ` Sylvain Munaut
  0 siblings, 1 reply; 99+ messages in thread
From: Sylvain Munaut @ 2013-06-21 11:21 UTC (permalink / raw)
  To: Nathan O'Sullivan; +Cc: ceph-devel

Hi,

> I've been testing this on Ubuntu 12.04.02 64-bit with kernel 3.2.0-48 and
> ceph 0.61.4

Thanks for testing :)

> However when rbd cache is enabled with:
> [client]
> rbd_cache = true
>
> the tapdisk process crashes if I do this in the domU:
> dd if=/dev/xvda bs=1M > /dev/null

Interesting. I'm currently away, but I'll try to set up a test and see
if I can reproduce the issue locally.

I never really tried with the cache enabled.

Cheers,

   Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-06-21 11:21 ` Sylvain Munaut
@ 2013-07-01  9:57   ` Sylvain Munaut
  2013-07-02  3:32     ` Nathan O'Sullivan
  0 siblings, 1 reply; 99+ messages in thread
From: Sylvain Munaut @ 2013-07-01  9:57 UTC (permalink / raw)
  To: Nathan O'Sullivan; +Cc: ceph-devel

Hi again,

>> However when rbd cache is enabled with:
>> [client]
>> rbd_cache = true
>>
>> the tapdisk process crashes if I do this in the domU:
>> dd if=/dev/xvda bs=1M > /dev/null

I tested this locally and couldn't reproduce the issue.

Doing reads doesn't do anything bad AFAICT.
Doing writes OTOH seems to leak memory (or at least use much more
memory than the configured cache size).

I also rechecked the code and I don't see anything wrong with it.
AFAICT with or without cache shouldn't change anything so the issue
might be in librbd itself.

Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-07-01  9:57   ` Sylvain Munaut
@ 2013-07-02  3:32     ` Nathan O'Sullivan
  0 siblings, 0 replies; 99+ messages in thread
From: Nathan O'Sullivan @ 2013-07-02  3:32 UTC (permalink / raw)
  To: Sylvain Munaut; +Cc: ceph-devel

I've installed debug symbols, perhaps that will give a better idea what 
is going on?

#0  __GI___libc_free (mem=0x7f5160650000) at malloc.c:2970
#1  0x00007f515f3ac84b in ~raw_posix_aligned (this=0x7f513c418f20, __in_chrg=<optimised out>) at common/buffer.cc:152
#2  ceph::buffer::raw_posix_aligned::~raw_posix_aligned (this=<optimised out>, __in_chrg=<optimised out>) at common/buffer.cc:155
#3  0x00007f515f3a7f6e in ceph::buffer::ptr::release (this=0x7f513801d600) at common/buffer.cc:328
#4  0x00007f515ee721c7 in ~ptr (this=0x7f513801d600, __in_chrg=<optimised out>) at ./include/buffer.h:159
#5  destroy (__p=0x7f513801d600, this=<optimised out>) at /usr/include/c++/4.6/ext/new_allocator.h:118
#6  std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear (this=0x15e3908) at /usr/include/c++/4.6/bits/list.tcc:78
#7  0x00007f515eea9ffe in ~_List_base (this=0x15e3908, __in_chrg=<optimised out>) at /usr/include/c++/4.6/bits/stl_list.h:372
#8  ~list (this=0x15e3908, __in_chrg=<optimised out>) at /usr/include/c++/4.6/bits/stl_list.h:429
#9  ~list (this=0x15e3908, __in_chrg=<optimised out>) at ./include/buffer.h:304
#10 ~BufferHead (this=0x15e38c0, __in_chrg=<optimised out>) at osdc/ObjectCacher.h:84
#11 ObjectCacher::trim (this=0x1594a00, max_bytes=33554432, max_ob=42) at osdc/ObjectCacher.cc:949
#12 0x00007f515eeb8e60 in ObjectCacher::_readx (this=<optimised out>, rd=0x15f1f70, oset=0x1595110, onfinish=0x1591280, external_call=false) at osdc/ObjectCacher.cc:1240
#13 0x00007f515eebe620 in ObjectCacher::C_RetryRead::finish (this=0x15c3c30, r=<optimised out>) at osdc/ObjectCacher.h:554
#14 0x00007f515ee7381a in Context::complete (this=0x15c3c30, r=<optimised out>) at ./include/Context.h:41
#15 0x00007f515eeb9f65 in finish_contexts (cct=0x155cc30, finished=..., result=0) at ./include/Context.h:78
#16 0x00007f515eeaf705 in ObjectCacher::bh_read_finish (this=<optimised out>, poolid=<optimised out>, oid=..., start=983040, length=131072, bl=..., r=0, trust_enoent=true) at osdc/ObjectCacher.cc:773
#17 0x00007f515eebd32f in ObjectCacher::C_ReadFinish::finish (this=0x15ced30, r=0) at osdc/ObjectCacher.h:478
#18 0x00007f515ee7381a in Context::complete (this=0x15ced30, r=<optimised out>) at ./include/Context.h:41
#19 0x00007f515eea41f5 in librbd::C_Request::finish (this=0x159dfd0, r=0) at librbd/LibrbdWriteback.cc:55
#20 0x00007f515eea2c14 in librbd::context_cb (c=<optimised out>, arg=<optimised out>) at librbd/LibrbdWriteback.cc:35
#21 0x00007f515f21056d in librados::C_AioComplete::finish (this=<optimised out>, r=<optimised out>) at ./librados/AioCompletionImpl.h:171
#22 0x00007f515f27cb00 in Finisher::finisher_thread_entry (this=0x1576d98) at common/Finisher.cc:56
#23 0x00007f515e113e9a in start_thread (arg=0x7f5158a87700) at pthread_create.c:308
#24 0x00007f515eb89ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#25 0x0000000000000000 in ?? ()



On 1/07/2013 7:57 PM, Sylvain Munaut wrote:
> Hi again,
>
>>> However when rbd cache is enabled with:
>>> [client]
>>> rbd_cache = true
>>>
>>> the tapdisk process crashes if I do this in the domU:
>>> dd if=/dev/xvda bs=1M > /dev/null
> I tested this locally and couldn't reproduce the issue.
>
> Doing reads doesn't do anything bad AFAICT.
> Doing writes OTOH seems to leak memory (or at least use much more
> memory than the configured cache size).
>
> I also rechecked the code and I don't see anything wrong with it.
> AFAICT with or without cache shouldn't change anything so the issue
> might be in librbd itself.
>
> Cheers,
>
>      Sylvain


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-19  6:45 ` Pasi Kärkkäinen
  2013-04-19 14:41   ` [Xen-devel] " Sylvain Munaut
  2013-04-19 14:41   ` Sylvain Munaut
@ 2013-08-01  2:12   ` James Harper
  2013-08-05  9:41     ` Sylvain Munaut
  2013-08-05  9:41     ` [Xen-devel] " Sylvain Munaut
  2 siblings, 2 replies; 99+ messages in thread
From: James Harper @ 2013-08-01  2:12 UTC (permalink / raw)
  To: Pasi Kärkkäinen, Sylvain Munaut
  Cc: ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

I'm about to start trying this out. Has anything changed since this email http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg13984.html ?

Thanks

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-01  2:12   ` James Harper
  2013-08-05  9:41     ` Sylvain Munaut
@ 2013-08-05  9:41     ` Sylvain Munaut
  2013-08-05  9:45       ` James Harper
                         ` (3 more replies)
  1 sibling, 4 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-05  9:41 UTC (permalink / raw)
  To: James Harper
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

Hi,


Yes the procedure didn't change.

If you're on debian I could also send you prebuilt .deb packages for
blktap and for a patched xen version that includes userspace RBD support.

If you have any issue, I can be found on ceph's IRC under 'tnt' nick.


Cheers,

   Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-01  2:12   ` James Harper
@ 2013-08-05  9:41     ` Sylvain Munaut
  2013-08-05  9:41     ` [Xen-devel] " Sylvain Munaut
  1 sibling, 0 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-05  9:41 UTC (permalink / raw)
  To: James Harper; +Cc: ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

Hi,


Yes the procedure didn't change.

If you're on debian I could also send you prebuilt .deb packages for
blktap and for a patched xen version that includes userspace RBD support.

If you have any issue, I can be found on ceph's IRC under 'tnt' nick.


Cheers,

   Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05  9:41     ` [Xen-devel] " Sylvain Munaut
  2013-08-05  9:45       ` James Harper
@ 2013-08-05  9:45       ` James Harper
  2013-08-05 11:01         ` Sylvain Munaut
  2013-08-05 11:01         ` Sylvain Munaut
  2013-08-09  0:12       ` James Harper
  2013-08-09  0:12       ` [Xen-devel] " James Harper
  3 siblings, 2 replies; 99+ messages in thread
From: James Harper @ 2013-08-05  9:45 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

> 
> Yes the procedure didn't change.
> 
> If you're on debian I could also send you prebuilt .deb packages for
> blktap and for a patched xen version that includes userspace RBD support.
> 

It's working great so far. I just pulled the source and built it then copied blktap in.

For some reason I already had a tapdisk in /usr/sbin, as well as the one in /usr/bin, which confused the issue for a while. I must have installed something manually but I don't remember what.

Xen also includes tap-ctl:

blktap-utils: /usr/sbin/tap-ctl
xen-utils-4.1: /usr/lib/xen-4.1/bin/tap-ctl

and I removed the one from xen and linked it to the one in /usr/sbin. I did that before I found the other tapdisk in /usr/sbin so I'm not sure if that step was necessary.
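
(Illustration only: a small Python sketch for spotting this kind of
duplicate-binary confusion. The find_all_on_path() helper is
hypothetical; it lists every executable with a given name in a search
path, in lookup order, together with the file it resolves to. Since
/usr/lib/xen-4.1/bin is normally not on PATH, it is added explicitly
in the example.)

```python
import os


def find_all_on_path(name, path=None):
    """Return (candidate, resolved) pairs for each executable `name`.

    `path` is an os.pathsep-separated directory list; it defaults to
    the PATH environment variable. Results are in lookup order, so
    the first entry is the one a shell would run.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            # realpath() exposes symlinks pointing at another copy
            hits.append((candidate, os.path.realpath(candidate)))
    return hits


search = os.pathsep.join([os.environ.get("PATH", ""), "/usr/lib/xen-4.1/bin"])
for tool in ("tapdisk", "tap-ctl"):
    for candidate, resolved in find_all_on_path(tool, search):
        print(tool, candidate, "->", resolved)
```

More than one line per tool, with different resolved paths, means two
independent copies are installed and the older one may be the one that
actually runs.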

Any chance this will be rolled into the main blktap sources?

> If you have any issue, I can be found on ceph's IRC under 'tnt' nick.
> 

Even though I have been on the internet since 94, I never got the hang of IRC... always found the stream of information a little overwhelming.

Thanks

James


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05  9:41     ` [Xen-devel] " Sylvain Munaut
@ 2013-08-05  9:45       ` James Harper
  2013-08-05  9:45       ` [Xen-devel] " James Harper
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 99+ messages in thread
From: James Harper @ 2013-08-05  9:45 UTC (permalink / raw)
  To: Sylvain Munaut; +Cc: ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

> 
> Yes the procedure didn't change.
> 
> If you're on debian I could also send you prebuilt .deb packages for
> blktap and for a patched xen version that includes userspace RBD support.
> 

It's working great so far. I just pulled the source and built it then copied blktap in.

For some reason I already had a tapdisk in /usr/sbin, as well as the one in /usr/bin, which confused the issue for a while. I must have installed something manually but I don't remember what.

Xen also includes tap-ctl:

blktap-utils: /usr/sbin/tap-ctl
xen-utils-4.1: /usr/lib/xen-4.1/bin/tap-ctl

and I removed the one from xen and linked it to the one in /usr/sbin. I did that before I found the other tapdisk in /usr/sbin so I'm not sure if that step was necessary.

Any chance this will be rolled into the main blktap sources?

> If you have any issue, I can be found on ceph's IRC under 'tnt' nick.
> 

Even though I have been on the internet since 94, I never got the hang of IRC... always found the stream of information a little overwhelming.

Thanks

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05  9:45       ` [Xen-devel] " James Harper
@ 2013-08-05 11:01         ` Sylvain Munaut
  2013-08-05 11:03           ` James Harper
                             ` (2 more replies)
  2013-08-05 11:01         ` Sylvain Munaut
  1 sibling, 3 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-05 11:01 UTC (permalink / raw)
  To: James Harper
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

Hi,


> It's working great so far. I just pulled the source and built it then copied blktap in.

Good to hear :)

I've been using it more and more recently and it's been good for me
too, even with live migrations.


> For some reason I already had a tapdisk in /usr/sbin, as well as the one in /usr/bin, which confused the issue for a while. I must have installed something manually but I don't remember what.

What distribution are you using ?


> Any chance this will be rolled into the main blktap sources?

I'd like to ... but I have no idea how or even who to contact for that
... blktap is so fragmented ...

You have blktap2 which is in the main Xen tree. But that's not what's
used in debian (it's not installed / compiled)

You have the so called blktap2.5 which is what's on github and what I
have based my stuff on. It's also what's shipped with debian as
blktap-utils I think.
I also think Citrix have their own version based off blktap2.5 as well.

And soon there will be blktap3 in the official Xen tree.

I want to at least get it merged in blktap3 but since that code is not
ready (or even merged) yet, it's a bit early for that. That's also
probably Xen 4.4 or Xen 4.5 stuff and so won't hit debian for a while.


Cheers,

   Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05  9:45       ` [Xen-devel] " James Harper
  2013-08-05 11:01         ` Sylvain Munaut
@ 2013-08-05 11:01         ` Sylvain Munaut
  1 sibling, 0 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-05 11:01 UTC (permalink / raw)
  To: James Harper; +Cc: ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

Hi,


> It's working great so far. I just pulled the source and built it then copied blktap in.

Good to hear :)

I've been using it more and more recently and it's been good for me
too, even with live migrations.


> For some reason I already had a tapdisk in /usr/sbin, as well as the one in /usr/bin, which confused the issue for a while. I must have installed something manually but I don't remember what.

What distribution are you using ?


> Any chance this will be rolled into the main blktap sources?

I'd like to ... but I have no idea how or even who to contact for that
... blktap is so fragmented ...

You have blktap2 which is in the main Xen tree. But that's not what's
used in debian (it's not installed / compiled)

You have the so called blktap2.5 which is what's on github and what I
have based my stuff on. It's also what's shipped with debian as
blktap-utils I think.
I also think Citrix have their own version based off blktap2.5 as well.

And soon there will be blktap3 in the official Xen tree.

I want to at least get it merged in blktap3 but since that code is not
ready (or even merged) yet, it's a bit early for that. That's also
probably Xen 4.4 or Xen 4.5 stuff and so won't hit debian for a while.


Cheers,

   Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 11:01         ` Sylvain Munaut
  2013-08-05 11:03           ` James Harper
@ 2013-08-05 11:03           ` James Harper
  2013-08-05 11:12           ` Pasi Kärkkäinen
  2 siblings, 0 replies; 99+ messages in thread
From: James Harper @ 2013-08-05 11:03 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

> 
> > For some reason I already had a tapdisk in /usr/sbin, as well as the one in
> > /usr/bin, which confused the issue for a while. I must have installed
> > something manually but I don't remember what.
> 
> What distribution are you using ?
> 

Debian Wheezy

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 11:01         ` Sylvain Munaut
@ 2013-08-05 11:03           ` James Harper
  2013-08-05 11:03           ` [Xen-devel] " James Harper
  2013-08-05 11:12           ` Pasi Kärkkäinen
  2 siblings, 0 replies; 99+ messages in thread
From: James Harper @ 2013-08-05 11:03 UTC (permalink / raw)
  To: Sylvain Munaut; +Cc: ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

> 
> > For some reason I already had a tapdisk in /usr/sbin, as well as the one in
> > /usr/bin, which confused the issue for a while. I must have installed
> > something manually but I don't remember what.
> 
> What distribution are you using ?
> 

Debian Wheezy

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 11:01         ` Sylvain Munaut
  2013-08-05 11:03           ` James Harper
  2013-08-05 11:03           ` [Xen-devel] " James Harper
@ 2013-08-05 11:12           ` Pasi Kärkkäinen
  2013-08-05 12:03             ` Sylvain Munaut
  2013-08-05 12:03             ` [Xen-devel] " Sylvain Munaut
  2 siblings, 2 replies; 99+ messages in thread
From: Pasi Kärkkäinen @ 2013-08-05 11:12 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: ceph-devel@vger.kernel.org, James Harper, xen-devel@lists.xen.org

On Mon, Aug 05, 2013 at 01:01:35PM +0200, Sylvain Munaut wrote:
> 
> > Any chance this will be rolled into the main blktap sources?
> 
> I'd like to ... but I have no idea how or even who to contact for that
> ... blktap is so fragmented ...
> 
> You have blktap2 which is in the main Xen tree. But that's not what's
> used in debian (it's not installed / compiled)
> 
> You have the so called blktap2.5 which is what's on github and what I
> have based my stuff on. It's also what's shipped with debian as
> blktap-utils I think.
> I also think Citrix have their own version based off blktap2.5 as well.
>

Yep, XenServer is using blktap2.5. 

Also the Centos-6 Xen packages have blktap2.5 patched in.
 
> And soon there will be blktap3 in the official Xen tree.
> 
> I want to at least get it merged in blktap3 but since that code is not
> ready (or even merged) yet, it's a bit early for that. That's also
> probably Xen 4.4 or Xen 4.5 stuff and so won't hit debian for a while.
> 

I think I saw an announcement recently on xen-devel that blktap3 development has been stopped.. 


-- Pasi

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 11:12           ` Pasi Kärkkäinen
  2013-08-05 12:03             ` Sylvain Munaut
@ 2013-08-05 12:03             ` Sylvain Munaut
  2013-08-05 13:35               ` George Dunlap
  2013-08-05 13:35               ` George Dunlap
  1 sibling, 2 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-05 12:03 UTC (permalink / raw)
  To: Pasi Kärkkäinen
  Cc: James Harper, ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

> I think I saw an announcement recently on xen-devel that blktap3 development has been stopped..

Oh :(

In the mail it speaks about QEMU but is it possible to use the QEMU
driver model when booting PV domains ? (and not PVHVM).

Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 11:12           ` Pasi Kärkkäinen
@ 2013-08-05 12:03             ` Sylvain Munaut
  2013-08-05 12:03             ` [Xen-devel] " Sylvain Munaut
  1 sibling, 0 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-05 12:03 UTC (permalink / raw)
  To: Pasi Kärkkäinen
  Cc: ceph-devel@vger.kernel.org, James Harper, xen-devel@lists.xen.org

> I think I saw an announcement recently on xen-devel that blktap3 development has been stopped..

Oh :(

In the mail it speaks about QEMU but is it possible to use the QEMU
driver model when booting PV domains ? (and not PVHVM).

Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 12:03             ` [Xen-devel] " Sylvain Munaut
@ 2013-08-05 13:35               ` George Dunlap
  2013-08-05 13:55                 ` Sylvain Munaut
  2013-08-05 13:55                 ` Sylvain Munaut
  2013-08-05 13:35               ` George Dunlap
  1 sibling, 2 replies; 99+ messages in thread
From: George Dunlap @ 2013-08-05 13:35 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	James Harper, xen-devel@lists.xen.org

On Mon, Aug 5, 2013 at 1:03 PM, Sylvain Munaut
<s.munaut@whatever-company.com> wrote:
>> I think I saw an announcement recently on xen-devel that blktap3 development has been stopped..
>
> Oh :(
>
> In the mail it speaks about QEMU but is it possible to use the QEMU
> driver model when booting PV domains ? (and not PVHVM).

Yes; qemu knows how to be a Xen PV block back-end.

One of the reasons for stopping work on blktap3 (AIUI) was that qemu
should in theory have performance characteristics similar to blktap3,
and tends to get newer protocols like ceph "for free" (i.e.,
implemented by someone else).

 -George

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 12:03             ` [Xen-devel] " Sylvain Munaut
  2013-08-05 13:35               ` George Dunlap
@ 2013-08-05 13:35               ` George Dunlap
  1 sibling, 0 replies; 99+ messages in thread
From: George Dunlap @ 2013-08-05 13:35 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: ceph-devel@vger.kernel.org, James Harper, xen-devel@lists.xen.org

On Mon, Aug 5, 2013 at 1:03 PM, Sylvain Munaut
<s.munaut@whatever-company.com> wrote:
>> I think I saw an announcement recently on xen-devel that blktap3 development has been stopped..
>
> Oh :(
>
> In the mail it speaks about QEMU but is it possible to use the QEMU
> driver model when booting PV domains ? (and not PVHVM).

Yes; qemu knows how to be a Xen PV block back-end.

One of the reasons for stopping work on blktap3 (AIUI) was that qemu
should in theory have performance characteristics similar to blktap3,
and tends to get newer protocols like ceph "for free" (i.e.,
implemented by someone else).

 -George

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 13:35               ` George Dunlap
@ 2013-08-05 13:55                 ` Sylvain Munaut
  2013-08-05 14:04                   ` George Dunlap
  2013-08-05 14:04                   ` George Dunlap
  2013-08-05 13:55                 ` Sylvain Munaut
  1 sibling, 2 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-05 13:55 UTC (permalink / raw)
  To: George Dunlap
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	James Harper, xen-devel@lists.xen.org

Hi George,


> Yes; qemu knows how to be a Xen PV block back-end.

Very interesting. Is there documentation about this somewhere ?
I had a look some time ago and it was really not very clear.

Things like which Xen versions support this, and with which features
(indirect descriptors, persistent grants, discard, flush, ...) and/or
which limitations.


> One of the reasons for stopping work on blktap3 (AIUI) was that it
> should in theory have performance characteristics similar to blktap3,

And has anyone actually checked the theory? :)


> and tends to get newer protocols like ceph "for free" (i.e.,
> implemented by someone else).

Yes I can definitely see the appeal.


Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 13:35               ` George Dunlap
  2013-08-05 13:55                 ` Sylvain Munaut
@ 2013-08-05 13:55                 ` Sylvain Munaut
  1 sibling, 0 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-05 13:55 UTC (permalink / raw)
  To: George Dunlap
  Cc: ceph-devel@vger.kernel.org, James Harper, xen-devel@lists.xen.org

Hi George,


> Yes; qemu knows how to be a Xen PV block back-end.

Very interesting. Is there documentation about this somewhere ?
I had a look some time ago and it was really not very clear.

Things like which Xen versions support this, and with which features
(indirect descriptors, persistent grants, discard, flush, ...) and/or
which limitations.


> One of the reasons for stopping work on blktap3 (AIUI) was that it
> should in theory have performance characteristics similar to blktap3,

And has anyone actually checked the theory? :)


> and tends to get newer protocols like ceph "for free" (i.e.,
> implemented by someone else).

Yes I can definitely see the appeal.


Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 13:55                 ` Sylvain Munaut
@ 2013-08-05 14:04                   ` George Dunlap
  2013-08-05 15:18                     ` Wei Liu
  2013-08-05 15:18                     ` Wei Liu
  2013-08-05 14:04                   ` George Dunlap
  1 sibling, 2 replies; 99+ messages in thread
From: George Dunlap @ 2013-08-05 14:04 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	James Harper, xen-devel@lists.xen.org, Stefano Stabellini,
	Roger Pau Monne, Wei Liu

On 05/08/13 14:55, Sylvain Munaut wrote:
> Hi George,
>
>
>> Yes; qemu knows how to be a Xen PV block back-end.
> Very interesting. Is there documentation about this somewhere ?
> I had a look some time ago and it was really not very clear.
>
> Things like what Xen version support this. And with which features (
> indirect descriptors, persistent grants, discard, flush, ...) and/or
> which limitation.

I don't think this is documented anywhere; you'll need to ask the 
experts.  Stefano? Roger? Wei?

>
>
>> One of the reasons for stopping work on blktap3 (AIUI) was that it
>> should in theory have performance characteristics similar to blktap3,
> And did anyone check the theory currently ? :)

I say "in theory" because they are using the same basic architecture: a 
normal process running in dom0, with no special kernel support.  If 
there were a performance difference, it would be something that should 
(in theory) be able to be optimized.

I don't think we have comparisons between qdisk (which is what we call 
qemu-as-pv-backend in Xen) and blktap3 (and since blktap3 wasn't 
finished they wouldn't mean much anyway); but I think qdisk compares 
reasonably with blkback.

  -George

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 14:04                   ` George Dunlap
@ 2013-08-05 15:18                     ` Wei Liu
  2013-08-05 15:20                       ` George Dunlap
  2013-08-05 15:20                       ` George Dunlap
  2013-08-05 15:18                     ` Wei Liu
  1 sibling, 2 replies; 99+ messages in thread
From: Wei Liu @ 2013-08-05 15:18 UTC (permalink / raw)
  To: George Dunlap
  Cc: Sylvain Munaut, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, James Harper, xen-devel@lists.xen.org,
	Stefano Stabellini, Roger Pau Monne, Wei Liu

On Mon, Aug 05, 2013 at 03:04:47PM +0100, George Dunlap wrote:
> On 05/08/13 14:55, Sylvain Munaut wrote:
> >Hi George,
> >
> >
> >>Yes; qemu knows how to be a Xen PV block back-end.
> >Very interesting. Is there documentation about this somewhere ?
> >I had a look some time ago and it was really not very clear.
> >
> >Things like what Xen version support this. And with which features (
> >indirect descriptors, persistent grants, discard, flush, ...) and/or
> >which limitation.
> 
> I don't think this is documented anywhere; you'll need to ask the
> experts.  Stefano? Roger? Wei?
> 

These are Linux features not Xen ones AFAICT. In theory they are not
bound to specific Xen versions.

For the network part I don't think new features depend on any specific
hypercall. However for block Roger and Stefano seem to introduce
new hypercalls for certain features (I might be wrong though).


Wei.

> >
> >
> >>One of the reasons for stopping work on blktap3 (AIUI) was that it
> >>should in theory have performance characteristics similar to blktap3,
> >And did anyone check the theory currently ? :)
> 
> I say "in theory" because they are using the same basic
> architecture: a normal process running in dom0, with no special
> kernel support.  If there were a performance difference, it would be
> something that should (in theory) be able to be optimized.
> 
> I don't think we have comparisons between qdisk (which is what we
> call qemu-as-pv-backend in Xen) and blktap3 (and since blktap3
> wasn't finished they wouldn't mean much anyway); but I think qdisk
> compares reasonably with blkback.
> 
>  -George

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 15:18                     ` Wei Liu
@ 2013-08-05 15:20                       ` George Dunlap
  2013-08-05 15:32                         ` Wei Liu
  2013-08-05 15:32                         ` Wei Liu
  2013-08-05 15:20                       ` George Dunlap
  1 sibling, 2 replies; 99+ messages in thread
From: George Dunlap @ 2013-08-05 15:20 UTC (permalink / raw)
  To: Wei Liu
  Cc: Sylvain Munaut, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, James Harper, xen-devel@lists.xen.org,
	Stefano Stabellini, Roger Pau Monne

On 05/08/13 16:18, Wei Liu wrote:
> On Mon, Aug 05, 2013 at 03:04:47PM +0100, George Dunlap wrote:
>> On 05/08/13 14:55, Sylvain Munaut wrote:
>>> Hi George,
>>>
>>>
>>>> Yes; qemu knows how to be a Xen PV block back-end.
>>> Very interesting. Is there documentation about this somewhere ?
>>> I had a look some time ago and it was really not very clear.
>>>
>>> Things like what Xen version support this. And with which features (
>>> indirect descriptors, persistent grants, discard, flush, ...) and/or
>>> which limitation.
>> I don't think this is documented anywhere; you'll need to ask the
>> experts.  Stefano? Roger? Wei?
>>
> These are Linux features not Xen ones AFAICT. In theory they are not
> bound to specific Xen versions.
>
> For the network part I don't think new features depend on any specific
> hypercall. However for block Roger and Stefano seem to introduce
> new hypercalls for certain features (I might be wrong though).

We're talking about qemu; so the toolstack needs to know how to set up 
qdisk, and I think qdisk would need to be programmed to use, for 
example, persistent grants, yes?

  -G


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05 15:20                       ` George Dunlap
@ 2013-08-05 15:32                         ` Wei Liu
  2013-08-05 15:32                         ` Wei Liu
  1 sibling, 0 replies; 99+ messages in thread
From: Wei Liu @ 2013-08-05 15:32 UTC (permalink / raw)
  To: George Dunlap
  Cc: Wei Liu, Sylvain Munaut, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, James Harper, xen-devel@lists.xen.org,
	Stefano Stabellini, Roger Pau Monne

On Mon, Aug 05, 2013 at 04:20:20PM +0100, George Dunlap wrote:
> On 05/08/13 16:18, Wei Liu wrote:
> >On Mon, Aug 05, 2013 at 03:04:47PM +0100, George Dunlap wrote:
> >>On 05/08/13 14:55, Sylvain Munaut wrote:
> >>>Hi George,
> >>>
> >>>
> >>>>Yes; qemu knows how to be a Xen PV block back-end.
> >>>Very interesting. Is there documentation about this somewhere ?
> >>>I had a look some time ago and it was really not very clear.
> >>>
> >>>Things like what Xen version support this. And with which features (
> >>>indirect descriptors, persistent grants, discard, flush, ...) and/or
> >>>which limitation.
> >>I don't think this is documented anywhere; you'll need to ask the
> >>experts.  Stefano? Roger? Wei?
> >>
> >These are Linux features not Xen ones AFAICT. In theory they are not
> >bound to specific Xen versions.
> >
> >For the network part I don't think new features depend on any specific
> >hypercall. However for block Roger and Stefano seem to introduce
> >new hypercalls for certain features (I might be wrong though).
> 
> We're talking about qemu; so the toolstack needs to know how to set
> up qdisk, and I think qdisk would need to be programmed to use, for
> example, persistent grants, yes?
> 

I don't think the toolstack needs to be involved in this. At least for
the network part, the FE and BE negotiate which features to use. The
general idea is that a new feature is always beneficial to enable, so we
make use of them whenever possible. Certain features do have sysfs
entries to configure them, but that's not coded into libxl.
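
To illustrate the negotiation idea in the paragraph above, here is a
minimal sketch. It is not the actual Xen code: real PV drivers advertise
their capabilities through xenstore keys, and the bitmask, the FEAT_*
names and negotiate() below are all hypothetical. The point is only that
each side advertises what it supports and the intersection gets enabled,
with no toolstack involvement.

```c
#include <assert.h>

/* Hypothetical feature bits, for illustration only. */
enum {
    FEAT_FLUSH      = 1 << 0,
    FEAT_DISCARD    = 1 << 1,
    FEAT_PERSISTENT = 1 << 2,
    FEAT_INDIRECT   = 1 << 3,
    FEAT_ALL        = (1 << 4) - 1
};

/* Return the set of features both ends advertise.
 * Anything only one side supports is simply left disabled. */
unsigned negotiate(unsigned frontend, unsigned backend)
{
    /* Both inputs must only contain known feature bits. */
    assert((frontend & ~(unsigned)FEAT_ALL) == 0);
    assert((backend & ~(unsigned)FEAT_ALL) == 0);

    return frontend & backend;
}
```

With this model, an old backend that knows nothing about persistent
grants just never advertises the bit, and a newer frontend falls back
without any configuration.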

I cannot speak for block drivers, but grepping the source code I don't
think you can configure persistent grants via libxl either.


Wei.

>  -G

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-05  9:41     ` [Xen-devel] " Sylvain Munaut
                         ` (2 preceding siblings ...)
  2013-08-09  0:12       ` James Harper
@ 2013-08-09  0:12       ` James Harper
  2013-08-09  9:21         ` Sylvain Munaut
  2013-08-09  9:21         ` [Xen-devel] " Sylvain Munaut
  3 siblings, 2 replies; 99+ messages in thread
From: James Harper @ 2013-08-09  0:12 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

> 
> Yes the procedure didn't change.
> 
> If you're on debian I could also sent your prebuilt .deb for blktap
> and for a patched xen version that includes userspace RBD support.
> 
> If you have any issue, I can be found on ceph's IRC under 'tnt' nick.
> 

I've had a few occasions where tapdisk has segfaulted:

tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
tapdisk:9180 blocked for more than 120 seconds.
tapdisk         D ffff88043fc13540     0  9180      1 0x00000000

and then like:

end_request: I/O error, dev tdc, sector 472008

I can't be sure but I suspect that when this happened either one OSD was offline, or the cluster lost quorum briefly.

James



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-09  0:12       ` [Xen-devel] " James Harper
  2013-08-09  9:21         ` Sylvain Munaut
@ 2013-08-09  9:21         ` Sylvain Munaut
  2013-08-11  0:51           ` James Harper
  2013-08-11  0:51           ` James Harper
  1 sibling, 2 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-09  9:21 UTC (permalink / raw)
  To: James Harper
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

Hi,

> I've had a few occasions where tapdisk has segfaulted:
>
> tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
> tapdisk:9180 blocked for more than 120 seconds.
> tapdisk         D ffff88043fc13540     0  9180      1 0x00000000
>
> and then like:
>
> end_request: I/O error, dev tdc, sector 472008
>
> I can't be sure but I suspect that when this happened either one OSD was offline, or the cluster lost quorum briefly.

Interesting. There might be an issue if a request ends in error, I'll
have to check that.
I'll have a look on Monday.

Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-09  9:21         ` [Xen-devel] " Sylvain Munaut
@ 2013-08-11  0:51           ` James Harper
  2013-08-11  1:02             ` James Harper
  2013-08-11  1:02             ` James Harper
  2013-08-11  0:51           ` James Harper
  1 sibling, 2 replies; 99+ messages in thread
From: James Harper @ 2013-08-11  0:51 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

> 
> Hi,
> 
> > I've had a few occasions where tapdisk has segfaulted:
> >
> > tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp
> 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
> > tapdisk:9180 blocked for more than 120 seconds.
> > tapdisk         D ffff88043fc13540     0  9180      1 0x00000000
> >
> > and then like:
> >
> > end_request: I/O error, dev tdc, sector 472008
> >
> > I can't be sure but I suspect that when this happened either one OSD was
> > offline, or the cluster lost quorum briefly.
> 
> Interesting. There might be an issue if a request ends in error, I'll
> have to check that.
> I'll have a look on monday.
> 

You say in tdrbd_finish_aiocb:

        while (1) {
                /* POSIX says write will be atomic or blocking */
                rv = write(prv->pipe_fds[1], (void*)&req, sizeof(req));

but from what I've read in "man 7 pipe", the statement about being atomic only applies if the pipe is open in non-blocking mode, and you open it with a call to pipe() (same as pipe2(,0)) and you never call fcntl to change it. This would be consistent with the random crashes I'm seeing - I thought they were related to transient errors but my ceph cluster has been perfectly stable for a few days now and it's still happening.

What do you think?

Thanks

James


^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-11  0:51           ` James Harper
@ 2013-08-11  1:02             ` James Harper
  2013-08-12 14:13               ` Sylvain Munaut
  2013-08-12 14:13               ` [Xen-devel] " Sylvain Munaut
  2013-08-11  1:02             ` James Harper
  1 sibling, 2 replies; 99+ messages in thread
From: James Harper @ 2013-08-11  1:02 UTC (permalink / raw)
  To: James Harper, Sylvain Munaut
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

> >
> > Hi,
> >
> > > I've had a few occasions where tapdisk has segfaulted:
> > >
> > > tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp
> > 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
> > > tapdisk:9180 blocked for more than 120 seconds.
> > > tapdisk         D ffff88043fc13540     0  9180      1 0x00000000
> > >
> > > and then like:
> > >
> > > end_request: I/O error, dev tdc, sector 472008
> > >
> > > I can't be sure but I suspect that when this happened either one OSD was
> > > offline, or the cluster lost quorum briefly.
> >
> > Interesting. There might be an issue if a request ends in error, I'll
> > have to check that.
> > I'll have a look on monday.
> >
> 
> You say in tdrbd_finish_aiocb:
> 
>         while (1) {
>                 /* POSIX says write will be atomic or blocking */
>                 rv = write(prv->pipe_fds[1], (void*)&req, sizeof(req));
> 
> but from what I've read in "man 7 pipe", the statement about being atomic
> only applies if the pipe is open in non-blocking mode, and you open it with a
> call to pipe() (same as pipe2(,0)) and you never call fcntl to change it. This
> would be consistent with the random crashes I'm seeing - I thought they
> were related to transient errors but my ceph cluster has been perfectly
> stable for a few days now and it's still happening.
> 
> What do you think?
> 

Actually maybe not. What I was reading only applies to a large number of bytes written to the pipe, and even then I got confused by the double negatives. Sorry for the noise.

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-11  1:02             ` James Harper
  2013-08-12 14:13               ` Sylvain Munaut
@ 2013-08-12 14:13               ` Sylvain Munaut
  2013-08-12 23:26                 ` James Harper
                                   ` (3 more replies)
  1 sibling, 4 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-12 14:13 UTC (permalink / raw)
  To: James Harper
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

Hi,

>> > > tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp
>> > 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
>> > > tapdisk:9180 blocked for more than 120 seconds.
>> > > tapdisk         D ffff88043fc13540     0  9180      1 0x00000000

You can try generating a core file by changing the ulimit on the running process

http://superuser.com/questions/404239/setting-ulimit-on-a-running-process

A backtrace would be useful :)
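
If you'd rather do it from a small helper than from the shell, something
along these lines should work. It is only a sketch, assuming Linux with
prlimit(2) available (glibc 2.13 or later); raise_core_limit() is a name
made up for this example. It raises the soft core-size limit of an
already running process up to its hard limit, which needs the usual
permissions over the target process (same user or root).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* prlimit() is a Linux/glibc extension */
#endif
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* Raise the soft RLIMIT_CORE of process `pid` to its hard limit.
 * Returns 0 on success, -1 on error (errno is set by prlimit). */
int raise_core_limit(pid_t pid)
{
    struct rlimit lim;

    assert(pid > 0);

    /* Read the current limits first so we never exceed the hard limit. */
    if (prlimit(pid, RLIMIT_CORE, NULL, &lim) != 0)
        return -1;

    lim.rlim_cur = lim.rlim_max;

    return prlimit(pid, RLIMIT_CORE, &lim, NULL);
}
```

You would call it with the pid of the tapdisk process; if the hard limit
is itself 0, no core will be written and the hard limit has to be raised
by root first.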


> Actually maybe not. What I was reading only applies for large number of bytes written to the pipe, and even then I got confused by the double negatives. Sorry for the noise.

Yes, as you discovered, for size < PIPE_BUF writes should be atomic
even in non-blocking mode. But I could still add an assert() there to
make sure it is.
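
Roughly what I have in mind, as a standalone sketch rather than the
actual driver code: struct request, send_req() and recv_req() are
placeholders invented for this example. Only the pointer travels
through the pipe, so the message is far below PIPE_BUF and, per pipe(7),
the write is all-or-nothing. The assert just makes that assumption
explicit.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for PIPE_BUF, ssize_t, read/write */
#endif
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/* Placeholder for the driver's request structure. */
struct request { int id; };

/* Compile-time check: a pointer-sized message always fits in PIPE_BUF. */
_Static_assert(sizeof(struct request *) <= PIPE_BUF,
               "message must fit in PIPE_BUF to be written atomically");

/* Send one request pointer through the pipe.
 * Returns 0 on success, -1 on error (errno set by write). */
int send_req(int wfd, struct request *req)
{
    ssize_t rv;

    do {
        rv = write(wfd, &req, sizeof(req));
    } while (rv < 0 && errno == EINTR);   /* retry if interrupted */

    if (rv < 0)
        return -1;

    /* A short write would mean the atomicity assumption is wrong. */
    assert((size_t)rv == sizeof(req));
    return 0;
}

/* Receive one request pointer. Returns 0 on success, -1 otherwise. */
int recv_req(int rfd, struct request **out)
{
    ssize_t rv;

    do {
        rv = read(rfd, out, sizeof(*out));
    } while (rv < 0 && errno == EINTR);

    return (rv == (ssize_t)sizeof(*out)) ? 0 : -1;
}
```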


I did find a bug where it could "leak" requests, which may lead to a
hang. But it shouldn't crash ...

Here's a (not yet tested) patch for the rbd error path:


diff --git a/drivers/block-rbd.c b/drivers/block-rbd.c
index 68fbed7..ab2d2c5 100644
--- a/drivers/block-rbd.c
+++ b/drivers/block-rbd.c
@@ -560,6 +560,9 @@ err:
        if (c)
                rbd_aio_release(c);

+       list_move(&req->queue, &prv->reqs_free);
+       prv->reqs_free_count++;
+
        return rv;
 }
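
To show why the missing lines matter, here is a toy model of the same
pattern. It is not the block-rbd.c code: struct pool, pool_get(),
pool_put() and submit() are simplified stand-ins invented for this
example. Requests come from a fixed free list; if the error path forgets
to give the request back, every failed submission permanently shrinks
the pool until nothing can be submitted any more, which would look like
a hang.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define POOL_SIZE 4

struct req { struct req *next; };

struct pool {
    struct req  slots[POOL_SIZE];
    struct req *free_head;   /* singly linked list of free requests */
    int         free_count;
};

void pool_init(struct pool *p)
{
    p->free_head = NULL;
    p->free_count = 0;
    for (int i = 0; i < POOL_SIZE; i++) {
        p->slots[i].next = p->free_head;
        p->free_head = &p->slots[i];
        p->free_count++;
    }
}

struct req *pool_get(struct pool *p)
{
    struct req *r = p->free_head;

    if (!r)
        return NULL;          /* pool exhausted */
    p->free_head = r->next;
    p->free_count--;
    return r;
}

void pool_put(struct pool *p, struct req *r)
{
    assert(p->free_count < POOL_SIZE);   /* catches double release */
    r->next = p->free_head;
    p->free_head = r;
    p->free_count++;
}

/* Submit one request. `backend_rv` simulates the result of handing the
 * request to the storage library: negative means it was rejected. */
int submit(struct pool *p, int backend_rv)
{
    struct req *r = pool_get(p);

    if (!r)
        return -EBUSY;

    if (backend_rv < 0) {
        /* Error path: no completion callback will ever run for this
         * request, so it must go back on the free list right here. */
        pool_put(p, r);
        return backend_rv;
    }

    /* Success: the request stays in flight until completion. */
    return 0;
}
```

Without the pool_put() in the error branch, the fifth failed submit()
on a pool of four would already return -EBUSY.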


Cheers,

     Sylvain

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-12 14:13               ` [Xen-devel] " Sylvain Munaut
  2013-08-12 23:26                 ` James Harper
@ 2013-08-12 23:26                 ` James Harper
  2013-08-13  0:39                 ` James Harper
  2013-08-13  0:39                 ` [Xen-devel] " James Harper
  3 siblings, 0 replies; 99+ messages in thread
From: James Harper @ 2013-08-12 23:26 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

> >> > > tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp
> >> > 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
> >> > > tapdisk:9180 blocked for more than 120 seconds.
> >> > > tapdisk         D ffff88043fc13540     0  9180      1 0x00000000
> 
> You can try generating a core file by changing the ulimit on the running
> process
> 
> A backtrace would be useful :)
> 

I found it was actually dumping core in /, but gdb doesn't seem to work nicely and all I get is this:

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Cannot find new threads: generic error
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:163
163     ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.

Even when I attach to a running process.

One VM segfaults on startup pretty much every time, except never when I attach strace to it, meaning it's probably a race condition and may not actually be in your code...

> 
> > Actually maybe not. What I was reading only applies for large number of
> > bytes written to the pipe, and even then I got confused by the double
> > negatives. Sorry for the noise.
> 
> Yes, as you discovered but size < PIPE_BUF, they should be atomic even
> in non-blocking mode. But I could still add assert() there to make
> sure it is.

Nah I got that completely backwards. I see now you are only passing a pointer so yes it should never be non-atomic.

> I did find a bug where it could "leak" requests which may lead to
> hang. But it shouldn't crash ...
> 
> Here's an (untested yet) patch in the rbd error path:
> 

I'll try that later this morning when I get a minute.

I've done the poor man's debugger thing and riddled the code with printfs, but as far as I can determine every routine starts and ends. My thinking at the moment is that it's either a race (the VMs most likely to crash have multiple disks), or a buffer overflow that trips it up either immediately or later.

I have definitely observed multiple VMs crash when something in ceph hiccups (e.g. I bring a mon up or down), if that helps.

I also followed up on the rbd_aio_release idea over the weekend. I can see that if the read returns failure, the callback was never called, so releasing the completion is then the responsibility of the caller.
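
The ownership rule described above can be sketched as follows. To keep it buildable without Ceph installed, the fake_aio_* functions are stand-ins invented for this example; they only mimic the roles of rbd_aio_read() and rbd_aio_release() and are not the librbd API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an rbd completion handle. */
struct completion {
    int released;   /* counts releases so the behaviour can be checked */
};

/* Stand-in for the async read: returns a negative errno-style value on failure. */
static int fake_aio_read(struct completion *c, int fail)
{
    (void)c;
    return fail ? -5 : 0;
}

/* Stand-in for releasing a completion. */
static void fake_aio_release(struct completion *c)
{
    c->released++;
}

/*
 * Submit one read. If submission fails, the completion callback will never
 * run, so the caller must release the completion itself. On success the
 * callback path owns the release.
 */
static int submit_read(struct completion *c, int fail)
{
    int rv;

    assert(c != NULL);

    rv = fake_aio_read(c, fail);
    if (rv < 0)
        fake_aio_release(c);

    return rv;
}
```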

Thanks

James


^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-12 14:13               ` [Xen-devel] " Sylvain Munaut
                                   ` (2 preceding siblings ...)
  2013-08-13  0:39                 ` James Harper
@ 2013-08-13  0:39                 ` James Harper
  2013-08-13  8:32                   ` Sylvain Munaut
  2013-08-13  8:32                   ` [Xen-devel] " Sylvain Munaut
  3 siblings, 2 replies; 99+ messages in thread
From: James Harper @ 2013-08-13  0:39 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

> Here's an (untested yet) patch in the rbd error path:
> 
> diff --git a/drivers/block-rbd.c b/drivers/block-rbd.c
> index 68fbed7..ab2d2c5 100644
> --- a/drivers/block-rbd.c
> +++ b/drivers/block-rbd.c
> @@ -560,6 +560,9 @@ err:
>         if (c)
>                 rbd_aio_release(c);
> 
> +       list_move(&req->queue, &prv->reqs_free);
> +       prv->reqs_free_count++;
> +
>         return rv;
>  }
> 

FWIW, I can confirm via printfs that this error path is never hit in at least some of the crashes I'm seeing.

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13  0:39                 ` [Xen-devel] " James Harper
  2013-08-13  8:32                   ` Sylvain Munaut
@ 2013-08-13  8:32                   ` Sylvain Munaut
  2013-08-13  9:12                     ` James Harper
  2013-08-13  9:12                     ` [Xen-devel] " James Harper
  1 sibling, 2 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-13  8:32 UTC (permalink / raw)
  To: James Harper
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

> FWIW, I can confirm via printf's that this error path is never hit in at least some of the crashes I'm seeing.

Ok thanks.

Are you using the RBD cache, btw?

Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13  8:32                   ` [Xen-devel] " Sylvain Munaut
  2013-08-13  9:12                     ` James Harper
@ 2013-08-13  9:12                     ` James Harper
  2013-08-13  9:20                       ` Sylvain Munaut
  2013-08-13  9:20                       ` Sylvain Munaut
  1 sibling, 2 replies; 99+ messages in thread
From: James Harper @ 2013-08-13  9:12 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

> 
> > FWIW, I can confirm via printf's that this error path is never hit in at least
> some of the crashes I'm seeing.
> 
> Ok thanks.
> 
> Are you using cache btw ?
> 

I hope not. How could I tell? It's not something I've explicitly enabled.

Thanks

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13  9:12                     ` [Xen-devel] " James Harper
@ 2013-08-13  9:20                       ` Sylvain Munaut
  2013-08-13 14:59                         ` Frederik Thuysbaert
                                           ` (4 more replies)
  2013-08-13  9:20                       ` Sylvain Munaut
  1 sibling, 5 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-13  9:20 UTC (permalink / raw)
  To: James Harper
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

Hi,

> I hope not. How could I tell? It's not something I've explicitly enabled.

It's disabled by default.

So you'd have to have enabled it either in ceph.conf or directly in
the device path in the Xen config. (The option is 'rbd cache', see
http://ceph.com/docs/next/rbd/rbd-config-ref/ )
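
As a rough illustration of the ceph.conf side only, something like the fragment below would force the cache off on the client. Putting it in the [client] section is an assumption about a typical setup; the device-path form is not shown here.

```ini
[client]
# Explicitly disable the librbd client-side cache.
rbd cache = false
```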

Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13  9:20                       ` Sylvain Munaut
@ 2013-08-13 14:59                         ` Frederik Thuysbaert
  2013-08-13 14:59                         ` Frederik Thuysbaert
                                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 99+ messages in thread
From: Frederik Thuysbaert @ 2013-08-13 14:59 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: James Harper, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org


Hi,

I have been testing this a while now, and just finished testing your 
untested patch. The rbd caching problem still persists.

The system I am testing on has the following characteristics:

Dom0:
     - Linux xen-001 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64
     - Most recent git checkout of blktap rbd branch

DomU:
     - Same kernel as dom0
     - Root (xvda1) is a logical volume on dom0
     - xvda2 is a Rados Block Device format 1

Let me start by saying that the errors only occur with RBD client 
caching ON.
I will give the error messages of both dom0 and domU before and after I 
applied the patch.

Actions in domU to trigger errors:

~# mkfs.xfs -f /dev/xvda2
~# mount /dev/xvda2 /mnt
~# bonnie -u 0 -g 0 /mnt


Error messages:

BEFORE patch:

Without RBD cache:

dom0: no errors
domU: no errors

With RBD cache:

dom0: no errors

domU:
Aug 13 18:18:33 debian-vm-101 kernel: [   37.960475] lost page write due to I/O error on xvda2
Aug 13 18:18:33 debian-vm-101 kernel: [   37.960488] lost page write due to I/O error on xvda2
Aug 13 18:18:33 debian-vm-101 kernel: [   37.960501] lost page write due to I/O error on xvda2
...
Aug 13 18:18:52 debian-vm-101 kernel: [   56.394645] XFS (xvda2): xfs_do_force_shutdown(0x2) called from line 1007 of file /build/linux-s5x2oE/linux-3.2.46/fs/xfs/xfs_log.c.  Return address = 0xffffffffa013ced5
Aug 13 18:19:19 debian-vm-101 kernel: [   83.941539] XFS (xvda2): xfs_log_force: error 5 returned.
Aug 13 18:19:19 debian-vm-101 kernel: [   83.941565] XFS (xvda2): xfs_log_force: error 5 returned.
...

AFTER patch:

Without RBD cache:

dom0: no errors
domU: no errors

With RBD cache:

dom0:
Aug 13 16:40:49 xen-001 kernel: [   94.954734] tapdisk[3075]: segfault at 7f749ee86da0 ip 00007f749d060776 sp 00007f748ea7a460 error 7 in libpthread-2.13.so[7f749d059000+17000]


domU:
Same as before patch.



I would like to add that I have the time to test this, and we are happy
to help you in any way possible. However, since I am no C developer, I
won't be able to do much more than testing.


Regards

Frederik


On 13-08-13 11:20, Sylvain Munaut wrote:
> Hi,
>
>> I hope not. How could I tell? It's not something I've explicitly enabled.
> It's disabled by default.
>
> So you'd have to have enabled it either in ceph.conf  or directly in
> the device path in the xen config. (option is 'rbd cache',
> http://ceph.com/docs/next/rbd/rbd-config-ref/ )
>
> Cheers,
>
>      Sylvain
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
       [not found]                         ` <520A4945.1030907@gmail.com>
@ 2013-08-13 15:39                           ` Sylvain Munaut
  2013-08-13 23:39                             ` James Harper
                                               ` (5 more replies)
  2013-08-13 15:39                           ` Sylvain Munaut
  1 sibling, 6 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-13 15:39 UTC (permalink / raw)
  To: Frederik Thuysbaert
  Cc: James Harper, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

Hi,

> I have been testing this a while now, and just finished testing your
> untested patch. The rbd caching problem still persists.

Yes, I wouldn't expect the patch to change anything for caching. But I
still don't understand why caching would change anything at all ... all
of it should be handled within librbd.


Note that I would recommend against caching anyway. The blktap layer
doesn't pass through the FLUSH commands, so this makes it completely
unsafe: the VM will think things are committed to disk durably
even though they are not ...



> I will give the error messages of both dom0 and domU before and after I
> applied the patch.

It's actually strange that it changes anything at all.

Can you try adding an ERROR("HERE\n"); in that error path
and check syslog to see if it's triggered at all ?

A backtrace would be great if you can get a core file, ideally with
tapdisk compiled with debug symbols.


Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13  9:20                       ` Sylvain Munaut
                                           ` (3 preceding siblings ...)
  2013-08-13 21:47                         ` James Harper
@ 2013-08-13 21:47                         ` James Harper
  4 siblings, 0 replies; 99+ messages in thread
From: James Harper @ 2013-08-13 21:47 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

I just noticed an email with the subject "qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]", where Sage noted that he has seen a completion called twice in the logs the OP posted. If that is actually happening (and is not just an artefact of the logging ring buffer overflowing or something), then I think that could easily cause a segfault in tapdisk rbd.

I'll try and see if I can log when that happens.
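
One way such logging could be done is a per-request "in flight" flag that the completion path checks and clears atomically. This is only a sketch under assumptions: the struct and function names are invented here, C11 atomics are assumed to be available, and nothing below is taken from block-rbd.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical request wrapper, not the driver's real structure. */
struct tracked_req {
    int id;
    atomic_int inflight;   /* 1 while a completion is still expected */
};

/* Mark the request as submitted, so exactly one completion is expected. */
static void req_submit(struct tracked_req *r)
{
    atomic_store(&r->inflight, 1);
}

/*
 * Called from the completion callback. Returns 1 for a legitimate first
 * completion. Returns 0 and logs if the request was not in flight, i.e.
 * a duplicate or stray completion.
 */
static int req_complete(struct tracked_req *r)
{
    /* The exchange makes the check-and-clear safe across threads. */
    if (atomic_exchange(&r->inflight, 0) != 1) {
        fprintf(stderr, "req %d: completion while not in flight\n", r->id);
        return 0;
    }
    return 1;
}
```

The atomic exchange matters because the completion would presumably run on a library thread rather than the tapdisk main loop, so a plain flag could miss two completions racing each other.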

James

> -----Original Message-----
> From: Sylvain Munaut [mailto:s.munaut@whatever-company.com]
> Sent: Tuesday, 13 August 2013 7:20 PM
> To: James Harper
> Cc: Pasi Kärkkäinen; ceph-devel@vger.kernel.org; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to
> test ? :p
> 
> Hi,
> 
> > I hope not. How could I tell? It's not something I've explicitly enabled.
> 
> It's disabled by default.
> 
> So you'd have to have enabled it either in ceph.conf  or directly in
> the device path in the xen config. (option is 'rbd cache',
> http://ceph.com/docs/next/rbd/rbd-config-ref/ )
> 
> Cheers,
> 
>     Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13 15:39                           ` [Xen-devel] " Sylvain Munaut
  2013-08-13 23:39                             ` James Harper
@ 2013-08-13 23:39                             ` James Harper
  2013-08-13 23:43                               ` Sylvain Munaut
  2013-08-13 23:43                               ` Sylvain Munaut
  2013-08-14  8:43                             ` [Xen-devel] " Frederik Thuysbaert
                                               ` (3 subsequent siblings)
  5 siblings, 2 replies; 99+ messages in thread
From: James Harper @ 2013-08-13 23:39 UTC (permalink / raw)
  To: Sylvain Munaut, Frederik Thuysbaert
  Cc: Pasi Kärkkäinen, ceph-devel@vger.kernel.org,
	xen-devel@lists.xen.org

I think I have a separate problem too: tapdisk will segfault almost immediately upon starting, but seemingly only for Linux PV DomUs. Once it has started doing this, I have to wait a few hours to a day before it starts working again. My Windows DomUs appear to be able to start normally though.

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13 23:39                             ` [Xen-devel] " James Harper
@ 2013-08-13 23:43                               ` Sylvain Munaut
  2013-08-13 23:51                                 ` James Harper
  2013-08-13 23:51                                 ` James Harper
  2013-08-13 23:43                               ` Sylvain Munaut
  1 sibling, 2 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-13 23:43 UTC (permalink / raw)
  To: James Harper
  Cc: Frederik Thuysbaert, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

On Wed, Aug 14, 2013 at 1:39 AM, James Harper
<james.harper@bendigoit.com.au> wrote:
> I think I have a separate problem too - tapdisk will segfault almost immediately upon starting but seemingly only for Linux PV DomU's. Once it has started doing this I have to wait a few hours to a day before it starts working again. My Windows DomU's appear to be able to start normally though.

What about other blktap drivers ? For example, does the blktap raw
driver work without issue ?

Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13 23:43                               ` Sylvain Munaut
@ 2013-08-13 23:51                                 ` James Harper
  2013-08-13 23:59                                   ` James Harper
  2013-08-13 23:59                                   ` James Harper
  2013-08-13 23:51                                 ` James Harper
  1 sibling, 2 replies; 99+ messages in thread
From: James Harper @ 2013-08-13 23:51 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Frederik Thuysbaert, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

> 
> On Wed, Aug 14, 2013 at 1:39 AM, James Harper
> <james.harper@bendigoit.com.au> wrote:
> > I think I have a separate problem too - tapdisk will segfault almost
> immediately upon starting but seemingly only for Linux PV DomU's. Once it
> has started doing this I have to wait a few hours to a day before it starts
> working again. My Windows DomU's appear to be able to start normally
> though.
> 
> What about other blktap driver ? like using blktap raw driver, does
> that work without issue ?
> 

What's the syntax for that? I use tap2:tapdisk:rbd for rbd, but I don't know how to specify raw, and anything I try is rejected as not understood.

Thanks

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13 23:51                                 ` James Harper
@ 2013-08-13 23:59                                   ` James Harper
  2013-08-14 13:13                                     ` Sylvain Munaut
  2013-08-14 13:13                                     ` Sylvain Munaut
  2013-08-13 23:59                                   ` James Harper
  1 sibling, 2 replies; 99+ messages in thread
From: James Harper @ 2013-08-13 23:59 UTC (permalink / raw)
  To: James Harper, Sylvain Munaut
  Cc: Frederik Thuysbaert, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

> 
> >
> > On Wed, Aug 14, 2013 at 1:39 AM, James Harper
> > <james.harper@bendigoit.com.au> wrote:
> > > I think I have a separate problem too - tapdisk will segfault almost
> > immediately upon starting but seemingly only for Linux PV DomU's. Once it
> > has started doing this I have to wait a few hours to a day before it starts
> > working again. My Windows DomU's appear to be able to start normally
> > though.
> >
> > What about other blktap driver ? like using blktap raw driver, does
> > that work without issue ?
> >
> 
> What's the syntax for that? I use tap2:tapdisk:rbd for rbd, but I don't know
> how to specify raw and anything I try just says it doesn't understand
> 

I just tested with tap2:aio and that worked (I still had an old image of the VM on LVM, so I tested with that). After switching back to rbd it crashes every time, just as postgres is starting in the VM. If I boot into single user mode, wait 30 seconds, and then let the boot continue, it still crashes at the same point, so I think it's not a timing thing. Maybe postgres has a disk access pattern that is triggering the bug?

Putting printfs in seems to make the problem go away sometimes, so it's hard to debug.

James


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13 23:51                                 ` James Harper
  2013-08-13 23:59                                   ` James Harper
@ 2013-08-13 23:59                                   ` James Harper
  1 sibling, 0 replies; 99+ messages in thread
From: James Harper @ 2013-08-13 23:59 UTC (permalink / raw)
  To: James Harper, Sylvain Munaut
  Cc: ceph-devel@vger.kernel.org, xen-devel@lists.xen.org,
	Frederik Thuysbaert

> 
> >
> > On Wed, Aug 14, 2013 at 1:39 AM, James Harper
> > <james.harper@bendigoit.com.au> wrote:
> > > I think I have a separate problem too - tapdisk will segfault almost
> > immediately upon starting but seemingly only for Linux PV DomU's. Once it
> > has started doing this I have to wait a few hours to a day before it starts
> > working again. My Windows DomU's appear to be able to start normally
> > though.
> >
> > What about other blktap driver ? like using blktap raw driver, does
> > that work without issue ?
> >
> 
> What's the syntax for that? I use tap2:tapdisk:rbd for rbd, but I don't know
> how to specify raw and anything I try just says it doesn't understand
> 

I just tested with tap2:aio and that worked (I still had an old image of the VM on LVM, so I tested with that). After switching back to rbd it crashes every time, just as postgres is starting in the VM. If I boot into single user mode, wait 30 seconds, and then let the boot continue, it still crashes at the same point, so I think it's not a timing thing. Maybe postgres has a disk access pattern that is triggering the bug?

Putting printfs in seems to make the problem go away sometimes, so it's hard to debug.

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13 15:39                           ` [Xen-devel] " Sylvain Munaut
  2013-08-13 23:39                             ` James Harper
  2013-08-13 23:39                             ` [Xen-devel] " James Harper
@ 2013-08-14  8:43                             ` Frederik Thuysbaert
  2013-08-14 15:03                               ` Sylvain Munaut
  2013-08-14 15:03                               ` Sylvain Munaut
  2013-08-14  8:43                             ` Frederik Thuysbaert
                                               ` (2 subsequent siblings)
  5 siblings, 2 replies; 99+ messages in thread
From: Frederik Thuysbaert @ 2013-08-14  8:43 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: James Harper, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

On 13-08-13 17:39, Sylvain Munaut wrote:
> Hi,
>
>> I have been testing this a while now, and just finished testing your
>> untested patch. The rbd caching problem still persists.
> Yes, I wouldn't expect to change anything for caching. But I still
> don't understand why caching would change anything at all ... all of
> it should be handled within the librbd lib.
>
>
> Note that I would recommend against caching anyway. The blktap layer
> doesn't pass through the FLUSH commands and so this make it completely
> unsafe because the VM will think things are commited to disk durably
> even though they are not ...
>
>
>
>> I will give the error messages of both dom0 and domU before and after I
>> applied the patch.
> It's actually strange that it changes anything at all.
>
> Can you try adding a ERROR("HERE\n");  in that error path processing
> and check syslog to see if it's triggered at all ?
I did this, and can confirm that it is not triggered.
>
> A traceback would be great if you can get a core file. And possibly
> compile tapdisk with debug symbols.
I'm not quite sure what you mean; can you give some more information on
how to do this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm
not sure this is what you meant.
>
> Cheers,
>
>      Sylvain
Regards

- Frederik

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13 15:39                           ` [Xen-devel] " Sylvain Munaut
                                               ` (2 preceding siblings ...)
  2013-08-14  8:43                             ` [Xen-devel] " Frederik Thuysbaert
@ 2013-08-14  8:43                             ` Frederik Thuysbaert
  2013-08-14  8:47                             ` Frederik Thuysbaert
  2013-08-14  8:47                             ` [Xen-devel] " Frederik Thuysbaert
  5 siblings, 0 replies; 99+ messages in thread
From: Frederik Thuysbaert @ 2013-08-14  8:43 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: ceph-devel@vger.kernel.org, James Harper, xen-devel@lists.xen.org

On 13-08-13 17:39, Sylvain Munaut wrote:
> Hi,
>
>> I have been testing this a while now, and just finished testing your
>> untested patch. The rbd caching problem still persists.
> Yes, I wouldn't expect to change anything for caching. But I still
> don't understand why caching would change anything at all ... all of
> it should be handled within the librbd lib.
>
>
> Note that I would recommend against caching anyway. The blktap layer
> doesn't pass through the FLUSH commands and so this make it completely
> unsafe because the VM will think things are commited to disk durably
> even though they are not ...
>
>
>
>> I will give the error messages of both dom0 and domU before and after I
>> applied the patch.
> It's actually strange that it changes anything at all.
>
> Can you try adding a ERROR("HERE\n");  in that error path processing
> and check syslog to see if it's triggered at all ?
I did this, and can confirm that it is not triggered.
>
> A traceback would be great if you can get a core file. And possibly
> compile tapdisk with debug symbols.
I'm not quite sure what you mean; can you give some more information on
how to do this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm
not sure this is what you meant.
>
> Cheers,
>
>      Sylvain
Regards

- Frederik

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13 15:39                           ` [Xen-devel] " Sylvain Munaut
                                               ` (4 preceding siblings ...)
  2013-08-14  8:47                             ` Frederik Thuysbaert
@ 2013-08-14  8:47                             ` Frederik Thuysbaert
  5 siblings, 0 replies; 99+ messages in thread
From: Frederik Thuysbaert @ 2013-08-14  8:47 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: James Harper, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

On 13-08-13 17:39, Sylvain Munaut wrote:
>
> It's actually strange that it changes anything at all.
>
> Can you try adding a ERROR("HERE\n");  in that error path processing
> and check syslog to see if it's triggered at all ?
>
> A traceback would be great if you can get a core file. And possibly
> compile tapdisk with debug symbols.
When halting the domU after the errors, I get the following in dom0 syslog:

Aug 14 10:43:57 xen-001 kernel: [ 5041.338756] INFO: task tapdisk:9690 
blocked for more than 120 seconds.
Aug 14 10:43:57 xen-001 kernel: [ 5041.338817] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 10:43:57 xen-001 kernel: [ 5041.338903] tapdisk         D 
ffff8800bf213780     0  9690      1 0x00000000
Aug 14 10:43:57 xen-001 kernel: [ 5041.338908]  ffff8800b4b0e730 
0000000000000246 ffff880000000000 ffffffff8160d020
Aug 14 10:43:57 xen-001 kernel: [ 5041.338912]  0000000000013780 
ffff8800b4ebffd8 ffff8800b4ebffd8 ffff8800b4b0e730
Aug 14 10:43:57 xen-001 kernel: [ 5041.338916]  ffff8800b4d36190 
0000000181199c37 ffff8800b5798c00 ffff8800b5798c00
Aug 14 10:43:57 xen-001 kernel: [ 5041.338921] Call Trace:
Aug 14 10:43:57 xen-001 kernel: [ 5041.338929] [<ffffffffa0308411>] ? 
blktap_device_destroy_sync+0x85/0x9b [blktap]
Aug 14 10:43:57 xen-001 kernel: [ 5041.338936] [<ffffffff8105fadf>] ? 
add_wait_queue+0x3c/0x3c
Aug 14 10:43:57 xen-001 kernel: [ 5041.338940] [<ffffffffa0307444>] ? 
blktap_ring_release+0x10/0x2d [blktap]
Aug 14 10:43:57 xen-001 kernel: [ 5041.338945] [<ffffffff810fb141>] ? 
fput+0xf9/0x1a1
Aug 14 10:43:57 xen-001 kernel: [ 5041.338949] [<ffffffff810f8e6c>] ? 
filp_close+0x62/0x6a
Aug 14 10:43:57 xen-001 kernel: [ 5041.338954] [<ffffffff81049831>] ? 
put_files_struct+0x60/0xad
Aug 14 10:43:57 xen-001 kernel: [ 5041.338958] [<ffffffff81049e38>] ? 
do_exit+0x292/0x713
Aug 14 10:43:57 xen-001 kernel: [ 5041.338961] [<ffffffff8104a539>] ? 
do_group_exit+0x74/0x9e
Aug 14 10:43:57 xen-001 kernel: [ 5041.338965] [<ffffffff81055f94>] ? 
get_signal_to_deliver+0x46d/0x48f
Aug 14 10:43:57 xen-001 kernel: [ 5041.338970] [<ffffffff81347759>] ? 
force_sig_info_fault+0x5b/0x63
Aug 14 10:43:57 xen-001 kernel: [ 5041.338975] [<ffffffff8100de27>] ? 
do_signal+0x38/0x610
Aug 14 10:43:57 xen-001 kernel: [ 5041.338979] [<ffffffff81070deb>] ? 
arch_local_irq_restore+0x7/0x8
Aug 14 10:43:57 xen-001 kernel: [ 5041.338983] [<ffffffff8134eb77>] ? 
_raw_spin_unlock_irqrestore+0xe/0xf
Aug 14 10:43:57 xen-001 kernel: [ 5041.338987] [<ffffffff8103f944>] ? 
wake_up_new_task+0xb9/0xc2
Aug 14 10:43:57 xen-001 kernel: [ 5041.338992] [<ffffffff8106f987>] ? 
sys_futex+0x120/0x151
Aug 14 10:43:57 xen-001 kernel: [ 5041.338995] [<ffffffff8100e435>] ? 
do_notify_resume+0x25/0x68
Aug 14 10:43:57 xen-001 kernel: [ 5041.338999] [<ffffffff8134ef3c>] ? 
retint_signal+0x48/0x8c
...
Aug 14 10:44:17 xen-001 tap-ctl: tap-err:tap_ctl_connect: couldn't 
connect to /var/run/blktap-control/ctl9478: 111

>
>
> Cheers,
>
>      Sylvain
Regards

- Frederik

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13 15:39                           ` [Xen-devel] " Sylvain Munaut
                                               ` (3 preceding siblings ...)
  2013-08-14  8:43                             ` Frederik Thuysbaert
@ 2013-08-14  8:47                             ` Frederik Thuysbaert
  2013-08-14  8:47                             ` [Xen-devel] " Frederik Thuysbaert
  5 siblings, 0 replies; 99+ messages in thread
From: Frederik Thuysbaert @ 2013-08-14  8:47 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: ceph-devel@vger.kernel.org, James Harper, xen-devel@lists.xen.org

On 13-08-13 17:39, Sylvain Munaut wrote:
>
> It's actually strange that it changes anything at all.
>
> Can you try adding a ERROR("HERE\n");  in that error path processing
> and check syslog to see if it's triggered at all ?
>
> A traceback would be great if you can get a core file. And possibly
> compile tapdisk with debug symbols.
When halting the domU after the errors, I get the following in dom0 syslog:

Aug 14 10:43:57 xen-001 kernel: [ 5041.338756] INFO: task tapdisk:9690 
blocked for more than 120 seconds.
Aug 14 10:43:57 xen-001 kernel: [ 5041.338817] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 10:43:57 xen-001 kernel: [ 5041.338903] tapdisk         D 
ffff8800bf213780     0  9690      1 0x00000000
Aug 14 10:43:57 xen-001 kernel: [ 5041.338908]  ffff8800b4b0e730 
0000000000000246 ffff880000000000 ffffffff8160d020
Aug 14 10:43:57 xen-001 kernel: [ 5041.338912]  0000000000013780 
ffff8800b4ebffd8 ffff8800b4ebffd8 ffff8800b4b0e730
Aug 14 10:43:57 xen-001 kernel: [ 5041.338916]  ffff8800b4d36190 
0000000181199c37 ffff8800b5798c00 ffff8800b5798c00
Aug 14 10:43:57 xen-001 kernel: [ 5041.338921] Call Trace:
Aug 14 10:43:57 xen-001 kernel: [ 5041.338929] [<ffffffffa0308411>] ? 
blktap_device_destroy_sync+0x85/0x9b [blktap]
Aug 14 10:43:57 xen-001 kernel: [ 5041.338936] [<ffffffff8105fadf>] ? 
add_wait_queue+0x3c/0x3c
Aug 14 10:43:57 xen-001 kernel: [ 5041.338940] [<ffffffffa0307444>] ? 
blktap_ring_release+0x10/0x2d [blktap]
Aug 14 10:43:57 xen-001 kernel: [ 5041.338945] [<ffffffff810fb141>] ? 
fput+0xf9/0x1a1
Aug 14 10:43:57 xen-001 kernel: [ 5041.338949] [<ffffffff810f8e6c>] ? 
filp_close+0x62/0x6a
Aug 14 10:43:57 xen-001 kernel: [ 5041.338954] [<ffffffff81049831>] ? 
put_files_struct+0x60/0xad
Aug 14 10:43:57 xen-001 kernel: [ 5041.338958] [<ffffffff81049e38>] ? 
do_exit+0x292/0x713
Aug 14 10:43:57 xen-001 kernel: [ 5041.338961] [<ffffffff8104a539>] ? 
do_group_exit+0x74/0x9e
Aug 14 10:43:57 xen-001 kernel: [ 5041.338965] [<ffffffff81055f94>] ? 
get_signal_to_deliver+0x46d/0x48f
Aug 14 10:43:57 xen-001 kernel: [ 5041.338970] [<ffffffff81347759>] ? 
force_sig_info_fault+0x5b/0x63
Aug 14 10:43:57 xen-001 kernel: [ 5041.338975] [<ffffffff8100de27>] ? 
do_signal+0x38/0x610
Aug 14 10:43:57 xen-001 kernel: [ 5041.338979] [<ffffffff81070deb>] ? 
arch_local_irq_restore+0x7/0x8
Aug 14 10:43:57 xen-001 kernel: [ 5041.338983] [<ffffffff8134eb77>] ? 
_raw_spin_unlock_irqrestore+0xe/0xf
Aug 14 10:43:57 xen-001 kernel: [ 5041.338987] [<ffffffff8103f944>] ? 
wake_up_new_task+0xb9/0xc2
Aug 14 10:43:57 xen-001 kernel: [ 5041.338992] [<ffffffff8106f987>] ? 
sys_futex+0x120/0x151
Aug 14 10:43:57 xen-001 kernel: [ 5041.338995] [<ffffffff8100e435>] ? 
do_notify_resume+0x25/0x68
Aug 14 10:43:57 xen-001 kernel: [ 5041.338999] [<ffffffff8134ef3c>] ? 
retint_signal+0x48/0x8c
...
Aug 14 10:44:17 xen-001 tap-ctl: tap-err:tap_ctl_connect: couldn't 
connect to /var/run/blktap-control/ctl9478: 111

>
>
> Cheers,
>
>      Sylvain
Regards

- Frederik

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13 23:59                                   ` James Harper
@ 2013-08-14 13:13                                     ` Sylvain Munaut
  2013-08-14 13:16                                       ` James Harper
                                                         ` (5 more replies)
  2013-08-14 13:13                                     ` Sylvain Munaut
  1 sibling, 6 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-14 13:13 UTC (permalink / raw)
  To: James Harper
  Cc: Frederik Thuysbaert, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

Hi,

> I just tested with tap2:aio and that worked (had an old image of the VM on lvm still so just tested with that). Switching back to rbd and it crashes every time, just as postgres is starting in the vm. Booting into single user mode, waiting 30 seconds, then letting the boot continue it still crashes at the same point so I think it's not a timing thing - maybe postgres has a disk access pattern that is triggering the bug?

Mmm, that's really interesting.

Could you try disabling request merging? Just pass the option
max_merge_size=0 in the tap2 disk description, something like
'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'
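
For anyone generating domU configs from a script, the disk description
above can be assembled as in the sketch below. This is only an
illustration: rbd_disk_spec() is a hypothetical helper (not part of
blktap or Xen), and the form without any option suffix is an assumption
extrapolated from the example string, which is the only part taken from
this thread.

```python
# Sketch: build a tap2 disk description with request merging disabled.
# rbd_disk_spec() is a hypothetical helper, not part of blktap or Xen.

def rbd_disk_spec(pool, image, dev, mode="w", max_merge_size=None):
    """Return a 'tap2:tapdisk:rbd:...' disk string for a domU config."""
    target = "rbd:%s/%s" % (pool, image)
    if max_merge_size is not None:
        # Option appended after the image name, as in the example above.
        target += ":max_merge_size=%d" % max_merge_size
    return "tap2:tapdisk:%s,%s,%s" % (target, dev, mode)


# Reproduces the example string from the mail.
disk = [rbd_disk_spec("rbd", "test", "xvda2", max_merge_size=0)]
print(disk[0])
```

The resulting list has the same shape as the disk = [ ... ] entry of an
xm-style domU config file.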

Cheers,

     Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-13 23:59                                   ` James Harper
  2013-08-14 13:13                                     ` Sylvain Munaut
@ 2013-08-14 13:13                                     ` Sylvain Munaut
  1 sibling, 0 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-14 13:13 UTC (permalink / raw)
  To: James Harper
  Cc: ceph-devel@vger.kernel.org, xen-devel@lists.xen.org,
	Frederik Thuysbaert

Hi,

> I just tested with tap2:aio and that worked (had an old image of the VM on lvm still so just tested with that). Switching back to rbd and it crashes every time, just as postgres is starting in the vm. Booting into single user mode, waiting 30 seconds, then letting the boot continue it still crashes at the same point so I think it's not a timing thing - maybe postgres has a disk access pattern that is triggering the bug?

Mmm, that's really interesting.

Could you try disabling request merging? Just pass the option
max_merge_size=0 in the tap2 disk description, something like
'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'

Cheers,

     Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-14 13:13                                     ` Sylvain Munaut
  2013-08-14 13:16                                       ` James Harper
@ 2013-08-14 13:16                                       ` James Harper
  2013-08-15  7:20                                       ` James Harper
                                                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 99+ messages in thread
From: James Harper @ 2013-08-14 13:16 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Frederik Thuysbaert, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

> 
> Hi,
> 
> > I just tested with tap2:aio and that worked (had an old image of the VM on
> lvm still so just tested with that). Switching back to rbd and it crashes every
> time, just as postgres is starting in the vm. Booting into single user mode,
> waiting 30 seconds, then letting the boot continue it still crashes at the same
> point so I think it's not a timing thing - maybe postgres has a disk access
> pattern that is triggering the bug?
> 
> Mmm, that's really interesting.
> 
> Could you try to disable request merging ? Just give option
> max_merge_size=0 in the tap2 disk description. Something like
> 'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'
> 

Just as suddenly, the problem went away and I can no longer reproduce the crash on startup. Very frustrating. Most likely it will still crash under heavy use, but that can take days.

I've just upgraded librbd to dumpling (from cuttlefish) on that one server and will see what it's doing by morning. I'll disable merging when I can reproduce it next.

Thanks

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-14 13:13                                     ` Sylvain Munaut
@ 2013-08-14 13:16                                       ` James Harper
  2013-08-14 13:16                                       ` [Xen-devel] " James Harper
                                                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 99+ messages in thread
From: James Harper @ 2013-08-14 13:16 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: ceph-devel@vger.kernel.org, xen-devel@lists.xen.org,
	Frederik Thuysbaert

> 
> Hi,
> 
> > I just tested with tap2:aio and that worked (had an old image of the VM on
> lvm still so just tested with that). Switching back to rbd and it crashes every
> time, just as postgres is starting in the vm. Booting into single user mode,
> waiting 30 seconds, then letting the boot continue it still crashes at the same
> point so I think it's not a timing thing - maybe postgres has a disk access
> pattern that is triggering the bug?
> 
> Mmm, that's really interesting.
> 
> Could you try to disable request merging ? Just give option
> max_merge_size=0 in the tap2 disk description. Something like
> 'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'
> 

Just as suddenly, the problem went away and I can no longer reproduce the crash on startup. Very frustrating. Most likely it will still crash under heavy use, but that can take days.

I've just upgraded librbd to dumpling (from cuttlefish) on that one server and will see what it's doing by morning. I'll disable merging when I can reproduce it next.

Thanks

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-14  8:43                             ` [Xen-devel] " Frederik Thuysbaert
@ 2013-08-14 15:03                               ` Sylvain Munaut
  2013-08-16  8:27                                 ` Frederik Thuysbaert
  2013-08-16  8:27                                 ` [Xen-devel] " Frederik Thuysbaert
  2013-08-14 15:03                               ` Sylvain Munaut
  1 sibling, 2 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-14 15:03 UTC (permalink / raw)
  To: Frederik Thuysbaert
  Cc: James Harper, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

Hi Frederik,

>> A traceback would be great if you can get a core file. And possibly
>> compile tapdisk with debug symbols.
>
> I'm not quite sure what u mean, can u give some more information on how I do
> this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm not sure this
> is what u meant.

Yes, ./configure CFLAGS=-g LDFLAGS=-g  is a good start.

Then when it crashes, it will leave a 'core' file somewhere (not sure
where, maybe in / or in /tmp).
If it doesn't, you may have to enable core dumps. While the process is
running, use this on the tapdisk PID:

http://superuser.com/questions/404239/setting-ulimit-on-a-running-process

Then once you have a core file, you can use gdb along with the tapdisk
executable to generate a meaningful backtrace of where the crash
happened:

See for example http://publib.boulder.ibm.com/httpserv/ihsdiag/get_backtrace.html
for how to do it.
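
The two steps above (enable core dumps on the running tapdisk, then get
a backtrace out of the core) can be sketched roughly as follows. This is
an illustration under assumptions, not something from the blktap tree:
enable_core_dumps() and gdb_backtrace_cmd() are hypothetical helpers,
the paths are placeholders, and resource.prlimit() needs Linux and
Python 3.4 or later. The sketch only raises the soft limit up to the
existing hard limit, and changing the limits of another process such as
tapdisk generally requires root.

```python
import os
import resource


def enable_core_dumps(pid):
    """Raise the soft core-size limit of a running process to its hard limit."""
    _soft, hard = resource.prlimit(pid, resource.RLIMIT_CORE)
    resource.prlimit(pid, resource.RLIMIT_CORE, (hard, hard))
    # Return the limits now in effect as (soft, hard).
    return resource.prlimit(pid, resource.RLIMIT_CORE)


def gdb_backtrace_cmd(exe, core):
    """Build a gdb batch command that prints a backtrace of every thread."""
    return ["gdb", "-batch", "-ex", "thread apply all bt", exe, core]


if __name__ == "__main__":
    # Demo on our own PID; for tapdisk, pass its PID instead.
    print(enable_core_dumps(os.getpid()))
    # Both paths are placeholders for the real binary and core file.
    print(gdb_backtrace_cmd("/path/to/tapdisk", "/path/to/core"))
```

Running the resulting gdb command against the tapdisk binary built with
-g should print one backtrace per thread, which is probably useful here
since the rbd driver pulls in librbd's own threads.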


> When halting the domU after the errors, I get the following in dom0 syslog:

That's not really unexpected. If tapdisk crashes, the I/O ring is going
to be left hanging and god knows what weird behaviour will follow ...


Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-14  8:43                             ` [Xen-devel] " Frederik Thuysbaert
  2013-08-14 15:03                               ` Sylvain Munaut
@ 2013-08-14 15:03                               ` Sylvain Munaut
  1 sibling, 0 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-08-14 15:03 UTC (permalink / raw)
  To: Frederik Thuysbaert
  Cc: ceph-devel@vger.kernel.org, James Harper, xen-devel@lists.xen.org

Hi Frederik,

>> A traceback would be great if you can get a core file. And possibly
>> compile tapdisk with debug symbols.
>
> I'm not quite sure what u mean, can u give some more information on how I do
> this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm not sure this
> is what u meant.

Yes, ./configure CFLAGS=-g LDFLAGS=-g  is a good start.

Then when it crashes, it will leave a 'core' file somewhere (not sure
where, maybe in / or in /tmp).
If it doesn't, you may have to enable core dumps. While the process is
running, use this on the tapdisk PID:

http://superuser.com/questions/404239/setting-ulimit-on-a-running-process

Then once you have a core file, you can use gdb along with the tapdisk
executable to generate a meaningful backtrace of where the crash
happened:

See for example http://publib.boulder.ibm.com/httpserv/ihsdiag/get_backtrace.html
for how to do it.


> When halting the domU after the errors, I get the following in dom0 syslog:

That's not really unexpected. If tapdisk crashes, the I/O ring is going
to be left hanging and god knows what weird behaviour will follow ...


Cheers,

    Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-14 13:13                                     ` Sylvain Munaut
                                                         ` (2 preceding siblings ...)
  2013-08-15  7:20                                       ` James Harper
@ 2013-08-15  7:20                                       ` James Harper
  2013-08-16  1:02                                       ` James Harper
  2013-08-16  1:02                                       ` [Xen-devel] " James Harper
  5 siblings, 0 replies; 99+ messages in thread
From: James Harper @ 2013-08-15  7:20 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Frederik Thuysbaert, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

> >
> > Hi,
> >
> > > I just tested with tap2:aio and that worked (had an old image of the VM
> on
> > lvm still so just tested with that). Switching back to rbd and it crashes every
> > time, just as postgres is starting in the vm. Booting into single user mode,
> > waiting 30 seconds, then letting the boot continue it still crashes at the
> same
> > point so I think it's not a timing thing - maybe postgres has a disk access
> > pattern that is triggering the bug?
> >
> > Mmm, that's really interesting.
> >
> > Could you try to disable request merging ? Just give option
> > max_merge_size=0 in the tap2 disk description. Something like
> > 'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'
> >
> 
> Just as suddenly the problem went away and I can no longer reproduce the
> crash on startup. Very frustrating. Most likely it still crashed during heavy use
> but that can take days.
> 
> I've just upgraded librbd to dumpling (from cuttlefish) on that one server and
> will see what it's doing by morning. I'll disable merging when I can reproduce
> it next.
> 

I just had a crash since upgrading to dumpling, and will disable merging tonight.

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-14 13:13                                     ` Sylvain Munaut
  2013-08-14 13:16                                       ` James Harper
  2013-08-14 13:16                                       ` [Xen-devel] " James Harper
@ 2013-08-15  7:20                                       ` James Harper
  2013-08-15  7:20                                       ` [Xen-devel] " James Harper
                                                         ` (2 subsequent siblings)
  5 siblings, 0 replies; 99+ messages in thread
From: James Harper @ 2013-08-15  7:20 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: ceph-devel@vger.kernel.org, xen-devel@lists.xen.org,
	Frederik Thuysbaert

> >
> > Hi,
> >
> > > I just tested with tap2:aio and that worked (had an old image of the VM
> on
> > lvm still so just tested with that). Switching back to rbd and it crashes every
> > time, just as postgres is starting in the vm. Booting into single user mode,
> > waiting 30 seconds, then letting the boot continue it still crashes at the
> same
> > point so I think it's not a timing thing - maybe postgres has a disk access
> > pattern that is triggering the bug?
> >
> > Mmm, that's really interesting.
> >
> > Could you try to disable request merging ? Just give option
> > max_merge_size=0 in the tap2 disk description. Something like
> > 'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'
> >
> 
> Just as suddenly the problem went away and I can no longer reproduce the
> crash on startup. Very frustrating. Most likely it still crashed during heavy use
> but that can take days.
> 
> I've just upgraded librbd to dumpling (from cuttlefish) on that one server and
> will see what it's doing by morning. I'll disable merging when I can reproduce
> it next.
> 

I just had a crash since upgrading to dumpling, and will disable merging tonight.

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-14 13:13                                     ` Sylvain Munaut
                                                         ` (4 preceding siblings ...)
  2013-08-16  1:02                                       ` James Harper
@ 2013-08-16  1:02                                       ` James Harper
  5 siblings, 0 replies; 99+ messages in thread
From: James Harper @ 2013-08-16  1:02 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: Frederik Thuysbaert, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org


> 
> I just had a crash since upgrading to dumpling, and will disable merging
> tonight.
> 

Still crashes with merging disabled.

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-14 13:13                                     ` Sylvain Munaut
                                                         ` (3 preceding siblings ...)
  2013-08-15  7:20                                       ` [Xen-devel] " James Harper
@ 2013-08-16  1:02                                       ` James Harper
  2013-08-16  1:02                                       ` [Xen-devel] " James Harper
  5 siblings, 0 replies; 99+ messages in thread
From: James Harper @ 2013-08-16  1:02 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: ceph-devel@vger.kernel.org, xen-devel@lists.xen.org,
	Frederik Thuysbaert


> 
> I just had a crash since upgrading to dumpling, and will disable merging
> tonight.
> 

Still crashes with merging disabled.

James

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-08-14 15:03                               ` Sylvain Munaut
  2013-08-16  8:27                                 ` Frederik Thuysbaert
@ 2013-08-16  8:27                                 ` Frederik Thuysbaert
  1 sibling, 0 replies; 99+ messages in thread
From: Frederik Thuysbaert @ 2013-08-16  8:27 UTC (permalink / raw)
  To: Sylvain Munaut
  Cc: James Harper, Pasi Kärkkäinen,
	ceph-devel@vger.kernel.org, xen-devel@lists.xen.org

Hi Sylvain,

>> I'm not quite sure what u mean, can u give some more information on how I do
>> this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm not sure this
>> is what u meant.
>
> Yes, ./configure CFLAGS=-g LDFLAGS=-g  is a good start.
>
>...
>
> Then once you have a core file, you can use gdb along with the tapdisk
> executable to generate a meaningful backtrace of where the crash
>

I did 2 runs, with a cold reboot in between just to be sure. I don't
think I'm getting a lot of valuable information, but I will post it
anyway. The reason for the cold reboot was a 'Cannot access memory at
address ...' in gdb after the first frame; I thought a reboot might help.

Here's what I got:

try 1:
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fb42d2082d7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007fb42d2082d7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/x86_64-linux-gnu/libpthread.so.0
Cannot access memory at address 0x7fb42f081c38
(gdb) frame 0
#0  0x00007fb42d2082d7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/x86_64-linux-gnu/libpthread.so.0
(gdb) list
77	}
78	
79	int
80	main(int argc, char *argv[])
81	{
82		char *control;
83		int c, err, nodaemon;
84		FILE *out;
85	
86		control  = NULL;
(gdb) info locals
No symbol table info available.

try 2:
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fe05a721e6b in poll () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007fe05a721e6b in poll () from /lib/x86_64-linux-gnu/libc.so.6
Cannot access memory at address 0x7fe05c2ba518
(gdb) frame 0
#0  0x00007fe05a721e6b in poll () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) list
77	}
78	
79	int
80	main(int argc, char *argv[])
81	{
82		char *control;
83		int c, err, nodaemon;
84		FILE *out;
85	
86		control  = NULL;
(gdb) info locals
No symbol table info available.

Regards,

- Frederik
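
The quoted advice depends on a core file actually being written, which only happens if the process's core size limit allows it. Below is a minimal sketch, assuming Linux and assuming the limit is what stands in the way, of raising the soft limit from inside a C program. `enable_core_dumps` is a hypothetical helper name for illustration and is not part of blktap or tapdisk.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit up to the hard limit so that a
 * crashing process is allowed to leave a core file behind.
 * Hypothetical helper, not taken from the blktap sources.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }

    /* An unprivileged process may always raise its soft limit
     * as far as its hard limit. */
    rl.rlim_cur = rl.rlim_max;

    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

Calling this early in main() is roughly what `ulimit -c` does in the shell that starts the process, within whatever the hard limit permits. Where the kernel writes the core file still depends on the system's core_pattern setting.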


^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-04-19 14:41   ` [Xen-devel] " Sylvain Munaut
@ 2013-11-29 11:05     ` James Harper
  2013-11-29 15:11       ` Sylvain Munaut
  0 siblings, 1 reply; 99+ messages in thread
From: James Harper @ 2013-11-29 11:05 UTC (permalink / raw)
  To: Sylvain Munaut; +Cc: ceph-devel@vger.kernel.org

Sylvain,

Are you still working on this in any way?

It's been working great for me but seems to use an excessive amount of memory, like 300MB per process. Is that expected?

Thanks

James

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Sylvain Munaut
> Sent: Saturday, 20 April 2013 12:41 AM
> To: Pasi Kärkkäinen
> Cc: ceph-devel@vger.kernel.org; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to
> test ? :p
> 
> > If you have time to write up some lines about steps required to test this,
> > that'd be nice, it'll help people to test this stuff.
> 
> To quickly test, I compiled the package and just replaced the tapdisk
> binary from my "normal" blktap install with the newly compiled one.
> 
> Then you need to setup a RBD image named 'test' in the default 'rbd'
> pool. You also need to setup a proper ceph.conf and keyring file on
> the client (since librbd will use those for the parameters). The
> keyring must contain the 'client.admin' key
> 
> Then in the config file, use something like
> "tap2:tapdisk:rbd:xxx,xvda1,w"  the 'xxx' part is currently ignored
> ...
> 
> 
> Cheers,
> 
>     Sylvain
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-11-29 11:05     ` James Harper
@ 2013-11-29 15:11       ` Sylvain Munaut
  2013-12-01  4:08         ` James Harper
  0 siblings, 1 reply; 99+ messages in thread
From: Sylvain Munaut @ 2013-11-29 15:11 UTC (permalink / raw)
  To: James Harper; +Cc: ceph-devel@vger.kernel.org

Hi James,

> Are you still working on this in any way?

Well, I'm using it, but I haven't worked on it. I was never able to
reproduce any issue with it locally ...
In prod, I do run it with the cache disabled though, since I never took
the time to check that using the cache was safe in the various failure modes.

Is 300 MB normal? Well, that probably depends on your settings (cache
enabled / size / ...). But in any case I'd guess the memory comes from
librbd itself. It's not like I do much allocation myself :p

Cheers,

   Sylvain

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-11-29 15:11       ` Sylvain Munaut
@ 2013-12-01  4:08         ` James Harper
  2013-12-03 15:46           ` Sylvain Munaut
  0 siblings, 1 reply; 99+ messages in thread
From: James Harper @ 2013-12-01  4:08 UTC (permalink / raw)
  To: Sylvain Munaut; +Cc: ceph-devel@vger.kernel.org

> 
> Hi James,
> 
> > Are you still working on this in any way?
> 
> Well, I'm using it, but I haven't worked on it. I was never able to
> reproduce any issue with it locally ...
> In prod, I do run it with the cache disabled though, since I never took
> the time to check that using the cache was safe in the various failure modes.
>
> Is 300 MB normal? Well, that probably depends on your settings (cache
> enabled / size / ...). But in any case I'd guess the memory comes from
> librbd itself. It's not like I do much allocation myself :p
> 

What sort of memory are your instances using? I haven't turned on any caching so I assume it's disabled.

I increased the stack size to 8MB to work around the crash I was having, but lowering that to 2MB doesn't have any significant impact on memory usage.

James
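
The message does not say how the stack size was changed. As one possible illustration, assuming the stack in question belongs to a POSIX thread, the sketch below starts a thread with an explicitly chosen stack size. `spawn_with_stack` and `report_done` are hypothetical names used only for this example and do not come from blktap or librbd.

```c
#include <pthread.h>
#include <stddef.h>

/* Start fn(arg) on a new thread whose stack is stack_bytes large.
 * Hypothetical helper for illustration only.
 * Returns 0 on success or a pthread error number. */
int spawn_with_stack(pthread_t *tid, size_t stack_bytes,
                     void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int err;

    err = pthread_attr_init(&attr);
    if (err)
        return err;

    /* Fails with EINVAL if stack_bytes is below PTHREAD_STACK_MIN. */
    err = pthread_attr_setstacksize(&attr, stack_bytes);
    if (!err)
        err = pthread_create(tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return err;
}

/* Trivial thread body used to exercise the helper: sets a flag. */
void *report_done(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}
```

With this helper, the 8MB and 2MB settings mentioned above would correspond to passing `8 * 1024 * 1024` or `2 * 1024 * 1024` as `stack_bytes`. Thread stacks are normally reserved as virtual address space and only touched pages count toward resident memory, which would be consistent with the observation that the setting has little effect on memory usage.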

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p
  2013-12-01  4:08         ` James Harper
@ 2013-12-03 15:46           ` Sylvain Munaut
  0 siblings, 0 replies; 99+ messages in thread
From: Sylvain Munaut @ 2013-12-03 15:46 UTC (permalink / raw)
  To: James Harper; +Cc: ceph-devel@vger.kernel.org

Hi,

> What sort of memory are your instances using?

I just had a look. Around 120 MB, which indeed is a bit higher than I'd like.


> I haven't turned on any caching so I assume it's disabled.

Yes.


Cheers,

    Sylvain
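
The thread does not say how the 300 MB and 120 MB figures were measured. One way to get a comparable per-process number on Linux, sketched here under the assumption that resident set size is the figure of interest, is to read the VmRSS field from /proc/<pid>/status. `rss_kb` is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/types.h>

/* Return the resident set size (VmRSS) of process pid in kB, as
 * reported by /proc/<pid>/status, or -1 if it cannot be read.
 * Linux-specific; hypothetical helper for illustration only. */
long rss_kb(pid_t pid)
{
    char path[64];
    char line[256];
    long kb = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;

    /* Scan line by line until the "VmRSS:" field is found. */
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
            break;
    }

    fclose(f);
    return kb;
}
```

Passing the PID of a running tapdisk process would give a value that can be compared across hosts and configurations, for example with the cache enabled and disabled.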

^ permalink raw reply	[flat|nested] 99+ messages in thread

end of thread, other threads:[~2013-12-03 15:46 UTC | newest]

Thread overview: 99+ messages
2013-04-18 15:05 Xen blktap driver for Ceph RBD : Anybody wants to test ? :p Sylvain Munaut
2013-04-18 19:35 ` Wido den Hollander
2013-04-19 14:37   ` Sylvain Munaut
2013-04-19 14:40     ` Bernard Grymonpon
2013-04-23 10:02       ` Sylvain Munaut
2013-04-23 14:56         ` Bernard Grymonpon
2013-04-23 15:06           ` Sylvain Munaut
2013-04-23 19:13             ` Bernard Grymonpon
2013-04-23 16:38         ` Nick Couchman
2013-04-23 18:51           ` Sylvain Munaut
2013-04-23 20:09             ` Nick Couchman
2013-04-26 13:07               ` Sylvain Munaut
2013-04-26 15:51                 ` Sage Weil
2013-04-26 17:10                   ` Sylvain Munaut
2013-04-19  6:45 ` Pasi Kärkkäinen
2013-04-19 14:41   ` [Xen-devel] " Sylvain Munaut
2013-11-29 11:05     ` James Harper
2013-11-29 15:11       ` Sylvain Munaut
2013-12-01  4:08         ` James Harper
2013-12-03 15:46           ` Sylvain Munaut
2013-04-19 14:41   ` Sylvain Munaut
2013-08-01  2:12   ` James Harper
2013-08-05  9:41     ` Sylvain Munaut
2013-08-05  9:41     ` [Xen-devel] " Sylvain Munaut
2013-08-05  9:45       ` James Harper
2013-08-05  9:45       ` [Xen-devel] " James Harper
2013-08-05 11:01         ` Sylvain Munaut
2013-08-05 11:03           ` James Harper
2013-08-05 11:03           ` [Xen-devel] " James Harper
2013-08-05 11:12           ` Pasi Kärkkäinen
2013-08-05 12:03             ` Sylvain Munaut
2013-08-05 12:03             ` [Xen-devel] " Sylvain Munaut
2013-08-05 13:35               ` George Dunlap
2013-08-05 13:55                 ` Sylvain Munaut
2013-08-05 14:04                   ` George Dunlap
2013-08-05 15:18                     ` Wei Liu
2013-08-05 15:20                       ` George Dunlap
2013-08-05 15:32                         ` Wei Liu
2013-08-05 15:32                         ` Wei Liu
2013-08-05 15:20                       ` George Dunlap
2013-08-05 15:18                     ` Wei Liu
2013-08-05 14:04                   ` George Dunlap
2013-08-05 13:55                 ` Sylvain Munaut
2013-08-05 13:35               ` George Dunlap
2013-08-05 11:01         ` Sylvain Munaut
2013-08-09  0:12       ` James Harper
2013-08-09  0:12       ` [Xen-devel] " James Harper
2013-08-09  9:21         ` Sylvain Munaut
2013-08-09  9:21         ` [Xen-devel] " Sylvain Munaut
2013-08-11  0:51           ` James Harper
2013-08-11  1:02             ` James Harper
2013-08-12 14:13               ` Sylvain Munaut
2013-08-12 14:13               ` [Xen-devel] " Sylvain Munaut
2013-08-12 23:26                 ` James Harper
2013-08-12 23:26                 ` [Xen-devel] " James Harper
2013-08-13  0:39                 ` James Harper
2013-08-13  0:39                 ` [Xen-devel] " James Harper
2013-08-13  8:32                   ` Sylvain Munaut
2013-08-13  8:32                   ` [Xen-devel] " Sylvain Munaut
2013-08-13  9:12                     ` James Harper
2013-08-13  9:12                     ` [Xen-devel] " James Harper
2013-08-13  9:20                       ` Sylvain Munaut
2013-08-13 14:59                         ` Frederik Thuysbaert
2013-08-13 14:59                         ` Frederik Thuysbaert
     [not found]                         ` <520A4945.1030907@gmail.com>
2013-08-13 15:39                           ` [Xen-devel] " Sylvain Munaut
2013-08-13 23:39                             ` James Harper
2013-08-13 23:39                             ` [Xen-devel] " James Harper
2013-08-13 23:43                               ` Sylvain Munaut
2013-08-13 23:51                                 ` James Harper
2013-08-13 23:59                                   ` James Harper
2013-08-14 13:13                                     ` Sylvain Munaut
2013-08-14 13:16                                       ` James Harper
2013-08-14 13:16                                       ` [Xen-devel] " James Harper
2013-08-15  7:20                                       ` James Harper
2013-08-15  7:20                                       ` [Xen-devel] " James Harper
2013-08-16  1:02                                       ` James Harper
2013-08-16  1:02                                       ` [Xen-devel] " James Harper
2013-08-14 13:13                                     ` Sylvain Munaut
2013-08-13 23:59                                   ` James Harper
2013-08-13 23:51                                 ` James Harper
2013-08-13 23:43                               ` Sylvain Munaut
2013-08-14  8:43                             ` [Xen-devel] " Frederik Thuysbaert
2013-08-14 15:03                               ` Sylvain Munaut
2013-08-16  8:27                                 ` Frederik Thuysbaert
2013-08-16  8:27                                 ` [Xen-devel] " Frederik Thuysbaert
2013-08-14 15:03                               ` Sylvain Munaut
2013-08-14  8:43                             ` Frederik Thuysbaert
2013-08-14  8:47                             ` Frederik Thuysbaert
2013-08-14  8:47                             ` [Xen-devel] " Frederik Thuysbaert
2013-08-13 15:39                           ` Sylvain Munaut
2013-08-13 21:47                         ` James Harper
2013-08-13 21:47                         ` [Xen-devel] " James Harper
2013-08-13  9:20                       ` Sylvain Munaut
2013-08-11  1:02             ` James Harper
2013-08-11  0:51           ` James Harper
  -- strict thread matches above, loose matches on Subject: below --
2013-06-21  7:21 [Xen-devel] " Nathan O'Sullivan
2013-06-21 11:21 ` Sylvain Munaut
2013-07-01  9:57   ` Sylvain Munaut
2013-07-02  3:32     ` Nathan O'Sullivan
