ceph rbd crashes/stalls while random write 4k blocks

* ceph rbd crashes/stalls while random write 4k blocks
From: Stefan Priebe - Profihost AG @ 2012-05-24 11:07 UTC
  To: ceph-devel@vger.kernel.org

Hi list,

I'm still testing Ceph RBD with KVM. Right now I'm testing an RBD block
device inside a network-booted KVM guest.

Sequential writes/reads and random reads are fine. No problems so far.

But when I trigger lots of 4k random writes, all of them stall after a
short time and I get 0 IOPS and 0 throughput.

Command used:
fio --filename=/dev/vda --direct=1 --rw=randwrite --bs=4k --size=20G \
    --numjobs=50 --runtime=30 --group_reporting --name=file1

Some time later I see this call trace:

INFO: task ceph-osd:3065 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ceph-osd        D ffff8803b0e61d88     0  3065      1 0x00000004
 ffff88032f3ab7f8 0000000000000086 ffff8803bffdac08 ffff880300000000
 ffff8803b0e61820 0000000000010800 ffff88032f3abfd8 ffff88032f3aa010
 ffff88032f3abfd8 0000000000010800 ffffffff81a0b020 ffff8803b0e61820
Call Trace:
 [<ffffffff815e0e1a>] schedule+0x3a/0x60
 [<ffffffff815e127d>] schedule_timeout+0x1fd/0x2e0
 [<ffffffff812696c4>] ? xfs_iext_bno_to_ext+0x84/0x160
 [<ffffffff81074db1>] ? down_trylock+0x31/0x50
 [<ffffffff812696c4>] ? xfs_iext_bno_to_ext+0x84/0x160
 [<ffffffff815e20b9>] __down+0x69/0xb0
 [<ffffffff8128c4a6>] ? _xfs_buf_find+0xf6/0x280
 [<ffffffff81074e6b>] down+0x3b/0x50
 [<ffffffff8128b7b0>] xfs_buf_lock+0x40/0xe0
 [<ffffffff8128c4a6>] _xfs_buf_find+0xf6/0x280
 [<ffffffff8128c689>] xfs_buf_get+0x59/0x190
 [<ffffffff8128ccf7>] xfs_buf_read+0x27/0x100
 [<ffffffff81282f97>] xfs_trans_read_buf+0x1e7/0x420
 [<ffffffff81239371>] xfs_read_agf+0x61/0x1a0
 [<ffffffff812394e4>] xfs_alloc_read_agf+0x34/0xd0
 [<ffffffff8123c877>] xfs_alloc_fix_freelist+0x3f7/0x470
 [<ffffffff81288005>] ? kmem_free+0x35/0x40
 [<ffffffff8127ff6e>] ? xfs_trans_free_item_desc+0x2e/0x30
 [<ffffffff812800a7>] ? xfs_trans_free_items+0x87/0xb0
 [<ffffffff8127cc73>] ? xfs_perag_get+0x33/0xb0
 [<ffffffff8123c97f>] ? xfs_free_extent+0x8f/0x120
 [<ffffffff8123c990>] xfs_free_extent+0xa0/0x120
 [<ffffffff81287f07>] ? kmem_zone_alloc+0x77/0xf0
 [<ffffffff81245ead>] xfs_bmap_finish+0x15d/0x1a0
 [<ffffffff8126d15e>] xfs_itruncate_finish+0x15e/0x340
 [<ffffffff81285495>] xfs_setattr+0x365/0x980
 [<ffffffff812926e6>] xfs_vn_setattr+0x16/0x20
 [<ffffffff8111e0ad>] notify_change+0x11d/0x300
 [<ffffffff81103ccc>] do_truncate+0x5c/0x90
 [<ffffffff8110ea35>] ? get_write_access+0x15/0x50
 [<ffffffff81103ef7>] sys_truncate+0x127/0x130
 [<ffffffff815e367b>] system_call_fastpath+0x16/0x1b
INFO: task flush-8:16:3089 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:16      D ffff8803af0d9d88     0  3089      2 0x00000000
 ffff88032e835940 0000000000000046 0000000100000fe0 ffff880300000000
 ffff8803af0d9820 0000000000010800 ffff88032e835fd8 ffff88032e834010
 ffff88032e835fd8 0000000000010800 ffff8803b0f7e080 ffff8803af0d9820
Call Trace:
 [<ffffffff810be570>] ? __lock_page+0x70/0x70
 [<ffffffff815e0e1a>] schedule+0x3a/0x60
 [<ffffffff815e0ec7>] io_schedule+0x87/0xd0
 [<ffffffff810be579>] sleep_on_page+0x9/0x10
 [<ffffffff815e1412>] __wait_on_bit_lock+0x52/0xb0
 [<ffffffff810be562>] __lock_page+0x62/0x70
 [<ffffffff8106fb80>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff810c8fd0>] ? pagevec_lookup_tag+0x20/0x30
 [<ffffffff810c7f66>] write_cache_pages+0x386/0x4d0
 [<ffffffff810c6c10>] ? set_page_dirty+0x70/0x70
 [<ffffffff810fd7ab>] ? kmem_cache_free+0x1b/0xe0
 [<ffffffff810c80fc>] generic_writepages+0x4c/0x70
 [<ffffffff81288bcf>] xfs_vm_writepages+0x4f/0x60
 [<ffffffff810c813c>] do_writepages+0x1c/0x40
 [<ffffffff81128854>] writeback_single_inode+0xf4/0x260
 [<ffffffff81128c45>] writeback_sb_inodes+0xe5/0x1b0
 [<ffffffff811290a8>] writeback_inodes_wb+0x98/0x160
 [<ffffffff81129ac3>] wb_writeback+0x2f3/0x460
 [<ffffffff815e089e>] ? __schedule+0x3ae/0x850
 [<ffffffff8105df47>] ? lock_timer_base+0x37/0x70
 [<ffffffff81129e4f>] wb_do_writeback+0x21f/0x270
 [<ffffffff81129f3a>] bdi_writeback_thread+0x9a/0x230
 [<ffffffff81129ea0>] ? wb_do_writeback+0x270/0x270
 [<ffffffff81129ea0>] ? wb_do_writeback+0x270/0x270
 [<ffffffff8106f646>] kthread+0x96/0xa0
 [<ffffffff815e46d4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8106f5b0>] ? kthread_worker_fn+0x130/0x130
 [<ffffffff815e46d0>] ? gs_change+0xb/0xb

Stefan
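Both hung-task reports above follow the same pattern: a task stuck in D state, then a call trace in which entries marked "?" are stack addresses the kernel could not confirm as part of the actual call chain. In the first trace, ceph-osd is waiting in xfs_buf_lock() during a truncate; in the second, the flush-8:16 writeback thread is waiting in __lock_page() under xfs_vm_writepages(). As a rough sketch (not an existing Ceph or kernel tool; the script name hung_tasks.py is hypothetical), such reports can be pulled out of a kernel log and reduced to their reliable frames like this:

```python
import re
import sys

# "INFO: task <comm>:<pid> blocked for more than <N> seconds."
# comm may itself contain ':' (e.g. "flush-8:16"), so the pid is the last ":<digits>".
HUNG_RE = re.compile(r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than \d+ seconds")
# " [<ffffffff8128b7b0>] xfs_buf_lock+0x40/0xe0"; a leading "? " marks an unreliable frame.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (?P<unreliable>\? )?(?P<func>[\w.]+)\+0x")


def summarize_hung_tasks(log: str) -> dict:
    """Map (comm, pid) of each hung-task report to its reliable call-trace frames."""
    reports = {}
    current = None
    for line in log.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            current = (hung["comm"], int(hung["pid"]))
            reports[current] = []
            continue
        frame = FRAME_RE.search(line)
        # Attribute frames to the most recent hung-task header; skip "?" frames.
        if frame and current is not None and not frame["unreliable"]:
            reports[current].append(frame["func"])
    return reports


if __name__ == "__main__":
    # Usage: dmesg | python3 hung_tasks.py
    for (comm, pid), frames in summarize_hung_tasks(sys.stdin.read()).items():
        xfs = [f for f in frames if f.startswith("xfs_")]
        print(f"{comm} (pid {pid}): {len(frames)} frames, XFS frames: {xfs}")
```

Frames printed after some unrelated kernel message would still be attributed to the last hung task seen, so this is only a quick triage aid, e.g. for spotting that every blocked task has XFS functions in its stack.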

* Re: ceph rbd crashes/stalls while random write 4k blocks
From: Florian Haas @ 2012-05-24 12:12 UTC
  To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org

Stefan,

On 05/24/12 13:07, Stefan Priebe - Profihost AG wrote:
> Hi list,
>
> I'm still testing Ceph RBD with KVM. Right now I'm testing an RBD block
> device inside a network-booted KVM guest.
>
> Sequential writes/reads and random reads are fine. No problems so far.
>
> But when I trigger lots of 4k random writes, all of them stall after a
> short time and I get 0 IOPS and 0 throughput.
>
> [...]

sorry I'm coming a bit late to the various threads you've posted
recently, but on this particular issue: what kernel are your OSDs
running on, and do these hung tasks occur if you're using a local
filesystem other than XFS?
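Both questions can be answered on each OSD host. A minimal Python sketch, assuming the OSD data directory is /var/lib/ceph/osd/ceph-0 (a hypothetical path; use the "osd data" setting from your ceph.conf): it prints the running kernel release and looks up the filesystem type of the mount containing that directory in /proc/mounts.

```python
import os
import platform


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount containing `path`.

    `mounts_text` is /proc/mounts-style text: "device mountpoint fstype options ...".
    The longest matching mount point wins; for equal mount points the later
    entry wins, since it is mounted on top of the earlier one.
    """
    # Note: octal escapes like "\040" (space) in mount points are not decoded here.
    path = os.path.abspath(path)
    best_mnt, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mnt, fstype = fields[1], fields[2]
        inside = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        if inside and len(mnt) >= len(best_mnt):
            best_mnt, best_type = mnt, fstype
    return best_type


if __name__ == "__main__":
    # Hypothetical OSD data directory; adjust to your "osd data" path.
    osd_dir = "/var/lib/ceph/osd/ceph-0"
    with open("/proc/mounts") as mounts:
        fstype = fs_type_for(osd_dir, mounts.read())
    print(f"kernel {platform.release()}, {osd_dir} is on {fstype}")
```

Matching on a "/"-terminated prefix keeps a directory such as ceph-00 from being mistaken for a path under a ceph-0 mount.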

As of late XFS has occasionally been producing seemingly random kernel
hangs. Your call trace doesn't have the signature entries from xfssyncd
that identify a particular problem that I've been struggling with
lately, but you just might be affected by some other effect of the same
root issue.

Take a look at these to see if anything looks familiar:

http://oss.sgi.com/bugzilla/show_bug.cgi?id=922
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/979498
http://oss.sgi.com/archives/xfs/2011-11/msg00400.html

Not sure if this helps at all; just thought I might pitch that in.

Cheers,
Florian

* Re: ceph rbd crashes/stalls while random write 4k blocks
From: Stefan Priebe - Profihost AG @ 2012-05-24 14:09 UTC
  To: Florian Haas; +Cc: ceph-devel@vger.kernel.org

On 24.05.2012 14:12, Florian Haas wrote:
> Stefan,
> sorry I'm coming a bit late to the various threads you've posted
> recently, but on this particular issue: what kernel are your OSDs
> running on, and do these hung tasks occur if you're using a local
> filesystem other than XFS?

The OSDs run 3.0.30, but I tried 3.3.7 too: no difference (regarding the
XFS crash and the random writes).

I just tried btrfs with a 3.4 kernel and the patch posted yesterday.

But with kernel 3.4, performance is generally pretty low, regardless of
whether I use XFS or btrfs:

~# rados -p data bench 10 write -t 16
Maintaining 16 concurrent writes of 4194304 bytes for at least 10 seconds.
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        35        19   75.9824        76  0.294869  0.376607
    2      16        51        35   69.9844        64  0.103118  0.345375
    3      16        72        56    74.652        84  0.113909    0.5364
    4      16        88        72   71.9866        64  0.641818  0.786378
    5      16        95        79   63.1887        28  0.131084  0.737699
    6      16       113        97   64.6553        72  0.232688  0.851319
    7      16       129       113   64.5604        64   0.35199  0.822971
    8      16       148       132   65.9888        76   0.09892  0.739852
    9      16       149       133   59.1007         4  0.833541  0.740556
   10      16       157       141   56.3899        32  0.101306  0.715187
   11      16       157       141   51.2634         0         -  0.715187
   12      16       157       141   46.9914         0         -  0.715187
   13      16       157       141   43.3766         0         -  0.715187
   14      16       157       141   40.2782         0         -  0.715187
   15      16       157       141    37.593         0         -  0.715187
   16      16       157       141   35.2434         0         -  0.715187
Total time run:        16.471636
Total writes made:     158
Write size:            4194304
Bandwidth (MB/sec):    38.369

Average Latency:       1.66534
Max latency:           13.554
Min latency:           0.095194
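The bench output above can be checked mechanically. The summary bandwidth is total data over total run time: 158 writes of 4194304 bytes are 632 MB (treating MB as 2^20 bytes), and 632 / 16.471636 s reproduces the reported 38.369 MB/s. The per-second table also shows 0 MB/s from second 11 through 16 while 16 operations are still in flight. A small Python sketch, assuming the output format shown above (the script name bench_check.py is hypothetical), that finds those stalled seconds and recomputes the bandwidth:

```python
import sys


def parse_rados_bench(output: str):
    """Split `rados bench ... write` output into per-second rows and summary fields."""
    rows, summary = [], {}
    for line in output.splitlines():
        fields = line.split()
        # Per-second row: sec, cur ops, started, finished, avg MB/s, cur MB/s, last lat, avg lat
        if len(fields) == 8 and fields[0].isdigit():
            rows.append({"sec": int(fields[0]), "finished": int(fields[3]),
                         "cur_mbs": float(fields[5])})
        elif ":" in line:
            # Summary lines such as "Total writes made:     158"
            key, _, value = line.partition(":")
            summary[key.strip()] = value.strip()
    return rows, summary


def stalled_seconds(rows):
    """Seconds after the start in which throughput was zero (cur MB/s == 0)."""
    return [r["sec"] for r in rows if r["sec"] > 0 and r["cur_mbs"] == 0]


def recomputed_bandwidth(summary) -> float:
    """Total bytes written / total run time, in MB/s with MB = 2**20 bytes."""
    total_bytes = int(summary["Total writes made"]) * int(summary["Write size"])
    return total_bytes / 2**20 / float(summary["Total time run"])


if __name__ == "__main__":
    # Usage: rados -p data bench 10 write -t 16 | python3 bench_check.py
    rows, summary = parse_rados_bench(sys.stdin.read())
    print("seconds with 0 MB/s:", stalled_seconds(rows))
    print(f"recomputed bandwidth: {recomputed_bandwidth(summary):.3f} MB/s "
          f"(reported: {summary.get('Bandwidth (MB/sec)', '?')})")
```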

> As of late XFS has occasionally been producing seemingly random kernel
> hangs. Your call trace doesn't have the signature entries from xfssyncd
> that identify a particular problem that I've been struggling with
> lately, but you just might be affected by some other effect of the same
> root issue.
>
> Take a look at these to see if anything looks familiar:
>
> http://oss.sgi.com/bugzilla/show_bug.cgi?id=922
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/979498
> http://oss.sgi.com/archives/xfs/2011-11/msg00400.html

These are solved by using 3.0.20.

Stefan

* Re: ceph rbd crashes/stalls while random write 4k blocks
From: Florian Haas @ 2012-05-24 14:19 UTC
  To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org

On Thu, May 24, 2012 at 4:09 PM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
>> Take a look at these to see if anything looks familiar:
>>
>> http://oss.sgi.com/bugzilla/show_bug.cgi?id=922
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/979498
>> http://oss.sgi.com/archives/xfs/2011-11/msg00400.html
>
> These are solved by using 3.0.20.

... or so Christoph says, but comment #4 in bug 922 seems to indicate otherwise.

Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now

* Re: ceph rbd crashes/stalls while random write 4k blocks
From: Stefan Priebe - Profihost AG @ 2012-05-25  6:47 UTC
  To: Florian Haas; +Cc: ceph-devel@vger.kernel.org

On 24.05.2012 16:19, Florian Haas wrote:
> On Thu, May 24, 2012 at 4:09 PM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>>> Take a look at these to see if anything looks familiar:
>>>
>>> http://oss.sgi.com/bugzilla/show_bug.cgi?id=922
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/979498
>>> http://oss.sgi.com/archives/xfs/2011-11/msg00400.html
>>
>> These are solved by using 3.0.20.
> 
> ... or so Christoph says, but comment #4 in bug 922 seems to indicate otherwise.

I'm sorry, you're absolutely right. BUT XFS has had some regressions in
xlog_grant_log_space since 2.6.28, which were fixed in 3.0.x by reverting
to a kernel thread instead of workers. I worked with Christoph and Dave
on this problem, and it took me nearly a whole month to track it down
(git commit c7eead1e118fb7e34ee8f5063c3c090c054c3820). In this case
(#922) it really seems to be related to a log that is too small. But I
don't have a too-small log in my Ceph case ;-)

Stefan

* Re: ceph rbd crashes/stalls while random write 4k blocks
From: Florian Haas @ 2012-05-25  7:33 UTC
  To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org

On Fri, May 25, 2012 at 8:47 AM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> I'm sorry, you're absolutely right. BUT XFS has had some regressions in
> xlog_grant_log_space since 2.6.28, which were fixed in 3.0.x by reverting
> to a kernel thread instead of workers. I worked with Christoph and Dave
> on this problem, and it took me nearly a whole month to track it down
> (git commit c7eead1e118fb7e34ee8f5063c3c090c054c3820). In this case
> (#922) it really seems to be related to a log that is too small. But I
> don't have a too-small log in my Ceph case ;-)

Hmmm. So what's Chinner saying about this one? Should we move this
discussion to an XFS list?

Cheers,
Florian

* Re: ceph rbd crashes/stalls while random write 4k blocks
From: Stefan Priebe - Profihost AG @ 2012-05-25  7:35 UTC

On 25.05.2012 09:33, Florian Haas wrote:
> Hmmm. So what's Chinner saying about this one? Should we move this
> discussion to an XFS list?

I already sent the trace to Christoph, Dave and the XFS list. Sadly, no
reply.

Stefan
