From mboxrd@z Thu Jan 1 00:00:00 1970 From: Fyodor Ustinov Subject: Trouble with rbd Date: Fri, 06 May 2011 00:33:22 +0300 Message-ID: <4DC317A2.6080806@ufm.su> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: Received: from mail.ufm.su ([77.120.103.19]:53550 "EHLO mail.ufm.su" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751331Ab1EEVdZ (ORCPT ); Thu, 5 May 2011 17:33:25 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.ufm.su (Postfix) with ESMTP id E3C0060059C for ; Fri, 6 May 2011 00:33:22 +0300 (EEST) Received: from mail.ufm.su ([127.0.0.1]) by localhost (mail.ufm.su [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Bm5d2Xq3KDTk for ; Fri, 6 May 2011 00:33:22 +0300 (EEST) Received: from [192.168.2.22] (unknown [94.232.181.14]) by mail.ufm.su (Postfix) with ESMTPSA id 9B95860059A for ; Fri, 6 May 2011 00:33:22 +0300 (EEST) Sender: ceph-devel-owner@vger.kernel.org List-ID: To: ceph-devel@vger.kernel.org Hi! Latest 64 bit kernel (2.6.39-rc6, but with latest ubuntu 2.6.38-8 - the same). Very similar to that rbd or libceph ruin the memory. 1. Make rbd. rbd create tmt --size 102400 2. Attach and make disk modprobe rbd echo "xx.xx.xx.xx name=admin rbd tmt" > /sys/bus/rbd/add mkfs.ext4 -E lazy_itable_init=0 /dev/rbd0 mount /dev/rbd0 /mnt 3. start test - start iozone + bonnie++ + rsync. root@stb1:/mnt# iozone -a -n4g -g20g Iozone: Performance Test of File I/O Version $Revision: 3.373 $ Compiled for 64 bit mode. Build: linux-AMD64 [...] Auto Mode Using minimum file size of 4194304 kilobytes. Using maximum file size of 20971520 kilobytes. Command line used: iozone -a -n4g -g20g Output is in Kbytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 Kbytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. random random bkwd record stride KB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread 4194304 64 120955 66607 25500 28465 14384 89028 42840 5504993 84387 111439 124649 Error in file: Found ?170aba407000ffc? Expecting ?3a3a3a3a3a3a3a3a? addr 7fb85eb06000 Error in file: Position 3984547840 Record # 60799 Record size 64 kb where 7fb85eb06000 loop 24576 root@stb1:/mnt# umount /mnt root@stb1:~# fsck -f /dev/rbd0 fsck from util-linux-ng 2.17.2 e2fsck 1.41.14 (22-Dec-2010) Pass 1: Checking inodes, blocks, and sizes Inode 1850570 is in use, but has dtime set. Fix? /dev/rbd0: e2fsck canceled. root@stb1:~# So. No errors in logs on stb1 But format /dev/rbd0 by ocfs2 and the same test make nice show in syslog - very nicely fluttering out in all directions errors in all that you can imagine. Like that: May 5 22:36:57 stb1 kernel: [13683.907928] BUG: Bad page state in process iozone pfn:5ba7e May 5 22:36:57 stb1 kernel: [13683.907936] page:ffffea000140cb90 count:1 mapcount:0 mapping: (null) index:0xa0f63 May 5 22:36:57 stb1 kernel: [13683.907941] page flags: 0x100000000000000() May 5 22:36:57 stb1 kernel: [13683.907945] Pid: 2752, comm: iozone Not tainted 2.6.39-rc6-ufm #1 May 5 22:36:57 stb1 kernel: [13683.907947] Call Trace: May 5 22:36:57 stb1 kernel: [13683.907953] [] ? dump_page+0x9b/0xd0 May 5 22:36:57 stb1 kernel: [13683.907956] [] bad_page+0xc9/0x120 May 5 22:36:57 stb1 kernel: [13683.907959] [] get_page_from_freelist+0x6f8/0x7a0 May 5 22:36:57 stb1 kernel: [13683.907963] [] ? __find_get_block_slow.clone.11+0x44/0x140 May 5 22:36:57 stb1 kernel: [13683.907967] [] __alloc_pages_nodemask+0x118/0x8a0 May 5 22:36:57 stb1 kernel: [13683.907971] [] ? bit_waitqueue+0x17/0xd0 May 5 22:36:57 stb1 kernel: [13683.907973] [] ? wake_up_bit+0x2f/0x40 May 5 22:36:57 stb1 kernel: [13683.907976] [] ? do_get_write_access+0x2e4/0x490 May 5 22:36:57 stb1 kernel: [13683.907979] [] ? start_this_handle.clone.7+0x417/0x4a0 May 5 22:36:57 stb1 kernel: [13683.908000] [] ? ocfs2_inode_cache_unlock+0x12/0x20 [ocfs2] May 5 22:36:57 stb1 kernel: [13683.908004] [] alloc_pages_current+0xa5/0x110 May 5 22:36:57 stb1 kernel: [13683.908006] [] __page_cache_alloc+0x8f/0xa0 May 5 22:36:57 stb1 kernel: [13683.908009] [] find_or_create_page+0x4c/0xb0 May 5 22:36:57 stb1 kernel: [13683.908019] [] ocfs2_write_begin_nolock+0x15dd/0x1bf0 [ocfs2] May 5 22:36:57 stb1 kernel: [13683.908032] [] ? ocfs2_find_actor+0x140/0x140 [ocfs2] May 5 22:36:57 stb1 kernel: [13683.908043] [] ocfs2_write_begin+0xf6/0x210 [ocfs2] May 5 22:36:57 stb1 kernel: [13683.908046] [] generic_file_buffered_write+0x109/0x250 May 5 22:36:57 stb1 kernel: [13683.908059] [] ocfs2_file_aio_write+0x7c4/0x810 [ocfs2] May 5 22:36:57 stb1 kernel: [13683.908070] [] ? ocfs2_write_end_nolock+0x223/0x380 [ocfs2] May 5 22:36:57 stb1 kernel: [13683.908073] [] do_sync_write+0xd2/0x110 May 5 22:36:57 stb1 kernel: [13683.908077] [] ? apparmor_file_permission+0x18/0x20 May 5 22:36:57 stb1 kernel: [13683.908080] [] ? security_file_permission+0x2c/0xb0 May 5 22:36:57 stb1 kernel: [13683.908111] [] ? rw_verify_area+0x61/0xf0 May 5 22:36:57 stb1 kernel: [13683.908115] [] vfs_write+0xc6/0x180 May 5 22:36:57 stb1 kernel: [13683.908117] [] sys_write+0x51/0x90 May 5 22:36:57 stb1 kernel: [13683.908120] [] ? common_interrupt+0xe/0x13 May 5 22:36:57 stb1 kernel: [13683.908125] [] system_call_fastpath+0x16/0x1b May 5 22:37:39 stb1 kernel: [13726.005858] BUG: Bad page state in process swapper pfn:22eff May 5 22:37:39 stb1 kernel: [13726.005866] page:ffffea00007a47c8 count:0 mapcount:-127 mapping: (null) index:0xd9772 May 5 22:37:39 stb1 kernel: [13726.005871] page flags: 0x100000000000000() May 5 22:37:39 stb1 kernel: [13726.005882] Pid: 0, comm: swapper Tainted: G B 2.6.39-rc6-ufm #1 May 5 22:37:39 stb1 kernel: [13726.005884] Call Trace: May 5 22:37:39 stb1 kernel: [13726.005885] [] ? dump_page+0x9b/0xd0 May 5 22:37:39 stb1 kernel: [13726.005894] [] bad_page+0xc9/0x120 May 5 22:37:39 stb1 kernel: [13726.005897] [] free_pages_prepare+0xef/0x100 May 5 22:37:39 stb1 kernel: [13726.005900] [] free_hot_cold_page+0x49/0x470 May 5 22:37:39 stb1 kernel: [13726.005902] [] __put_single_page+0x20/0x30 May 5 22:37:39 stb1 kernel: [13726.005904] [] put_page+0x2d/0x40 May 5 22:37:39 stb1 kernel: [13726.005909] [] __pskb_trim_head+0xd1/0x120 May 5 22:37:39 stb1 kernel: [13726.005911] [] tcp_trim_head+0x70/0x120 May 5 22:37:39 stb1 kernel: [13726.005914] [] tcp_ack+0x547/0x1de0 May 5 22:37:39 stb1 kernel: [13726.005916] [] ? tcp_write_xmit+0x1d0/0x990 May 5 22:37:39 stb1 kernel: [13726.005920] [] ? kmem_cache_free+0x20/0x110 May 5 22:37:39 stb1 kernel: [13726.005923] [] tcp_rcv_established+0x44d/0x830 May 5 22:37:39 stb1 kernel: [13726.005926] [] tcp_v4_do_rcv+0x189/0x430 May 5 22:37:39 stb1 kernel: [13726.005928] [] ? tcp_v4_rcv+0x65c/0x900 May 5 22:37:39 stb1 kernel: [13726.005932] [] ? queue_work+0x1f/0x30 May 5 22:37:39 stb1 kernel: [13726.005935] [] tcp_v4_rcv+0x649/0x900 May 5 22:37:39 stb1 kernel: [13726.005939] [] ip_local_deliver_finish+0xdd/0x2a0 May 5 22:37:39 stb1 kernel: [13726.005941] [] ip_local_deliver+0x88/0x90 May 5 22:37:39 stb1 kernel: [13726.005944] [] ip_rcv_finish+0x141/0x390 May 5 22:37:39 stb1 kernel: [13726.005946] [] ip_rcv+0x21c/0x2e0 May 5 22:37:39 stb1 kernel: [13726.005950] [] __netif_receive_skb+0x55b/0x680 May 5 22:37:39 stb1 kernel: [13726.005953] [] netif_receive_skb+0x58/0x80 May 5 22:37:39 stb1 kernel: [13726.005955] [] napi_skb_finish+0x50/0x70 May 5 22:37:39 stb1 kernel: [13726.005957] [] napi_gro_receive+0xb5/0xc0 May 5 22:37:39 stb1 kernel: [13726.005976] [] e1000_receive_skb+0x5b/0x90 [e1000] May 5 22:37:39 stb1 kernel: [13726.005980] [] e1000_clean_rx_irq+0x25e/0x4b0 [e1000] May 5 22:37:39 stb1 kernel: [13726.005983] [] e1000_clean+0x229/0x620 [e1000] May 5 22:37:39 stb1 kernel: [13726.005986] [] ? dev_hard_start_xmit+0x2cd/0x6d0 May 5 22:37:39 stb1 kernel: [13726.005990] [] ? default_spin_lock_flags+0x9/0x10 May 5 22:37:39 stb1 kernel: [13726.005993] [] net_rx_action+0x128/0x270 May 5 22:37:39 stb1 kernel: [13726.005996] [] __do_softirq+0xa8/0x1c0 May 5 22:37:39 stb1 kernel: [13726.005999] [] ? ack_apic_level+0x72/0x1a0 May 5 22:37:39 stb1 kernel: [13726.006003] [] call_softirq+0x1c/0x30 May 5 22:37:39 stb1 kernel: [13726.006006] [] do_softirq+0x65/0xa0 May 5 22:37:39 stb1 kernel: [13726.006008] [] irq_exit+0x8e/0xa0 May 5 22:37:39 stb1 kernel: [13726.006010] [] do_IRQ+0x66/0xe0 May 5 22:37:39 stb1 kernel: [13726.006013] [] common_interrupt+0x13/0x13 May 5 22:37:39 stb1 kernel: [13726.006014] [] ? native_safe_halt+0xb/0x10 May 5 22:37:39 stb1 kernel: [13726.006020] [] default_idle+0x41/0xe0 May 5 22:37:39 stb1 kernel: [13726.006023] [] cpu_idle+0xa6/0xf0 May 5 22:37:39 stb1 kernel: [13726.006027] [] rest_init+0x75/0x80 May 5 22:37:39 stb1 kernel: [13726.006031] [] start_kernel+0x3e1/0x3ec May 5 22:37:39 stb1 kernel: [13726.006034] [] x86_64_start_reservations+0x132/0x136 May 5 22:37:39 stb1 kernel: [13726.006036] [] x86_64_start_kernel+0x101/0x110 May 5 22:37:39 stb1 kernel: [13726.049368] BUG: Bad page state in process rs:main Q:Reg pfn:7b6e2 May 5 22:37:39 stb1 kernel: [13726.049376] page:ffffea0001b00170 count:0 mapcount:-127 mapping: (null) index:0xda771 May 5 22:37:39 stb1 kernel: [13726.049381] page flags: 0x100000000000000() May 5 22:37:39 stb1 kernel: [13726.049385] Pid: 2683, comm: rs:main Q:Reg Tainted: G B 2.6.39-rc6-ufm #1 May 5 22:37:39 stb1 kernel: [13726.049387] Call Trace: May 5 22:37:39 stb1 kernel: [13726.049388] [] ? dump_page+0x9b/0xd0 May 5 22:37:39 stb1 kernel: [13726.049397] [] bad_page+0xc9/0x120 May 5 22:37:39 stb1 kernel: [13726.049399] [] free_pages_prepare+0xef/0x100 May 5 22:37:39 stb1 kernel: [13726.049403] [] free_hot_cold_page+0x49/0x470 May 5 22:37:39 stb1 kernel: [13726.049405] [] __put_single_page+0x20/0x30 May 5 22:37:39 stb1 kernel: [13726.049407] [] put_page+0x2d/0x40 May 5 22:37:39 stb1 kernel: [13726.049411] [] skb_release_data+0xb4/0xe0 May 5 22:37:39 stb1 kernel: [13726.049413] [] __kfree_skb+0x1e/0xa0 May 5 22:37:39 stb1 kernel: [13726.049417] [] tcp_ack+0x444/0x1de0 May 5 22:37:39 stb1 kernel: [13726.049420] [] ? tcp_write_xmit+0x76/0x990 May 5 22:37:39 stb1 kernel: [13726.049429] [] ? put_osd_con+0x11/0x20 [libceph] May 5 22:37:39 stb1 kernel: [13726.049431] [] tcp_rcv_established+0x44d/0x830 May 5 22:37:39 stb1 kernel: [13726.049435] [] tcp_v4_do_rcv+0x189/0x430 May 5 22:37:39 stb1 kernel: [13726.049438] [] ? tcp_v4_rcv+0x65c/0x900 May 5 22:37:39 stb1 kernel: [13726.049441] [] tcp_v4_rcv+0x649/0x900 May 5 22:37:39 stb1 kernel: [13726.049445] [] ip_local_deliver_finish+0xdd/0x2a0 May 5 22:37:39 stb1 kernel: [13726.049448] [] ip_local_deliver+0x88/0x90 May 5 22:37:39 stb1 kernel: [13726.049450] [] ip_rcv_finish+0x141/0x390 May 5 22:37:39 stb1 kernel: [13726.049452] [] ip_rcv+0x21c/0x2e0 May 5 22:37:39 stb1 kernel: [13726.049456] [] __netif_receive_skb+0x55b/0x680 May 5 22:37:39 stb1 kernel: [13726.049459] [] netif_receive_skb+0x58/0x80 May 5 22:37:39 stb1 kernel: [13726.049461] [] napi_skb_finish+0x50/0x70 May 5 22:37:39 stb1 kernel: [13726.049463] [] napi_gro_receive+0xb5/0xc0 May 5 22:37:39 stb1 kernel: [13726.049470] [] e1000_receive_skb+0x5b/0x90 [e1000] May 5 22:37:39 stb1 kernel: [13726.049474] [] e1000_clean_rx_irq+0x25e/0x4b0 [e1000] May 5 22:37:39 stb1 kernel: [13726.049477] [] e1000_clean+0x229/0x620 [e1000] May 5 22:37:39 stb1 kernel: [13726.049482] [] ? update_shares+0xc2/0x110 May 5 22:37:39 stb1 kernel: [13726.049484] [] net_rx_action+0x128/0x270 May 5 22:37:39 stb1 kernel: [13726.049488] [] __do_softirq+0xa8/0x1c0 May 5 22:37:39 stb1 kernel: [13726.049491] [] ? ack_apic_level+0x72/0x1a0 May 5 22:37:39 stb1 kernel: [13726.049495] [] call_softirq+0x1c/0x30 May 5 22:37:39 stb1 kernel: [13726.049497] [] do_softirq+0x65/0xa0 May 5 22:37:39 stb1 kernel: [13726.049500] [] irq_exit+0x8e/0xa0 May 5 22:37:39 stb1 kernel: [13726.049502] [] do_IRQ+0x66/0xe0 May 5 22:37:39 stb1 kernel: [13726.049505] [] common_interrupt+0x13/0x13 May 5 22:37:39 stb1 kernel: [13726.049506] [] ? ext4_mark_iloc_dirty+0xae/0x5c0 May 5 22:37:39 stb1 kernel: [13726.049512] [] ? ext4_mark_iloc_dirty+0x88/0x5c0 May 5 22:37:39 stb1 kernel: [13726.049515] [] ? ext4_dirty_inode+0x3d/0x60 May 5 22:37:39 stb1 kernel: [13726.049517] [] ext4_mark_inode_dirty+0x81/0x210 May 5 22:37:39 stb1 kernel: [13726.049521] [] ? ext4_journal_start_sb+0x69/0x170 May 5 22:37:39 stb1 kernel: [13726.049525] [] ? __block_commit_write.clone.12+0x8e/0xd0 May 5 22:37:39 stb1 kernel: [13726.049528] [] ext4_dirty_inode+0x3d/0x60 May 5 22:37:39 stb1 kernel: [13726.049530] [] __mark_inode_dirty+0x40/0x210 May 5 22:37:39 stb1 kernel: [13726.049533] [] generic_write_end+0x6b/0xa0 May 5 22:37:39 stb1 kernel: [13726.049535] [] ext4_da_write_end+0xdb/0x340 May 5 22:37:39 stb1 kernel: [13726.049538] [] generic_file_buffered_write+0x17b/0x250 May 5 22:37:39 stb1 kernel: [13726.049541] [] ? current_fs_time+0x16/0x60 May 5 22:37:39 stb1 kernel: [13726.049544] [] __generic_file_aio_write+0x229/0x440 May 5 22:37:39 stb1 kernel: [13726.049546] [] generic_file_aio_write+0x6f/0xe0 May 5 22:37:39 stb1 kernel: [13726.049549] [] ext4_file_write+0x69/0x280 May 5 22:37:39 stb1 kernel: [13726.049553] [] do_sync_write+0xd2/0x110 [...] and so on. WBR, Fyodor.