* pg scrub check problem
From: changtao381 @ 2015-10-28 9:39 UTC
  To: 'ceph-devel'

Hi,

I'm testing the deep-scrub function of Ceph. My test steps are below:

1) I put an object into Ceph using the command:

    rados put test.txt test.txt -p testpool

   The size of testpool is 3, so there are three replicas on three OSDs:

    osd.0: /data1/ceph_data/osd.0/current/1.0_head/test.txt__head_8B0B6108__1
    osd.1: /data2/ceph_data/osd.1/current/1.0_head/test.txt__head_8B0B6108__1
    osd.2: /data3/ceph_data/osd.2/current/1.0_head/test.txt__head_8B0B6108__1

2) I modified the content of the replica on osd.0 directly on disk, using the vim editor.

3) I ran the command:

    ceph pg deep-scrub 1.0

I expected it to report the inconsistency, but it did not find the error. Why?

Any suggestions will be appreciated! Thanks

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

* Re: pg scrub check problem
From: 池信泽 @ 2015-10-28 10:34 UTC
  To: changtao381; +Cc: ceph-devel

Are you sure the OSD actually began to scrub? You could check the OSD log, or use 'ceph pg dump' to see whether the scrub stamp has changed.

Some policies can cause a scrub request to be rejected, such as the system load, osd_scrub_min_interval, osd_deep_scrub_interval, and so on.

--
Regards,
xinze

* Re: pg scrub check problem
From: David Zafman @ 2015-10-28 19:27 UTC
  To: 池信泽, changtao381; +Cc: ceph-devel

Initiating a manual deep-scrub the way you are doing should always run. The command itself doesn't report any results; it just initiates a background process. If you follow the command with ceph -w you'll see what is happening.

After I corrupted one of my replicas, I see this:

$ ceph pg deep-scrub 1.6; ceph -w
instructing pg 1.6 on osd.3 to deep-scrub
    cluster 8528c83b-0ff9-479c-af76-fc0ac5c595d3
     health HEALTH_OK
     monmap e1: 1 mons at {a=127.0.0.1:6789/0}
            election epoch 2, quorum 0 a
     osdmap e14: 4 osds: 4 up, 4 in
            flags sortbitwise
      pgmap v29: 16 pgs, 2 pools, 1130 bytes data, 1 objects
            83917 MB used, 30311 MB / 117 GB avail
                  16 active+clean

2015-10-28 12:23:17.724011 mon.0 [INF] from='client.? 127.0.0.1:0/3672629479' entity='client.admin' cmd=[{"prefix": "pg deep-scrub", "pgid": "1.6"}]: dispatch
2015-10-28 12:23:19.787756 mon.0 [INF] pgmap v30: 16 pgs: 1 active+clean+inconsistent, 15 active+clean; 1130 bytes data, 83917 MB used, 30310 MB / 117 GB avail
2015-10-28 12:23:18.274239 osd.3 [INF] 1.6 deep-scrub starts
2015-10-28 12:23:18.277332 osd.3 [ERR] 1.6 shard 2: soid 1/7fc1f406/foo/head data_digest 0xe84d3cdc != known data_digest 0x74d68469 from auth shard 0, size 7 != known size 1130
2015-10-28 12:23:18.277546 osd.3 [ERR] 1.6 deep-scrub 0 missing, 1 inconsistent objects
2015-10-28 12:23:18.277549 osd.3 [ERR] 1.6 deep-scrub 1 errors
^C

David
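
For anyone scripting this check, the shard-mismatch line in the `ceph -w` output above can be picked apart with a regular expression. This is only a sketch: the pattern is fitted to the single [ERR] line shown in this thread, other Ceph versions may word the message differently, and `SCRUB_ERR` and `parse_scrub_error` are hypothetical names, not part of any Ceph tooling.

```python
import re

# Pattern fitted to the one shard-mismatch line quoted in this thread (assumption:
# the wording is stable enough for this one Ceph build; it may differ elsewhere).
SCRUB_ERR = re.compile(
    r"(?P<osd>osd\.\d+) \[ERR\] (?P<pgid>\S+) shard (?P<shard>\d+): "
    r"soid (?P<soid>\S+) data_digest (?P<digest>0x[0-9a-f]+) "
    r"!= known data_digest (?P<known_digest>0x[0-9a-f]+) "
    r"from auth shard (?P<auth_shard>\d+)"
)


def parse_scrub_error(line):
    """Return the fields of a shard-mismatch line as a dict, or None if it does not match."""
    match = SCRUB_ERR.search(line)
    return match.groupdict() if match else None


SAMPLE = (
    "2015-10-28 12:23:18.277332 osd.3 [ERR] 1.6 shard 2: soid 1/7fc1f406/foo/head "
    "data_digest 0xe84d3cdc != known data_digest 0x74d68469 from auth shard 0, "
    "size 7 != known size 1130"
)

if __name__ == "__main__":
    print(parse_scrub_error(SAMPLE))
```

Lines that do not describe a digest mismatch, such as "deep-scrub starts", simply return None.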

* Re: pg scrub check problem
From: Sage Weil @ 2015-10-29 0:13 UTC
  To: changtao381; +Cc: 'ceph-devel'

On Wed, 28 Oct 2015, changtao381 wrote:
> 2) I modified the content of one replica on osd.0 using vim editor directly on disk
>
> 3) I run the command
> ceph pg deep-scrub 1.0
>
> and expect it can check the inconsistent error out, but it fails. It doesn't find the error
> why?

Because you *just* wrote the object, and the FileStore caches open file handles. Vim renames a new inode over the old one, so the open inode is untouched.

If you restart the OSD and then scrub, you'll see the error.

sage
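
The effect Sage describes can be reproduced outside Ceph on a POSIX filesystem. The sketch below is not Ceph code: a handle opened before the edit stands in for FileStore's cached file handle, `os.replace` stands in for the editor's rename of a new inode over the old path, and the file names and the helper `read_through_stale_handle` are hypothetical.

```python
import os
import tempfile


def read_through_stale_handle():
    """Return (bytes seen via the old handle, bytes seen via the path)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "obj")
        with open(path, "wb") as f:
            f.write(b"original")

        # Stand-in for a cached open file handle (assumption: POSIX semantics).
        cached = open(path, "rb")
        try:
            # Editor-style save: write a new file, then rename it over the path.
            tmp = os.path.join(d, "obj.new")
            with open(tmp, "wb") as f:
                f.write(b"modified")
            os.replace(tmp, path)

            via_handle = cached.read()   # still reads the old inode
            with open(path, "rb") as f:
                via_path = f.read()      # a fresh open reaches the new inode
        finally:
            cached.close()
        return via_handle, via_path


if __name__ == "__main__":
    print(read_through_stale_handle())
```

A reader that keeps its handle open never sees the edit, which matches the advice to restart the OSD so the file is reopened.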

* Re: pg scrub check problem
From: David Zafman @ 2015-10-29 0:26 UTC
  To: Sage Weil, changtao381; +Cc: 'ceph-devel'

Good point. In my previous response I did "echo garbage > ........./foo__head_7FC1F406__1" to corrupt a replica.

David
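
A shell redirect such as the `echo garbage > ...` above truncates and rewrites the existing file instead of replacing it, which is presumably why that corruption was visible without an OSD restart. A hedged sketch of the difference, again outside Ceph and assuming POSIX semantics; `overwrite_in_place` and the file name are hypothetical.

```python
import os
import tempfile


def overwrite_in_place():
    """Return (inode unchanged?, bytes seen via a handle opened before the overwrite)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "obj")
        with open(path, "wb") as f:
            f.write(b"original")

        cached = open(path, "rb")  # stand-in for a cached open file handle
        try:
            inode_before = os.stat(path).st_ino
            # Mode "wb" truncates the existing file, like a shell `>` redirect.
            with open(path, "wb") as f:
                f.write(b"garbage\n")
            inode_after = os.stat(path).st_ino
            seen = cached.read()   # same inode, so the new bytes are visible
        finally:
            cached.close()
        return inode_before == inode_after, seen


if __name__ == "__main__":
    print(overwrite_in_place())
```

Because the inode is unchanged, a handle opened earlier reads the new content.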

* Re: pg scrub check problem
From: 池信泽 @ 2015-10-29 0:42 UTC
  To: David Zafman; +Cc: Sage Weil, changtao381, ceph-devel

Yes, I think we should also check the scrub interval settings:

OSD::sched_scrub()
{
    // Excerpt only: `break` exits an enclosing loop that is not shown here.

    // Skip if the last scrub was more recent than osd_scrub_min_interval.
    if ((double)diff < cct->_conf->osd_scrub_min_interval) {
        dout(10) << "sched_scrub " << pgid << " at " << t
                 << ": " << (double)diff << " < min ("
                 << cct->_conf->osd_scrub_min_interval << " seconds)" << dendl;
        break;
    }

    // Under high load, skip until osd_scrub_max_interval has elapsed.
    if ((double)diff < cct->_conf->osd_scrub_max_interval && !load_is_low) {
        // save ourselves some effort
        dout(10) << "sched_scrub " << pgid << " high load at " << t
                 << ": " << (double)diff << " < max ("
                 << cct->_conf->osd_scrub_max_interval << " seconds)" << dendl;
        break;
    }
}

--
Regards,
xinze
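
The two checks in the excerpt above can be restated as a small predicate. This is a sketch of just those two conditions, not of the real `OSD::sched_scrub()`, which does more; the function name `should_schedule_scrub` and its parameters are hypothetical. As David noted earlier in the thread, a manually requested deep-scrub should always run, so this models only the scheduled path.

```python
def should_schedule_scrub(diff, min_interval, max_interval, load_is_low):
    """Mirror the two early exits in the quoted excerpt.

    diff          seconds since the PG's last scrub stamp
    min_interval  stand-in for osd_scrub_min_interval (seconds)
    max_interval  stand-in for osd_scrub_max_interval (seconds)
    load_is_low   True when system load permits scrubbing
    """
    # Scrubbed too recently: never schedule before the minimum interval.
    if diff < min_interval:
        return False
    # Under high load, keep deferring until the maximum interval has passed.
    if diff < max_interval and not load_is_low:
        return False
    return True


if __name__ == "__main__":
    print(should_schedule_scrub(120, 60, 600, load_is_low=False))
```

Once `diff` reaches the maximum interval, the load check no longer defers the scrub.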

* Re: pg scrub check problem
From: Ning Yao @ 2015-10-31 16:59 UTC
  To: David Zafman; +Cc: Sage Weil, changtao381, ceph-devel

> Good point. In my previous response I did "echo garbage >
> ........./foo__head_7FC1F406__1" to corrupt a replica.

I think this can happen when mixing O_DIRECT and buffered I/O, which can happen in NewStore.

Or it happens when the inode changes: for example, FileStore writes "........./foo__head_7FC1F406__1", and then you run rm -f ........./foo__head_7FC1F406__1 followed by echo garbage > ........./foo__head_7FC1F406__1.

Or you write just 4 bytes to the object ........./foo__head_7FC1F406__1 and then echo more than 4 bytes of data to "........./foo__head_7FC1F406__1", which changes the extent.

If the extent is already allocated when you echo the new content, you can read the modified data back immediately.

Thread overview: 7+ messages (end of thread, newest: 2015-10-31 16:59 UTC)

2015-10-28  9:39 pg scrub check problem changtao381
2015-10-28 10:34 ` 池信泽
2015-10-28 19:27   ` David Zafman
2015-10-29  0:13 ` Sage Weil
2015-10-29  0:26   ` David Zafman
2015-10-29  0:42     ` 池信泽
2015-10-31 16:59     ` Ning Yao