* tracking down failed eviction
@ 2015-08-11 11:32 Loic Dachary
2015-08-11 14:52 ` Loic Dachary
0 siblings, 1 reply; 4+ messages in thread
From: Loic Dachary @ 2015-08-11 11:32 UTC (permalink / raw)
To: Shylesh Kumar; +Cc: Ceph Development
[-- Attachment #1: Type: text/plain, Size: 10853 bytes --]
Hi Shylesh,
I ran the following on a hammer compiled from sources.
./stop.sh
rm -fr out dev ; MON=1 OSD=3 ./vstart.sh -X -d -n -l mon osd
ceph osd pool create slow 1 1
ceph osd pool create fast 1 1
ceph osd tier add slow fast
ceph osd tier cache-mode fast writeback
ceph osd tier set-overlay slow fast
ceph osd pool set fast hit_set_type bloom
ceph osd pool set fast cache_target_dirty_ratio 0.05
ceph osd pool set fast cache_min_flush_age 300
rados -p slow put obj1 /etc/group
ceph osd pool set fast cache_target_dirty_ratio .5
ceph osd pool set fast cache_target_full_ratio .8
ceph osd pool set fast target_max_objects 1
rados -p slow put obj2 /etc/group
rados -p slow put obj3 /etc/group
ceph osd pool set fast hit_set_count 1
ceph osd pool set fast hit_set_period 5
ceph osd pool set fast hit_set_period 600
sleep 30
ceph df
ceph health detail
and the output shows it works (see below). I'll try to figure out the difference with your cluster now that I have a baseline that does the right thing :-) If you can think of something else you did that may explain the failure, please let me know.
$ bash -x /tmp/bug.sh
+ ./stop.sh
+ rm -fr out dev
+ MON=1
+ OSD=3
+ ./vstart.sh -X -d -n -l mon osd
** going verbose **
ip 127.0.0.1
port
NOTE: hostname resolves to loopback; remote hosts will not be able to
connect. either adjust /etc/hosts, or edit this script to use your
machine's real IP.
creating /home/loic/software/ceph/ceph/src/keyring
./monmaptool --create --clobber --add a 127.0.0.1:6789 --print /tmp/ceph_monmap.24142
./monmaptool: monmap file /tmp/ceph_monmap.24142
./monmaptool: generated fsid 1d4a5e92-e47c-4f79-9390-07614b8d261a
epoch 0
fsid 1d4a5e92-e47c-4f79-9390-07614b8d261a
last_changed 2015-08-11 13:26:52.958438
created 2015-08-11 13:26:52.958438
0: 127.0.0.1:6789/0 mon.a
./monmaptool: writing epoch 0 to /tmp/ceph_monmap.24142 (1 monitors)
rm -rf /home/loic/software/ceph/ceph/src/dev/mon.a
mkdir -p /home/loic/software/ceph/ceph/src/dev/mon.a
./ceph-mon --mkfs -c /home/loic/software/ceph/ceph/src/ceph.conf -i a --monmap=/tmp/ceph_monmap.24142 --keyring=/home/loic/software/ceph/ceph/src/keyring
./ceph-mon: set fsid to 87413c55-5249-4969-a8fe-f2768e5c59ce
./ceph-mon: created monfs at /home/loic/software/ceph/ceph/src/dev/mon.a for mon.a
./ceph-mon -i a -c /home/loic/software/ceph/ceph/src/ceph.conf
ERROR: error accessing '/home/loic/software/ceph/ceph/src/dev/osd0/*'
add osd0 8c8ac2c0-2426-42dd-ac43-c7b1a54a14bc
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
0
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
add item id 0 name 'osd.0' weight 1 at location {host=fold,root=default} to crush map
2015-08-11 13:26:55.473416 7ff1bf4a07c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2015-08-11 13:26:56.462509 7ff1bf4a07c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2015-08-11 13:26:56.480075 7ff1bf4a07c0 -1 filestore(/home/loic/software/ceph/ceph/src/dev/osd0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2015-08-11 13:26:57.030312 7ff1bf4a07c0 -1 created object store /home/loic/software/ceph/ceph/src/dev/osd0 journal /home/loic/software/ceph/ceph/src/dev/osd0.journal for osd.0 fsid 87413c55-5249-4969-a8fe-f2768e5c59ce
2015-08-11 13:26:57.030372 7ff1bf4a07c0 -1 auth: error reading file: /home/loic/software/ceph/ceph/src/dev/osd0/keyring: can't open /home/loic/software/ceph/ceph/src/dev/osd0/keyring: (2) No such file or directory
2015-08-11 13:26:57.030471 7ff1bf4a07c0 -1 created new key in keyring /home/loic/software/ceph/ceph/src/dev/osd0/keyring
adding osd0 key to auth repository
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
added key for osd.0
start osd0
./ceph-osd -i 0 -c /home/loic/software/ceph/ceph/src/ceph.conf
starting osd.0 at :/0 osd_data /home/loic/software/ceph/ceph/src/dev/osd0 /home/loic/software/ceph/ceph/src/dev/osd0.journal
ERROR: error accessing '/home/loic/software/ceph/ceph/src/dev/osd1/*'
add osd1 a57e7891-c8ca-4a59-92fa-6f65f681a560
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
1
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
add item id 1 name 'osd.1' weight 1 at location {host=fold,root=default} to crush map
2015-08-11 13:27:00.350457 7fc4aa2507c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2015-08-11 13:27:01.104445 7fc4aa2507c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2015-08-11 13:27:01.117456 7fc4aa2507c0 -1 filestore(/home/loic/software/ceph/ceph/src/dev/osd1) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2015-08-11 13:27:01.826145 7fc4aa2507c0 -1 created object store /home/loic/software/ceph/ceph/src/dev/osd1 journal /home/loic/software/ceph/ceph/src/dev/osd1.journal for osd.1 fsid 87413c55-5249-4969-a8fe-f2768e5c59ce
2015-08-11 13:27:01.826194 7fc4aa2507c0 -1 auth: error reading file: /home/loic/software/ceph/ceph/src/dev/osd1/keyring: can't open /home/loic/software/ceph/ceph/src/dev/osd1/keyring: (2) No such file or directory
2015-08-11 13:27:01.826277 7fc4aa2507c0 -1 created new key in keyring /home/loic/software/ceph/ceph/src/dev/osd1/keyring
adding osd1 key to auth repository
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
added key for osd.1
start osd1
./ceph-osd -i 1 -c /home/loic/software/ceph/ceph/src/ceph.conf
starting osd.1 at :/0 osd_data /home/loic/software/ceph/ceph/src/dev/osd1 /home/loic/software/ceph/ceph/src/dev/osd1.journal
ERROR: error accessing '/home/loic/software/ceph/ceph/src/dev/osd2/*'
add osd2 f88f90e0-5796-479c-ac2b-9406f91e55cd
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
add item id 2 name 'osd.2' weight 1 at location {host=fold,root=default} to crush map
2015-08-11 13:27:05.397800 7f4462e1b7c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2015-08-11 13:27:06.808009 7f4462e1b7c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2015-08-11 13:27:06.827479 7f4462e1b7c0 -1 filestore(/home/loic/software/ceph/ceph/src/dev/osd2) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2015-08-11 13:27:08.451691 7f4462e1b7c0 -1 created object store /home/loic/software/ceph/ceph/src/dev/osd2 journal /home/loic/software/ceph/ceph/src/dev/osd2.journal for osd.2 fsid 87413c55-5249-4969-a8fe-f2768e5c59ce
2015-08-11 13:27:08.451731 7f4462e1b7c0 -1 auth: error reading file: /home/loic/software/ceph/ceph/src/dev/osd2/keyring: can't open /home/loic/software/ceph/ceph/src/dev/osd2/keyring: (2) No such file or directory
2015-08-11 13:27:08.454963 7f4462e1b7c0 -1 created new key in keyring /home/loic/software/ceph/ceph/src/dev/osd2/keyring
adding osd2 key to auth repository
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
added key for osd.2
start osd2
./ceph-osd -i 2 -c /home/loic/software/ceph/ceph/src/ceph.conf
starting osd.2 at :/0 osd_data /home/loic/software/ceph/ceph/src/dev/osd2 /home/loic/software/ceph/ceph/src/dev/osd2.journal
started. stop.sh to stop. see out/* (e.g. 'tail -f out/????') for debug output.
export PYTHONPATH=./pybind
export LD_LIBRARY_PATH=.libs
+ ceph osd pool create slow 1 1
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'slow' created
+ ceph osd pool create fast 1 1
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'fast' created
+ ceph osd tier add slow fast
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'fast' is now (or already was) a tier of 'slow'
+ ceph osd tier cache-mode fast writeback
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set cache-mode for pool 'fast' to writeback
+ ceph osd tier set-overlay slow fast
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
overlay for 'slow' is now (or already was) 'fast'
+ ceph osd pool set fast hit_set_type bloom
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set pool 2 hit_set_type to bloom
+ ceph osd pool set fast cache_target_dirty_ratio 0.05
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set pool 2 cache_target_dirty_ratio to 0.05
+ ceph osd pool set fast cache_min_flush_age 300
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set pool 2 cache_min_flush_age to 300
+ rados -p slow put obj1 /etc/group
+ ceph osd pool set fast cache_target_dirty_ratio .5
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set pool 2 cache_target_dirty_ratio to .5
+ ceph osd pool set fast cache_target_full_ratio .8
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set pool 2 cache_target_full_ratio to .8
+ ceph osd pool set fast target_max_objects 1
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set pool 2 target_max_objects to 1
+ rados -p slow put obj2 /etc/group
+ rados -p slow put obj3 /etc/group
+ ceph osd pool set fast hit_set_count 1
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set pool 2 hit_set_count to 1
+ ceph osd pool set fast hit_set_period 5
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set pool 2 hit_set_period to 5
+ ceph osd pool set fast hit_set_period 600
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set pool 2 hit_set_period to 600
+ sleep 30
+ ceph df
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
547G 56407M 491G 89.92
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
rbd 0 0 0 18801M 0
slow 1 3834 0 18801M 3
fast 2 0 0 18801M 0
+ ceph health detail
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
HEALTH_WARN 3 near full osd(s); mon.a low disk space
osd.0 is near full at 89%
osd.1 is near full at 89%
osd.2 is near full at 89%
mon.a low disk space -- 10% avail
--
Loïc Dachary, Artisan Logiciel Libre
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: tracking down failed eviction
2015-08-11 11:32 tracking down failed eviction Loic Dachary
@ 2015-08-11 14:52 ` Loic Dachary
2015-08-11 14:58 ` Loic Dachary
0 siblings, 1 reply; 4+ messages in thread
From: Loic Dachary @ 2015-08-11 14:52 UTC (permalink / raw)
To: Shylesh Kumar; +Cc: Ceph Development
[-- Attachment #1: Type: text/plain, Size: 14668 bytes --]
Hi,
In your cluster there is:
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
8334G 8333G 1280M 0.02
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
rbd 0 0 0 2777G 0
mypool 1 135 0 2777G 2
sample 2 0 0 1851G 0
slow 3 295M 0 1851G 84
fast 4 498 0 1851G 6
After running
rados -p fast cache-flush-evict-all
Then I do the following:
rados -p slow put loic1 /etc/group
ceph df
fast 4 1371 0 1851G 8
I'm assuming the extra object (6 objects + 1 for loic1 + 1 other ) is an internal hit set.
rados -p slow put loic2 /etc/group
ceph df
fast 4 1455 0 1851G 9
There I have 2 objects in the fast pool. I sleep 120 seconds and I see it going back to 8
ceph df
fast 4 666 0 1851G 8
Which is consistent with the settings of the pool that only allow for one object to stay in the fast pool as reported with ceph report:
{
"pool": 4,
"pool_name": "fast",
"flags": 9,
"flags_names": "hashpspool,incomplete_clones",
"type": 1,
"size": 3,
"min_size": 2,
"crush_ruleset": 1,
"object_hash": 2,
"pg_num": 128,
"pg_placement_num": 128,
"crash_replay_interval": 0,
"last_change": "79",
"last_force_op_resend": "0",
"auid": 0,
"snap_mode": "selfmanaged",
"snap_seq": 0,
"snap_epoch": 67,
"pool_snaps": [],
"removed_snaps": "[]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": 3,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "writeback",
"target_max_bytes": 1000,
"target_max_objects": 1,
"cache_target_dirty_ratio_micro": 500000,
"cache_target_full_ratio_micro": 800000,
"cache_min_flush_age": 300,
"cache_min_evict_age": 0,
"erasure_code_profile": "cache",
"hit_set_params": {
"type": "bloom",
"false_positive_probability": 0.050000,
"target_size": 0,
"seed": 0
},
"hit_set_period": 600,
"hit_set_count": 1,
"min_read_recency_for_promote": 0,
"stripe_width": 0,
"expected_num_objects": 0
}
I'm not entirely sure the explanation for the remaining 6 objects (i.e. them being internal objects used to store the hit set) is right. Let say it is right, is there more unexplained behavior ?
Cheers
For the record: https://github.com/ceph/ceph/blob/hammer/src/osd/ReplicatedPG.cc#L10494 bool ReplicatedPG::agent_work
On 11/08/2015 13:32, Loic Dachary wrote:
> Hi Shylesh,
>
>
> I ran the following on a hammer compiled from sources.
>
> ./stop.sh
> rm -fr out dev ; MON=1 OSD=3 ./vstart.sh -X -d -n -l mon osd
>
> ceph osd pool create slow 1 1
> ceph osd pool create fast 1 1
> ceph osd tier add slow fast
> ceph osd tier cache-mode fast writeback
> ceph osd tier set-overlay slow fast
> ceph osd pool set fast hit_set_type bloom
> ceph osd pool set fast cache_target_dirty_ratio 0.05
> ceph osd pool set fast cache_min_flush_age 300
> rados -p slow put obj1 /etc/group
> ceph osd pool set fast cache_target_dirty_ratio .5
> ceph osd pool set fast cache_target_full_ratio .8
> ceph osd pool set fast target_max_objects 1
> rados -p slow put obj2 /etc/group
> rados -p slow put obj3 /etc/group
> ceph osd pool set fast hit_set_count 1
> ceph osd pool set fast hit_set_period 5
> ceph osd pool set fast hit_set_period 600
>
> sleep 30
>
> ceph df
> ceph health detail
>
>
> and the output shows it works (see below). I'll try to figure out the difference with your cluster now that I have a baseline that does the right thing :-) If you can think of something else you did that may explain the failure, please let me know.
>
> $ bash -x /tmp/bug.sh
> + ./stop.sh
> + rm -fr out dev
> + MON=1
> + OSD=3
> + ./vstart.sh -X -d -n -l mon osd
> ** going verbose **
> ip 127.0.0.1
> port
>
> NOTE: hostname resolves to loopback; remote hosts will not be able to
> connect. either adjust /etc/hosts, or edit this script to use your
> machine's real IP.
>
> creating /home/loic/software/ceph/ceph/src/keyring
> ./monmaptool --create --clobber --add a 127.0.0.1:6789 --print /tmp/ceph_monmap.24142
> ./monmaptool: monmap file /tmp/ceph_monmap.24142
> ./monmaptool: generated fsid 1d4a5e92-e47c-4f79-9390-07614b8d261a
> epoch 0
> fsid 1d4a5e92-e47c-4f79-9390-07614b8d261a
> last_changed 2015-08-11 13:26:52.958438
> created 2015-08-11 13:26:52.958438
> 0: 127.0.0.1:6789/0 mon.a
> ./monmaptool: writing epoch 0 to /tmp/ceph_monmap.24142 (1 monitors)
> rm -rf /home/loic/software/ceph/ceph/src/dev/mon.a
> mkdir -p /home/loic/software/ceph/ceph/src/dev/mon.a
> ./ceph-mon --mkfs -c /home/loic/software/ceph/ceph/src/ceph.conf -i a --monmap=/tmp/ceph_monmap.24142 --keyring=/home/loic/software/ceph/ceph/src/keyring
> ./ceph-mon: set fsid to 87413c55-5249-4969-a8fe-f2768e5c59ce
> ./ceph-mon: created monfs at /home/loic/software/ceph/ceph/src/dev/mon.a for mon.a
> ./ceph-mon -i a -c /home/loic/software/ceph/ceph/src/ceph.conf
> ERROR: error accessing '/home/loic/software/ceph/ceph/src/dev/osd0/*'
> add osd0 8c8ac2c0-2426-42dd-ac43-c7b1a54a14bc
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> 0
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> add item id 0 name 'osd.0' weight 1 at location {host=fold,root=default} to crush map
> 2015-08-11 13:26:55.473416 7ff1bf4a07c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
> 2015-08-11 13:26:56.462509 7ff1bf4a07c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
> 2015-08-11 13:26:56.480075 7ff1bf4a07c0 -1 filestore(/home/loic/software/ceph/ceph/src/dev/osd0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
> 2015-08-11 13:26:57.030312 7ff1bf4a07c0 -1 created object store /home/loic/software/ceph/ceph/src/dev/osd0 journal /home/loic/software/ceph/ceph/src/dev/osd0.journal for osd.0 fsid 87413c55-5249-4969-a8fe-f2768e5c59ce
> 2015-08-11 13:26:57.030372 7ff1bf4a07c0 -1 auth: error reading file: /home/loic/software/ceph/ceph/src/dev/osd0/keyring: can't open /home/loic/software/ceph/ceph/src/dev/osd0/keyring: (2) No such file or directory
> 2015-08-11 13:26:57.030471 7ff1bf4a07c0 -1 created new key in keyring /home/loic/software/ceph/ceph/src/dev/osd0/keyring
> adding osd0 key to auth repository
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> added key for osd.0
> start osd0
> ./ceph-osd -i 0 -c /home/loic/software/ceph/ceph/src/ceph.conf
> starting osd.0 at :/0 osd_data /home/loic/software/ceph/ceph/src/dev/osd0 /home/loic/software/ceph/ceph/src/dev/osd0.journal
> ERROR: error accessing '/home/loic/software/ceph/ceph/src/dev/osd1/*'
> add osd1 a57e7891-c8ca-4a59-92fa-6f65f681a560
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> 1
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> add item id 1 name 'osd.1' weight 1 at location {host=fold,root=default} to crush map
> 2015-08-11 13:27:00.350457 7fc4aa2507c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
> 2015-08-11 13:27:01.104445 7fc4aa2507c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
> 2015-08-11 13:27:01.117456 7fc4aa2507c0 -1 filestore(/home/loic/software/ceph/ceph/src/dev/osd1) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
> 2015-08-11 13:27:01.826145 7fc4aa2507c0 -1 created object store /home/loic/software/ceph/ceph/src/dev/osd1 journal /home/loic/software/ceph/ceph/src/dev/osd1.journal for osd.1 fsid 87413c55-5249-4969-a8fe-f2768e5c59ce
> 2015-08-11 13:27:01.826194 7fc4aa2507c0 -1 auth: error reading file: /home/loic/software/ceph/ceph/src/dev/osd1/keyring: can't open /home/loic/software/ceph/ceph/src/dev/osd1/keyring: (2) No such file or directory
> 2015-08-11 13:27:01.826277 7fc4aa2507c0 -1 created new key in keyring /home/loic/software/ceph/ceph/src/dev/osd1/keyring
> adding osd1 key to auth repository
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> added key for osd.1
> start osd1
> ./ceph-osd -i 1 -c /home/loic/software/ceph/ceph/src/ceph.conf
> starting osd.1 at :/0 osd_data /home/loic/software/ceph/ceph/src/dev/osd1 /home/loic/software/ceph/ceph/src/dev/osd1.journal
> ERROR: error accessing '/home/loic/software/ceph/ceph/src/dev/osd2/*'
> add osd2 f88f90e0-5796-479c-ac2b-9406f91e55cd
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> 2
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> add item id 2 name 'osd.2' weight 1 at location {host=fold,root=default} to crush map
> 2015-08-11 13:27:05.397800 7f4462e1b7c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
> 2015-08-11 13:27:06.808009 7f4462e1b7c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
> 2015-08-11 13:27:06.827479 7f4462e1b7c0 -1 filestore(/home/loic/software/ceph/ceph/src/dev/osd2) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
> 2015-08-11 13:27:08.451691 7f4462e1b7c0 -1 created object store /home/loic/software/ceph/ceph/src/dev/osd2 journal /home/loic/software/ceph/ceph/src/dev/osd2.journal for osd.2 fsid 87413c55-5249-4969-a8fe-f2768e5c59ce
> 2015-08-11 13:27:08.451731 7f4462e1b7c0 -1 auth: error reading file: /home/loic/software/ceph/ceph/src/dev/osd2/keyring: can't open /home/loic/software/ceph/ceph/src/dev/osd2/keyring: (2) No such file or directory
> 2015-08-11 13:27:08.454963 7f4462e1b7c0 -1 created new key in keyring /home/loic/software/ceph/ceph/src/dev/osd2/keyring
> adding osd2 key to auth repository
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> added key for osd.2
> start osd2
> ./ceph-osd -i 2 -c /home/loic/software/ceph/ceph/src/ceph.conf
> starting osd.2 at :/0 osd_data /home/loic/software/ceph/ceph/src/dev/osd2 /home/loic/software/ceph/ceph/src/dev/osd2.journal
> started. stop.sh to stop. see out/* (e.g. 'tail -f out/????') for debug output.
>
> export PYTHONPATH=./pybind
> export LD_LIBRARY_PATH=.libs
> + ceph osd pool create slow 1 1
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> pool 'slow' created
> + ceph osd pool create fast 1 1
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> pool 'fast' created
> + ceph osd tier add slow fast
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> pool 'fast' is now (or already was) a tier of 'slow'
> + ceph osd tier cache-mode fast writeback
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> set cache-mode for pool 'fast' to writeback
> + ceph osd tier set-overlay slow fast
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> overlay for 'slow' is now (or already was) 'fast'
> + ceph osd pool set fast hit_set_type bloom
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> set pool 2 hit_set_type to bloom
> + ceph osd pool set fast cache_target_dirty_ratio 0.05
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> set pool 2 cache_target_dirty_ratio to 0.05
> + ceph osd pool set fast cache_min_flush_age 300
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> set pool 2 cache_min_flush_age to 300
> + rados -p slow put obj1 /etc/group
> + ceph osd pool set fast cache_target_dirty_ratio .5
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> set pool 2 cache_target_dirty_ratio to .5
> + ceph osd pool set fast cache_target_full_ratio .8
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> set pool 2 cache_target_full_ratio to .8
> + ceph osd pool set fast target_max_objects 1
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> set pool 2 target_max_objects to 1
> + rados -p slow put obj2 /etc/group
> + rados -p slow put obj3 /etc/group
> + ceph osd pool set fast hit_set_count 1
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> set pool 2 hit_set_count to 1
> + ceph osd pool set fast hit_set_period 5
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> set pool 2 hit_set_period to 5
> + ceph osd pool set fast hit_set_period 600
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> set pool 2 hit_set_period to 600
> + sleep 30
> + ceph df
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> GLOBAL:
> SIZE AVAIL RAW USED %RAW USED
> 547G 56407M 491G 89.92
> POOLS:
> NAME ID USED %USED MAX AVAIL OBJECTS
> rbd 0 0 0 18801M 0
> slow 1 3834 0 18801M 3
> fast 2 0 0 18801M 0
> + ceph health detail
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> HEALTH_WARN 3 near full osd(s); mon.a low disk space
> osd.0 is near full at 89%
> osd.1 is near full at 89%
> osd.2 is near full at 89%
> mon.a low disk space -- 10% avail
>
>
--
Loïc Dachary, Artisan Logiciel Libre
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: tracking down failed eviction
2015-08-11 14:52 ` Loic Dachary
@ 2015-08-11 14:58 ` Loic Dachary
2015-08-11 21:57 ` Loic Dachary
0 siblings, 1 reply; 4+ messages in thread
From: Loic Dachary @ 2015-08-11 14:58 UTC (permalink / raw)
To: Shylesh Kumar; +Cc: Ceph Development
[-- Attachment #1: Type: text/plain, Size: 15797 bytes --]
Looks like these 8 objects really are hit set archives:
$ sudo rados --namespace .ceph-internal -p fast ls
hit_set_4.3_archive_2015-08-10 14:21:26.133292_2015-08-11 10:42:57.463074
hit_set_4.4_archive_2015-08-10 14:46:52.626962_2015-08-11 07:12:21.654410
hit_set_4.b_archive_2015-08-10 14:21:32.897661_2015-08-11 10:42:30.888782
hit_set_4.2c_archive_2015-08-10 08:29:09.218604_2015-08-10 08:29:14.782606
hit_set_4.63_archive_2015-08-11 06:52:25.052126_2015-08-11 07:12:21.776928
hit_set_4.74_archive_2015-08-10 14:41:39.411882_2015-08-11 07:12:21.789545
hit_set_4.7c_archive_2015-08-10 14:38:04.463447_2015-08-11 07:12:21.801581
hit_set_4.7f_archive_2015-08-10 14:43:01.446562_2015-08-11 07:12:21.819104
On 11/08/2015 16:52, Loic Dachary wrote:
> Hi,
>
> In your cluster there is:
>
> GLOBAL:
> SIZE AVAIL RAW USED %RAW USED
> 8334G 8333G 1280M 0.02
> POOLS:
> NAME ID USED %USED MAX AVAIL OBJECTS
> rbd 0 0 0 2777G 0
> mypool 1 135 0 2777G 2
> sample 2 0 0 1851G 0
> slow 3 295M 0 1851G 84
> fast 4 498 0 1851G 6
>
> After running
>
> rados -p fast cache-flush-evict-all
>
> Then I do the following:
>
> rados -p slow put loic1 /etc/group
>
> ceph df
> fast 4 1371 0 1851G 8
>
> I'm assuming the extra object (6 objects + 1 for loic1 + 1 other ) is an internal hit set.
>
> rados -p slow put loic2 /etc/group
>
> ceph df
> fast 4 1455 0 1851G 9
>
> There I have 2 objects in the fast pool. I sleep 120 seconds and I see it going back to 8
>
> ceph df
> fast 4 666 0 1851G 8
>
> Which is consistent with the settings of the pool that only allow for one object to stay in the fast pool as reported with ceph report:
>
> {
> "pool": 4,
> "pool_name": "fast",
> "flags": 9,
> "flags_names": "hashpspool,incomplete_clones",
> "type": 1,
> "size": 3,
> "min_size": 2,
> "crush_ruleset": 1,
> "object_hash": 2,
> "pg_num": 128,
> "pg_placement_num": 128,
> "crash_replay_interval": 0,
> "last_change": "79",
> "last_force_op_resend": "0",
> "auid": 0,
> "snap_mode": "selfmanaged",
> "snap_seq": 0,
> "snap_epoch": 67,
> "pool_snaps": [],
> "removed_snaps": "[]",
> "quota_max_bytes": 0,
> "quota_max_objects": 0,
> "tiers": [],
> "tier_of": 3,
> "read_tier": -1,
> "write_tier": -1,
> "cache_mode": "writeback",
> "target_max_bytes": 1000,
> "target_max_objects": 1,
> "cache_target_dirty_ratio_micro": 500000,
> "cache_target_full_ratio_micro": 800000,
> "cache_min_flush_age": 300,
> "cache_min_evict_age": 0,
> "erasure_code_profile": "cache",
> "hit_set_params": {
> "type": "bloom",
> "false_positive_probability": 0.050000,
> "target_size": 0,
> "seed": 0
> },
> "hit_set_period": 600,
> "hit_set_count": 1,
> "min_read_recency_for_promote": 0,
> "stripe_width": 0,
> "expected_num_objects": 0
> }
>
> I'm not entirely sure the explanation for the remaining 6 objects (i.e. them being internal objects used to store the hit set) is right. Let say it is right, is there more unexplained behavior ?
>
> Cheers
>
> For the record: https://github.com/ceph/ceph/blob/hammer/src/osd/ReplicatedPG.cc#L10494 bool ReplicatedPG::agent_work
>
> On 11/08/2015 13:32, Loic Dachary wrote:
>> Hi Shylesh,
>>
>>
>> I ran the following on a hammer compiled from sources.
>>
>> ./stop.sh
>> rm -fr out dev ; MON=1 OSD=3 ./vstart.sh -X -d -n -l mon osd
>>
>> ceph osd pool create slow 1 1
>> ceph osd pool create fast 1 1
>> ceph osd tier add slow fast
>> ceph osd tier cache-mode fast writeback
>> ceph osd tier set-overlay slow fast
>> ceph osd pool set fast hit_set_type bloom
>> ceph osd pool set fast cache_target_dirty_ratio 0.05
>> ceph osd pool set fast cache_min_flush_age 300
>> rados -p slow put obj1 /etc/group
>> ceph osd pool set fast cache_target_dirty_ratio .5
>> ceph osd pool set fast cache_target_full_ratio .8
>> ceph osd pool set fast target_max_objects 1
>> rados -p slow put obj2 /etc/group
>> rados -p slow put obj3 /etc/group
>> ceph osd pool set fast hit_set_count 1
>> ceph osd pool set fast hit_set_period 5
>> ceph osd pool set fast hit_set_period 600
>>
>> sleep 30
>>
>> ceph df
>> ceph health detail
>>
>>
>> and the output shows it works (see below). I'll try to figure out the difference with your cluster now that I have a baseline that does the right thing :-) If you can think of something else you did that may explain the failure, please let me know.
>>
>> $ bash -x /tmp/bug.sh
>> + ./stop.sh
>> + rm -fr out dev
>> + MON=1
>> + OSD=3
>> + ./vstart.sh -X -d -n -l mon osd
>> ** going verbose **
>> ip 127.0.0.1
>> port
>>
>> NOTE: hostname resolves to loopback; remote hosts will not be able to
>> connect. either adjust /etc/hosts, or edit this script to use your
>> machine's real IP.
>>
>> creating /home/loic/software/ceph/ceph/src/keyring
>> ./monmaptool --create --clobber --add a 127.0.0.1:6789 --print /tmp/ceph_monmap.24142
>> ./monmaptool: monmap file /tmp/ceph_monmap.24142
>> ./monmaptool: generated fsid 1d4a5e92-e47c-4f79-9390-07614b8d261a
>> epoch 0
>> fsid 1d4a5e92-e47c-4f79-9390-07614b8d261a
>> last_changed 2015-08-11 13:26:52.958438
>> created 2015-08-11 13:26:52.958438
>> 0: 127.0.0.1:6789/0 mon.a
>> ./monmaptool: writing epoch 0 to /tmp/ceph_monmap.24142 (1 monitors)
>> rm -rf /home/loic/software/ceph/ceph/src/dev/mon.a
>> mkdir -p /home/loic/software/ceph/ceph/src/dev/mon.a
>> ./ceph-mon --mkfs -c /home/loic/software/ceph/ceph/src/ceph.conf -i a --monmap=/tmp/ceph_monmap.24142 --keyring=/home/loic/software/ceph/ceph/src/keyring
>> ./ceph-mon: set fsid to 87413c55-5249-4969-a8fe-f2768e5c59ce
>> ./ceph-mon: created monfs at /home/loic/software/ceph/ceph/src/dev/mon.a for mon.a
>> ./ceph-mon -i a -c /home/loic/software/ceph/ceph/src/ceph.conf
>> ERROR: error accessing '/home/loic/software/ceph/ceph/src/dev/osd0/*'
>> add osd0 8c8ac2c0-2426-42dd-ac43-c7b1a54a14bc
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> 0
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> add item id 0 name 'osd.0' weight 1 at location {host=fold,root=default} to crush map
>> 2015-08-11 13:26:55.473416 7ff1bf4a07c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>> 2015-08-11 13:26:56.462509 7ff1bf4a07c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>> 2015-08-11 13:26:56.480075 7ff1bf4a07c0 -1 filestore(/home/loic/software/ceph/ceph/src/dev/osd0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
>> 2015-08-11 13:26:57.030312 7ff1bf4a07c0 -1 created object store /home/loic/software/ceph/ceph/src/dev/osd0 journal /home/loic/software/ceph/ceph/src/dev/osd0.journal for osd.0 fsid 87413c55-5249-4969-a8fe-f2768e5c59ce
>> 2015-08-11 13:26:57.030372 7ff1bf4a07c0 -1 auth: error reading file: /home/loic/software/ceph/ceph/src/dev/osd0/keyring: can't open /home/loic/software/ceph/ceph/src/dev/osd0/keyring: (2) No such file or directory
>> 2015-08-11 13:26:57.030471 7ff1bf4a07c0 -1 created new key in keyring /home/loic/software/ceph/ceph/src/dev/osd0/keyring
>> adding osd0 key to auth repository
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> added key for osd.0
>> start osd0
>> ./ceph-osd -i 0 -c /home/loic/software/ceph/ceph/src/ceph.conf
>> starting osd.0 at :/0 osd_data /home/loic/software/ceph/ceph/src/dev/osd0 /home/loic/software/ceph/ceph/src/dev/osd0.journal
>> ERROR: error accessing '/home/loic/software/ceph/ceph/src/dev/osd1/*'
>> add osd1 a57e7891-c8ca-4a59-92fa-6f65f681a560
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> 1
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> add item id 1 name 'osd.1' weight 1 at location {host=fold,root=default} to crush map
>> 2015-08-11 13:27:00.350457 7fc4aa2507c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>> 2015-08-11 13:27:01.104445 7fc4aa2507c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>> 2015-08-11 13:27:01.117456 7fc4aa2507c0 -1 filestore(/home/loic/software/ceph/ceph/src/dev/osd1) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
>> 2015-08-11 13:27:01.826145 7fc4aa2507c0 -1 created object store /home/loic/software/ceph/ceph/src/dev/osd1 journal /home/loic/software/ceph/ceph/src/dev/osd1.journal for osd.1 fsid 87413c55-5249-4969-a8fe-f2768e5c59ce
>> 2015-08-11 13:27:01.826194 7fc4aa2507c0 -1 auth: error reading file: /home/loic/software/ceph/ceph/src/dev/osd1/keyring: can't open /home/loic/software/ceph/ceph/src/dev/osd1/keyring: (2) No such file or directory
>> 2015-08-11 13:27:01.826277 7fc4aa2507c0 -1 created new key in keyring /home/loic/software/ceph/ceph/src/dev/osd1/keyring
>> adding osd1 key to auth repository
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> added key for osd.1
>> start osd1
>> ./ceph-osd -i 1 -c /home/loic/software/ceph/ceph/src/ceph.conf
>> starting osd.1 at :/0 osd_data /home/loic/software/ceph/ceph/src/dev/osd1 /home/loic/software/ceph/ceph/src/dev/osd1.journal
>> ERROR: error accessing '/home/loic/software/ceph/ceph/src/dev/osd2/*'
>> add osd2 f88f90e0-5796-479c-ac2b-9406f91e55cd
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> 2
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> add item id 2 name 'osd.2' weight 1 at location {host=fold,root=default} to crush map
>> 2015-08-11 13:27:05.397800 7f4462e1b7c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>> 2015-08-11 13:27:06.808009 7f4462e1b7c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>> 2015-08-11 13:27:06.827479 7f4462e1b7c0 -1 filestore(/home/loic/software/ceph/ceph/src/dev/osd2) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
>> 2015-08-11 13:27:08.451691 7f4462e1b7c0 -1 created object store /home/loic/software/ceph/ceph/src/dev/osd2 journal /home/loic/software/ceph/ceph/src/dev/osd2.journal for osd.2 fsid 87413c55-5249-4969-a8fe-f2768e5c59ce
>> 2015-08-11 13:27:08.451731 7f4462e1b7c0 -1 auth: error reading file: /home/loic/software/ceph/ceph/src/dev/osd2/keyring: can't open /home/loic/software/ceph/ceph/src/dev/osd2/keyring: (2) No such file or directory
>> 2015-08-11 13:27:08.454963 7f4462e1b7c0 -1 created new key in keyring /home/loic/software/ceph/ceph/src/dev/osd2/keyring
>> adding osd2 key to auth repository
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> added key for osd.2
>> start osd2
>> ./ceph-osd -i 2 -c /home/loic/software/ceph/ceph/src/ceph.conf
>> starting osd.2 at :/0 osd_data /home/loic/software/ceph/ceph/src/dev/osd2 /home/loic/software/ceph/ceph/src/dev/osd2.journal
>> started. stop.sh to stop. see out/* (e.g. 'tail -f out/????') for debug output.
>>
>> export PYTHONPATH=./pybind
>> export LD_LIBRARY_PATH=.libs
>> + ceph osd pool create slow 1 1
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> pool 'slow' created
>> + ceph osd pool create fast 1 1
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> pool 'fast' created
>> + ceph osd tier add slow fast
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> pool 'fast' is now (or already was) a tier of 'slow'
>> + ceph osd tier cache-mode fast writeback
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> set cache-mode for pool 'fast' to writeback
>> + ceph osd tier set-overlay slow fast
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> overlay for 'slow' is now (or already was) 'fast'
>> + ceph osd pool set fast hit_set_type bloom
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> set pool 2 hit_set_type to bloom
>> + ceph osd pool set fast cache_target_dirty_ratio 0.05
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> set pool 2 cache_target_dirty_ratio to 0.05
>> + ceph osd pool set fast cache_min_flush_age 300
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> set pool 2 cache_min_flush_age to 300
>> + rados -p slow put obj1 /etc/group
>> + ceph osd pool set fast cache_target_dirty_ratio .5
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> set pool 2 cache_target_dirty_ratio to .5
>> + ceph osd pool set fast cache_target_full_ratio .8
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> set pool 2 cache_target_full_ratio to .8
>> + ceph osd pool set fast target_max_objects 1
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> set pool 2 target_max_objects to 1
>> + rados -p slow put obj2 /etc/group
>> + rados -p slow put obj3 /etc/group
>> + ceph osd pool set fast hit_set_count 1
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> set pool 2 hit_set_count to 1
>> + ceph osd pool set fast hit_set_period 5
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> set pool 2 hit_set_period to 5
>> + ceph osd pool set fast hit_set_period 600
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> set pool 2 hit_set_period to 600
>> + sleep 30
>> + ceph df
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> GLOBAL:
>> SIZE AVAIL RAW USED %RAW USED
>> 547G 56407M 491G 89.92
>> POOLS:
>> NAME ID USED %USED MAX AVAIL OBJECTS
>> rbd 0 0 0 18801M 0
>> slow 1 3834 0 18801M 3
>> fast 2 0 0 18801M 0
>> + ceph health detail
>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>> HEALTH_WARN 3 near full osd(s); mon.a low disk space
>> osd.0 is near full at 89%
>> osd.1 is near full at 89%
>> osd.2 is near full at 89%
>> mon.a low disk space -- 10% avail
>>
>>
>
--
Loïc Dachary, Artisan Logiciel Libre
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: tracking down failed eviction
2015-08-11 14:58 ` Loic Dachary
@ 2015-08-11 21:57 ` Loic Dachary
0 siblings, 0 replies; 4+ messages in thread
From: Loic Dachary @ 2015-08-11 21:57 UTC (permalink / raw)
To: Shylesh Kumar; +Cc: Ceph Development
[-- Attachment #1: Type: text/plain, Size: 17410 bytes --]
Hi Shylesh,
The following script reproduces the problem: obj3 is promoted and never evicted
rados -p slow get obj3 /tmp/obj3
Running the same command again somehow fixes the situation and obj3 is evicted. Reading again, obj3 is no longer evicted. I'll create a bug report out of this.
Thanks for your patience :-)
./stop.sh
rm -fr out dev ; MON=1 OSD=3 ./vstart.sh -X -d -n -l mon osd
ceph osd pool create slow 1 1
ceph osd pool create fast 1 1
ceph osd tier add slow fast
ceph osd tier cache-mode fast writeback
ceph osd tier set-overlay slow fast
ceph osd pool set fast hit_set_type bloom
ceph osd pool set fast cache_target_dirty_ratio 0.05
ceph osd pool set fast cache_min_flush_age 300
rados -p slow put obj1 /etc/group
rados -p slow put obj2 /etc/group
rados -p slow put obj3 /etc/group
ceph osd pool set fast cache_target_dirty_ratio .5
ceph osd pool set fast cache_target_full_ratio .8
ceph osd pool set fast target_max_objects 1
ceph osd pool set fast hit_set_count 1
ceph osd pool set fast hit_set_period 5
sleep 30
ceph df
rados -p slow get obj3 /tmp/obj3
ceph df
sleep 30
ceph df
ceph health detail
On 11/08/2015 16:58, Loic Dachary wrote:
> Looks like these 8 objects really are hit set archives:
>
> $ sudo rados --namespace .ceph-internal -p fast ls
> hit_set_4.3_archive_2015-08-10 14:21:26.133292_2015-08-11 10:42:57.463074
> hit_set_4.4_archive_2015-08-10 14:46:52.626962_2015-08-11 07:12:21.654410
> hit_set_4.b_archive_2015-08-10 14:21:32.897661_2015-08-11 10:42:30.888782
> hit_set_4.2c_archive_2015-08-10 08:29:09.218604_2015-08-10 08:29:14.782606
> hit_set_4.63_archive_2015-08-11 06:52:25.052126_2015-08-11 07:12:21.776928
> hit_set_4.74_archive_2015-08-10 14:41:39.411882_2015-08-11 07:12:21.789545
> hit_set_4.7c_archive_2015-08-10 14:38:04.463447_2015-08-11 07:12:21.801581
> hit_set_4.7f_archive_2015-08-10 14:43:01.446562_2015-08-11 07:12:21.819104
>
>
> On 11/08/2015 16:52, Loic Dachary wrote:
>> Hi,
>>
>> In your cluster there is:
>>
>> GLOBAL:
>> SIZE AVAIL RAW USED %RAW USED
>> 8334G 8333G 1280M 0.02
>> POOLS:
>> NAME ID USED %USED MAX AVAIL OBJECTS
>> rbd 0 0 0 2777G 0
>> mypool 1 135 0 2777G 2
>> sample 2 0 0 1851G 0
>> slow 3 295M 0 1851G 84
>> fast 4 498 0 1851G 6
>>
>> After running
>>
>> rados -p fast cache-flush-evict-all
>>
>> Then I do the following:
>>
>> rados -p slow put loic1 /etc/group
>>
>> ceph df
>> fast 4 1371 0 1851G 8
>>
>> I'm assuming the extra object (6 objects + 1 for loic1 + 1 other ) is an internal hit set.
>>
>> rados -p slow put loic2 /etc/group
>>
>> ceph df
>> fast 4 1455 0 1851G 9
>>
>> There I have 2 objects in the fast pool. I sleep 120 seconds and I see it going back to 8
>>
>> ceph df
>> fast 4 666 0 1851G 8
>>
>> Which is consistent with the settings of the pool that only allow for one object to stay in the fast pool as reported with ceph report:
>>
>> {
>> "pool": 4,
>> "pool_name": "fast",
>> "flags": 9,
>> "flags_names": "hashpspool,incomplete_clones",
>> "type": 1,
>> "size": 3,
>> "min_size": 2,
>> "crush_ruleset": 1,
>> "object_hash": 2,
>> "pg_num": 128,
>> "pg_placement_num": 128,
>> "crash_replay_interval": 0,
>> "last_change": "79",
>> "last_force_op_resend": "0",
>> "auid": 0,
>> "snap_mode": "selfmanaged",
>> "snap_seq": 0,
>> "snap_epoch": 67,
>> "pool_snaps": [],
>> "removed_snaps": "[]",
>> "quota_max_bytes": 0,
>> "quota_max_objects": 0,
>> "tiers": [],
>> "tier_of": 3,
>> "read_tier": -1,
>> "write_tier": -1,
>> "cache_mode": "writeback",
>> "target_max_bytes": 1000,
>> "target_max_objects": 1,
>> "cache_target_dirty_ratio_micro": 500000,
>> "cache_target_full_ratio_micro": 800000,
>> "cache_min_flush_age": 300,
>> "cache_min_evict_age": 0,
>> "erasure_code_profile": "cache",
>> "hit_set_params": {
>> "type": "bloom",
>> "false_positive_probability": 0.050000,
>> "target_size": 0,
>> "seed": 0
>> },
>> "hit_set_period": 600,
>> "hit_set_count": 1,
>> "min_read_recency_for_promote": 0,
>> "stripe_width": 0,
>> "expected_num_objects": 0
>> }
>>
>> I'm not entirely sure the explanation for the remaining 6 objects (i.e. them being internal objects used to store the hit set) is right. Let say it is right, is there more unexplained behavior ?
>>
>> Cheers
>>
>> For the record: https://github.com/ceph/ceph/blob/hammer/src/osd/ReplicatedPG.cc#L10494 bool ReplicatedPG::agent_work
>>
>> On 11/08/2015 13:32, Loic Dachary wrote:
>>> Hi Shylesh,
>>>
>>>
>>> I ran the following on a hammer compiled from sources.
>>>
>>> ./stop.sh
>>> rm -fr out dev ; MON=1 OSD=3 ./vstart.sh -X -d -n -l mon osd
>>>
>>> ceph osd pool create slow 1 1
>>> ceph osd pool create fast 1 1
>>> ceph osd tier add slow fast
>>> ceph osd tier cache-mode fast writeback
>>> ceph osd tier set-overlay slow fast
>>> ceph osd pool set fast hit_set_type bloom
>>> ceph osd pool set fast cache_target_dirty_ratio 0.05
>>> ceph osd pool set fast cache_min_flush_age 300
>>> rados -p slow put obj1 /etc/group
>>> ceph osd pool set fast cache_target_dirty_ratio .5
>>> ceph osd pool set fast cache_target_full_ratio .8
>>> ceph osd pool set fast target_max_objects 1
>>> rados -p slow put obj2 /etc/group
>>> rados -p slow put obj3 /etc/group
>>> ceph osd pool set fast hit_set_count 1
>>> ceph osd pool set fast hit_set_period 5
>>> ceph osd pool set fast hit_set_period 600
>>>
>>> sleep 30
>>>
>>> ceph df
>>> ceph health detail
>>>
>>>
>>> and the output shows it works (see below). I'll try to figure out the difference with your cluster now that I have a baseline that does the right thing :-) If you can think of something else you did that may explain the failure, please let me know.
>>>
>>> $ bash -x /tmp/bug.sh
>>> + ./stop.sh
>>> + rm -fr out dev
>>> + MON=1
>>> + OSD=3
>>> + ./vstart.sh -X -d -n -l mon osd
>>> ** going verbose **
>>> ip 127.0.0.1
>>> port
>>>
>>> NOTE: hostname resolves to loopback; remote hosts will not be able to
>>> connect. either adjust /etc/hosts, or edit this script to use your
>>> machine's real IP.
>>>
>>> creating /home/loic/software/ceph/ceph/src/keyring
>>> ./monmaptool --create --clobber --add a 127.0.0.1:6789 --print /tmp/ceph_monmap.24142
>>> ./monmaptool: monmap file /tmp/ceph_monmap.24142
>>> ./monmaptool: generated fsid 1d4a5e92-e47c-4f79-9390-07614b8d261a
>>> epoch 0
>>> fsid 1d4a5e92-e47c-4f79-9390-07614b8d261a
>>> last_changed 2015-08-11 13:26:52.958438
>>> created 2015-08-11 13:26:52.958438
>>> 0: 127.0.0.1:6789/0 mon.a
>>> ./monmaptool: writing epoch 0 to /tmp/ceph_monmap.24142 (1 monitors)
>>> rm -rf /home/loic/software/ceph/ceph/src/dev/mon.a
>>> mkdir -p /home/loic/software/ceph/ceph/src/dev/mon.a
>>> ./ceph-mon --mkfs -c /home/loic/software/ceph/ceph/src/ceph.conf -i a --monmap=/tmp/ceph_monmap.24142 --keyring=/home/loic/software/ceph/ceph/src/keyring
>>> ./ceph-mon: set fsid to 87413c55-5249-4969-a8fe-f2768e5c59ce
>>> ./ceph-mon: created monfs at /home/loic/software/ceph/ceph/src/dev/mon.a for mon.a
>>> ./ceph-mon -i a -c /home/loic/software/ceph/ceph/src/ceph.conf
>>> ERROR: error accessing '/home/loic/software/ceph/ceph/src/dev/osd0/*'
>>> add osd0 8c8ac2c0-2426-42dd-ac43-c7b1a54a14bc
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> 0
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> add item id 0 name 'osd.0' weight 1 at location {host=fold,root=default} to crush map
>>> 2015-08-11 13:26:55.473416 7ff1bf4a07c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>>> 2015-08-11 13:26:56.462509 7ff1bf4a07c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>>> 2015-08-11 13:26:56.480075 7ff1bf4a07c0 -1 filestore(/home/loic/software/ceph/ceph/src/dev/osd0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
>>> 2015-08-11 13:26:57.030312 7ff1bf4a07c0 -1 created object store /home/loic/software/ceph/ceph/src/dev/osd0 journal /home/loic/software/ceph/ceph/src/dev/osd0.journal for osd.0 fsid 87413c55-5249-4969-a8fe-f2768e5c59ce
>>> 2015-08-11 13:26:57.030372 7ff1bf4a07c0 -1 auth: error reading file: /home/loic/software/ceph/ceph/src/dev/osd0/keyring: can't open /home/loic/software/ceph/ceph/src/dev/osd0/keyring: (2) No such file or directory
>>> 2015-08-11 13:26:57.030471 7ff1bf4a07c0 -1 created new key in keyring /home/loic/software/ceph/ceph/src/dev/osd0/keyring
>>> adding osd0 key to auth repository
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> added key for osd.0
>>> start osd0
>>> ./ceph-osd -i 0 -c /home/loic/software/ceph/ceph/src/ceph.conf
>>> starting osd.0 at :/0 osd_data /home/loic/software/ceph/ceph/src/dev/osd0 /home/loic/software/ceph/ceph/src/dev/osd0.journal
>>> ERROR: error accessing '/home/loic/software/ceph/ceph/src/dev/osd1/*'
>>> add osd1 a57e7891-c8ca-4a59-92fa-6f65f681a560
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> 1
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> add item id 1 name 'osd.1' weight 1 at location {host=fold,root=default} to crush map
>>> 2015-08-11 13:27:00.350457 7fc4aa2507c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>>> 2015-08-11 13:27:01.104445 7fc4aa2507c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>>> 2015-08-11 13:27:01.117456 7fc4aa2507c0 -1 filestore(/home/loic/software/ceph/ceph/src/dev/osd1) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
>>> 2015-08-11 13:27:01.826145 7fc4aa2507c0 -1 created object store /home/loic/software/ceph/ceph/src/dev/osd1 journal /home/loic/software/ceph/ceph/src/dev/osd1.journal for osd.1 fsid 87413c55-5249-4969-a8fe-f2768e5c59ce
>>> 2015-08-11 13:27:01.826194 7fc4aa2507c0 -1 auth: error reading file: /home/loic/software/ceph/ceph/src/dev/osd1/keyring: can't open /home/loic/software/ceph/ceph/src/dev/osd1/keyring: (2) No such file or directory
>>> 2015-08-11 13:27:01.826277 7fc4aa2507c0 -1 created new key in keyring /home/loic/software/ceph/ceph/src/dev/osd1/keyring
>>> adding osd1 key to auth repository
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> added key for osd.1
>>> start osd1
>>> ./ceph-osd -i 1 -c /home/loic/software/ceph/ceph/src/ceph.conf
>>> starting osd.1 at :/0 osd_data /home/loic/software/ceph/ceph/src/dev/osd1 /home/loic/software/ceph/ceph/src/dev/osd1.journal
>>> ERROR: error accessing '/home/loic/software/ceph/ceph/src/dev/osd2/*'
>>> add osd2 f88f90e0-5796-479c-ac2b-9406f91e55cd
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> 2
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> add item id 2 name 'osd.2' weight 1 at location {host=fold,root=default} to crush map
>>> 2015-08-11 13:27:05.397800 7f4462e1b7c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>>> 2015-08-11 13:27:06.808009 7f4462e1b7c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>>> 2015-08-11 13:27:06.827479 7f4462e1b7c0 -1 filestore(/home/loic/software/ceph/ceph/src/dev/osd2) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
>>> 2015-08-11 13:27:08.451691 7f4462e1b7c0 -1 created object store /home/loic/software/ceph/ceph/src/dev/osd2 journal /home/loic/software/ceph/ceph/src/dev/osd2.journal for osd.2 fsid 87413c55-5249-4969-a8fe-f2768e5c59ce
>>> 2015-08-11 13:27:08.451731 7f4462e1b7c0 -1 auth: error reading file: /home/loic/software/ceph/ceph/src/dev/osd2/keyring: can't open /home/loic/software/ceph/ceph/src/dev/osd2/keyring: (2) No such file or directory
>>> 2015-08-11 13:27:08.454963 7f4462e1b7c0 -1 created new key in keyring /home/loic/software/ceph/ceph/src/dev/osd2/keyring
>>> adding osd2 key to auth repository
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> added key for osd.2
>>> start osd2
>>> ./ceph-osd -i 2 -c /home/loic/software/ceph/ceph/src/ceph.conf
>>> starting osd.2 at :/0 osd_data /home/loic/software/ceph/ceph/src/dev/osd2 /home/loic/software/ceph/ceph/src/dev/osd2.journal
>>> started. stop.sh to stop. see out/* (e.g. 'tail -f out/????') for debug output.
>>>
>>> export PYTHONPATH=./pybind
>>> export LD_LIBRARY_PATH=.libs
>>> + ceph osd pool create slow 1 1
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> pool 'slow' created
>>> + ceph osd pool create fast 1 1
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> pool 'fast' created
>>> + ceph osd tier add slow fast
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> pool 'fast' is now (or already was) a tier of 'slow'
>>> + ceph osd tier cache-mode fast writeback
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> set cache-mode for pool 'fast' to writeback
>>> + ceph osd tier set-overlay slow fast
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> overlay for 'slow' is now (or already was) 'fast'
>>> + ceph osd pool set fast hit_set_type bloom
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> set pool 2 hit_set_type to bloom
>>> + ceph osd pool set fast cache_target_dirty_ratio 0.05
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> set pool 2 cache_target_dirty_ratio to 0.05
>>> + ceph osd pool set fast cache_min_flush_age 300
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> set pool 2 cache_min_flush_age to 300
>>> + rados -p slow put obj1 /etc/group
>>> + ceph osd pool set fast cache_target_dirty_ratio .5
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> set pool 2 cache_target_dirty_ratio to .5
>>> + ceph osd pool set fast cache_target_full_ratio .8
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> set pool 2 cache_target_full_ratio to .8
>>> + ceph osd pool set fast target_max_objects 1
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> set pool 2 target_max_objects to 1
>>> + rados -p slow put obj2 /etc/group
>>> + rados -p slow put obj3 /etc/group
>>> + ceph osd pool set fast hit_set_count 1
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> set pool 2 hit_set_count to 1
>>> + ceph osd pool set fast hit_set_period 5
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> set pool 2 hit_set_period to 5
>>> + ceph osd pool set fast hit_set_period 600
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> set pool 2 hit_set_period to 600
>>> + sleep 30
>>> + ceph df
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> GLOBAL:
>>> SIZE AVAIL RAW USED %RAW USED
>>> 547G 56407M 491G 89.92
>>> POOLS:
>>> NAME ID USED %USED MAX AVAIL OBJECTS
>>> rbd 0 0 0 18801M 0
>>> slow 1 3834 0 18801M 3
>>> fast 2 0 0 18801M 0
>>> + ceph health detail
>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>> HEALTH_WARN 3 near full osd(s); mon.a low disk space
>>> osd.0 is near full at 89%
>>> osd.1 is near full at 89%
>>> osd.2 is near full at 89%
>>> mon.a low disk space -- 10% avail
>>>
>>>
>>
>
--
Loïc Dachary, Artisan Logiciel Libre
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2015-08-11 21:57 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-11 11:32 tracking down failed eviction Loic Dachary
2015-08-11 14:52 ` Loic Dachary
2015-08-11 14:58 ` Loic Dachary
2015-08-11 21:57 ` Loic Dachary
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.