From: Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
To: Brad Walker <bwalker-WlSugiYO8JFBDgjK7y7TUQ@public.gmane.org>
Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: problem w/ read caching..
Date: Mon, 1 Oct 2012 14:14:58 -0700
Message-ID: <20121001211458.GC26488@google.com>
In-Reply-To: <loom.20121001T223817-249-eS7Uydv5nfjZ+VzJOa5vwg@public.gmane.org>
On Mon, Oct 01, 2012 at 08:56:21PM +0000, Brad Walker wrote:
> Kent Overstreet <koverstreet@...> writes:
>
> >
> > On Mon, Oct 01, 2012 at 08:05:14PM +0000, Brad Walker wrote:
> > > Kent Overstreet <koverstreet@...> writes:
> > >
> > > >
> > > > What about cache_bypass_hits, cache_bypass_misses?
> > > >
> > >
> > > cache_bypass_hits = 0
> > > cache_bypass_misses = 0
> >
> > I should've just asked you for all the stats - what about
> > cache_miss_collision?
So cache_miss_collisions and cache_read_races are 0...
----
I was just browsing around the code, and I bet I know what it is -
btree_insert_check_key() is failing because the btree node is full.
The way the code works is that on a cache miss, we can't just blindly insert
that data into the cache: if a write to the same location happens after the
cache miss but before the data from the cache miss gets inserted, we'd
overwrite the newly written data with stale data.
So btree_insert_check_key() inserts a fake key atomically with the cache
miss. We don't need that key to be persisted, so we can skip journalling
and all the normal btree insert code, which is how we can insert this fake
key atomically.
Then, when we go to insert the real key that points to the data from the
cache miss, we check whether the fake key we inserted is still present, and
fail the insert if it's not.
It's cmpxchg(), but for the btree.
Anyways... since we're skipping all the normal btree_insert() code,
btree_insert_check_key() can't split the btree node; if the node is full,
it just fails the insert.
This'd be perfectly fine in any normal workload where you've got some
mix of reads and writes... if the btree node is full, a write will come
along to split it.
But the synthetic workload is a bit of a pathological case here :)
But we should confirm this really is what's going on... Can you apply
this patch and rerun to test my theory? See if the number of times the
printk fires lines up with the number of cache misses.
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 4102267..d5c5313 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1875,9 +1875,13 @@ bool bch_btree_insert_check_key(struct btree *b, struct btree_op *op,
 	rw_unlock(false, b);
 	rw_lock(true, b, b->level);
 
+	if (should_split(b)) {
+		printk(KERN_DEBUG "bcache: bch_btree_insert_check_key() failed because btree node full\n");
+		goto out;
+	}
+
 	if (b->key.ptr[0] != btree_ptr ||
-	    b->seq != seq + 1 ||
-	    should_split(b))
+	    b->seq != seq + 1)
 		goto out;
 
 	op->replace = KEY(op->inode, bio_end(bio), bio_sectors(bio));