From: NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org>
To: Kent Overstreet
<kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Dan Williams
<dan.j.williams-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Andreas Dilger <adilger-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>,
"linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"rdunlap-/UHa2rfvQTnk1uMJSBkQmQ@public.gmane.org"
<rdunlap-/UHa2rfvQTnk1uMJSBkQmQ@public.gmane.org>,
"axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org"
<axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>,
"akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org"
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Subject: Re: [GIT] Bcache version 12
Date: Mon, 19 Sep 2011 17:16:06 +1000
Message-ID: <20110919171606.3640c102@notabene.brown>
In-Reply-To: <CAC7rs0t_J+foaLZSuuw5BhpUAYfr-KY1iegFOxEBPCpbrkk1Dg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Thu, 15 Sep 2011 14:33:36 -0700 Kent Overstreet
<kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Thu, Sep 15, 2011 at 2:15 PM, Dan Williams <dan.j.williams-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Sun, Sep 11, 2011 at 6:44 PM, Kent Overstreet
> > <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >> On Sun, Sep 11, 2011 at 07:35:56PM -0600, Andreas Dilger wrote:
> >>> On 2011-09-11, at 1:23 PM, Kent Overstreet <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >>> > I don't think that makes any more sense, as module parameters AFAIK are
> >>> > even more explicitly just a value you can stick in and pull out.
> >>> > /sys/fs/bcache/register is really more analogous to mount().
> >
> > ... and you looked at module_param_call()?
>
> Damn, nope. I still think a module parameter is even uglier than a
> sysfs file, though.
Beauty is in the eye of the beholder I guess.
>
> As far as I can tell, the linux kernel is really lacking any sort of
> coherent vision for how to make arbitrary interfaces available from
> the filesystem.
Cannot disagree with that. Coherent vision isn't something that the kernel
community really values.
I think the best approach is always to find out how someone else already
achieved a similar goal. Then either:
1/ copy that
2/ make a convincing argument why it is bad, and produce a better
implementation which meets your needs and theirs.
i.e. perfect is not an option, better is good when convincing, but not-worse
is always acceptable.
>
> We all seem to agree that it's a worthwhile thing to do - nobody likes
> ioctls, /proc/sys has been around for ages; something visible and
> discoverable beats an ioctl or a weird special purpose system call any
> day.
>
> But until people can agree on - hell, even come up with a decent plan
> - for the right way to put interfaces in the filesystem, I'm not going
> to lose much sleep over it.
>
> >> I looked into that many months ago, spent quite a bit of time fighting
> >> with the dm code trying to get it to do what I wanted and... no. Never
> >> again
> >
> > Did you do a similar analysis of md? I had a pet caching project that
> > had its own sysfs interface registration system, and came to the
> > conclusion that it would have been better to have started with an MD
> > personality. Especially when one of the legs of the cache is a
> > md-raid array it helps to keep all that assembly logic using the same
> > interface.
>
> I did spend some time looking at md, I don't really remember if I gave
> it a fair chance or if I found a critical flaw.
>
> I agree that an md personality ought to be a good fit but I don't
> think the current md code is ideal for what bcache wants to do. Much
> saner than dm, but I think it still suffers from the assumption that
> there's some easy mapping from superblocks to block devices, with
> bcache they really can't be tied together.
I don't understand what you mean there, even after reading bcache.txt.
Does not each block device have a unique superblock (created by make-bcache)
on it? That should define a clear 1-to-1 mapping....
It isn't clear from the documentation what a 'cache set' is. I think it is a
set of related cache devices. But how do they relate to backing devices?
Is it one backing device per cache set? Or can several backing devices
all be cached by one cache set?
In any case it certainly could be modelled in md - and if the modelling were
not elegant (e.g. even device numbers for backing devices, odd device numbers
for cache devices) we could "fix" md to make it more elegant.
(Not that I'm necessarily advocating an md interface, but if I can understand
why you don't think md can work, then I might understand bcache better ....
or you might get to understand md better).
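
To make the "even/odd" modelling above concrete, here is a minimal sketch. It
is not md code: it reads "device numbers" as the member slot index within one
array, and the names member_role, role_of_slot, backing_slot and cache_slot
are hypothetical, made up for illustration only.

```c
/*
 * Sketch only: one way the (admittedly inelegant) even/odd convention
 * could be expressed. All names here are hypothetical, not md symbols.
 */
enum member_role { ROLE_BACKING, ROLE_CACHE };

/* Even slot numbers hold backing devices, odd slot numbers hold cache devices. */
enum member_role role_of_slot(unsigned int slot)
{
	return (slot & 1) ? ROLE_CACHE : ROLE_BACKING;
}

/* Slot used by the n-th backing device (n counts from 0). */
unsigned int backing_slot(unsigned int n)
{
	return 2 * n;
}

/* Slot used by the n-th cache device (n counts from 0). */
unsigned int cache_slot(unsigned int n)
{
	return 2 * n + 1;
}
```

With this convention slots 0, 2, 4, ... would be backing devices and slots
1, 3, 5, ... would be cache devices. Whether several backing devices can share
one cache set is exactly the open question above, so the sketch does not
assume any pairing between a backing slot and a cache slot.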
Do you have any benchmark numbers showing how wonderful this feature is in
practice? Preferably some artificial workloads that show fantastic
improvement, some that show the worst result you can, and something that is
actually realistic (best case, worst case, real case). Graphs are nice.
... I just checked http://bcache.evilpiepirate.org/ and there is one graph
there which does seem nice, but it doesn't tell me much (I don't know what a
Corsair Nova is). And while bonnie certainly has some value, it mainly shows
you how fast bonnie can run. Reporting the file size used and splitting out
the sequential and random, read and write speeds would help a lot.
Also I don't think the code belongs in /block. The CRC64 code should go
in /lib and the rest should either be in /drivers/block or
possibly /drivers/md (as it makes a single device out of 'multiple devices').
Obviously that isn't urgent, but should be fixed before it can be considered
to be ready.
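
As an illustration of why a CRC64 routine fits in /lib: it is typically a
small, self-contained function with no block-layer dependencies. The following
is a minimal sketch, not the bcache code. It assumes the ECMA-182 polynomial
with a zero initial value and no final XOR (I have not checked what bcache
actually uses), and the names crc64_sketch and CRC64_POLY_SKETCH are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: ECMA-182 polynomial; not checked against the bcache source. */
#define CRC64_POLY_SKETCH UINT64_C(0x42F0E1EBA9EA3693)

/*
 * Bitwise, MSB-first CRC-64. Pass 0 as the initial crc; the return value
 * can be fed back in to continue over a further buffer.
 */
uint64_t crc64_sketch(uint64_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		/* Fold the next byte into the top 8 bits of the register. */
		crc ^= (uint64_t)*p++ << 56;
		for (int bit = 0; bit < 8; bit++) {
			if (crc & (UINT64_C(1) << 63))
				crc = (crc << 1) ^ CRC64_POLY_SKETCH;
			else
				crc <<= 1;
		}
	}
	return crc;
}
```

A real implementation would normally be table-driven for speed; the bitwise
form is used here only because it is short and easy to review.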
Is there some documentation on the format of the cache and the cache
replacement policy? I couldn't easily find anything on your wiki.
Having that would make it much easier to review the code and to understand
pessimal workloads.
Thanks,
NeilBrown
>
> > And md supports assembling devices via sysfs without
> > requiring mdadm which is a nice feature.
>
> Didn't know that, I'll have to look at that. If nothing else
> consistency is good...
>
> > Also has the benefit of reusing the distro installation / boot
> > enabling for md devices which turned out to be a bit of work when
> > enabling external-metadata in md.
>
> Dunno what you mean about external metadata, but it would be nice to
> not have to do anything to userspace to boot from a bcache device. As
> is though it's only a couple lines of bash you have to drop in your
> initramfs.