* [ANNOUNCE] bcachefs!
@ 2015-07-14 0:58 Kent Overstreet
[not found] ` <CACaajQtwx45r8GcRmchrQwDts1GH-V8g0x1FwGfDvnfm02bq+Q@mail.gmail.com>
` (3 more replies)
0 siblings, 4 replies; 36+ messages in thread
From: Kent Overstreet @ 2015-07-14 0:58 UTC (permalink / raw)
To: linux-bcache; +Cc: sviatoslavpestov, mrubin, adam.berkan, zab, rickyb
Short announcement, because I'm in the process of moving - but I wanted to get
this out there because the code is up and I think it's reasonably stable right
now.
Bcachefs is a posix filesystem that I've been working towards for - well, quite
a while now: it's intended as a competitor/replacement for ext4/xfs/btrfs.
Current features:
- multiple devices
- replication
- tiering
- data checksumming and compression (zlib only; also the code doesn't work with
tiering yet)
- most of the normal posix fs features (no fallocate or quotas yet)
Planned features:
- snapshots!
- erasure coding
- more
There will be a longer announcement on LKML/linux-fsdevel in the near future (after
I'm finished moving) - but I'd like it to get a bit more testing from a wider
audience first, if possible.
You need the bcache-dev branch and the new bcache tools - be warned, this code
is _not_ compatible with the upstream bcache on-disk format:
$ git clone -b bcache-dev http://evilpiepirate.org/git/linux-bcache.git
$ git clone -b dev http://evilpiepirate.org/git/bcache-tools.git
Then do the usual compiling...
# bcacheadm format -C /dev/sda1
# mount /dev/sda1 /mnt
The usual caveats apply - it might eat your data, the on-disk format has _not_
been stabilized yet, etc. But it's been reasonably stable for me, and passes all
but 2-3 of the supported xfstests.
Try it out and let me know how it goes!
Also, programmers please check out the bcache guide - feedback is appreciated:
http://bcache.evilpiepirate.org/BcacheGuide/
Thanks!
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ANNOUNCE] bcachefs!
[not found] ` <CACaajQtwx45r8GcRmchrQwDts1GH-V8g0x1FwGfDvnfm02bq+Q@mail.gmail.com>
@ 2015-07-14 8:11 ` Kent Overstreet
2015-07-20 1:11 ` Denis Bychkov
0 siblings, 1 reply; 36+ messages in thread
From: Kent Overstreet @ 2015-07-14 8:11 UTC (permalink / raw)
To: Vasiliy Tolstov
Cc: zab, linux-bcache, rickyb, mrubin, sviatoslavpestov, adam.berkan
On Tue, Jul 14, 2015 at 09:05:39AM +0300, Vasiliy Tolstov wrote:
> Does it support discards?
Yes
> Does this announce means that bcache block device
> no longer maintained by developers?
There's no plural, it's just me :)
I've been overly stressed and burned out from a startup gone horribly wrong -
I'm going to try to start doing some maintenance of upstream bcache again once
I've moved.
I apologize for my absence on the list these past months, I've had (and still
have) entirely too much to juggle - hint hint, if anyone wants to jump in and
help out.
> What about performance tests with compared to plain btrfs/ext4?
I haven't done much comparison benchmarking yet; I'll post results once I have. If
anyone else wants to do some benchmarking I'd love to see the results.
(From the testing I have done - untarring the kernel (on a fast device, with the
tar file already uncompressed and in cache, so we're purely cpu bound) -
bcachefs is equal to ext4 to within the margin of error. On dbench, when we're
purely cpu bound, bcachefs is roughly 30% slower than ext4. I think the majority of
that is bcachefs's dirent code being somewhat cpu heavy).
The long term goal is to be at least as fast as ext4/xfs for any given workload,
and on typical workloads we ought to be faster - in particular, bcachefs
should have better and more predictable latency, due to the way journalling
works.
> Format command have one device, how provide tiering?
--tier specifies the tier of the devices that come after it, where the smaller
index is the faster tier.
Only tiers 0 and 1 are supported for now; that will be increased whenever
someone gets around to it.
If /dev/sda is your fast device and /dev/sdb is your slow device, run
# bcacheadm format -C /dev/sda --tier 1 /dev/sdb
bcacheadm format --help gives you the full list of options.
* Re: [ANNOUNCE] bcachefs!
2015-07-14 0:58 [ANNOUNCE] bcachefs! Kent Overstreet
[not found] ` <CACaajQtwx45r8GcRmchrQwDts1GH-V8g0x1FwGfDvnfm02bq+Q@mail.gmail.com>
@ 2015-07-15 6:11 ` Ming Lin
[not found] ` <CAC7rs0sbg2ci6=niQ0X11AONZbr2AOYhRbxfDH_w4N4A7dyPLw@mail.gmail.com>
2015-07-18 0:01 ` Denis Bychkov
2015-07-21 18:37 ` David Mohr
3 siblings, 1 reply; 36+ messages in thread
From: Ming Lin @ 2015-07-15 6:11 UTC (permalink / raw)
To: Kent Overstreet
Cc: linux-bcache, sviatoslavpestov, mrubin, adam.berkan, zab, rickyb
On Mon, 2015-07-13 at 17:58 -0700, Kent Overstreet wrote:
> Then do the usual compiling...
>
> # bcacheadm format -C /dev/sda1
> # mount /dev/sda1 /mnt
How do I mount it?
root@afa03:~# bcacheadm format -C /dev/sdt
UUID: 87b0f1e2-e0dc-4453-b0c0-6afca64d402c
Set UUID: b5f3bc6a-2aab-4fe1-a8db-e35cc763388c
version: 6
nbuckets: 143051
block_size: 1
bucket_size: 4096
nr_in_set: 1
nr_this_dev: 0
first_bucket: 3
root@afa03:~# mount /dev/sdt /mnt/
mount: mount(2) failed: No such file or directory
root@afa03:~# mount -t bcache /dev/sdt /mnt/
mount: mount(2) failed: No such file or directory
* Re: [ANNOUNCE] bcachefs!
[not found] ` <CAC7rs0sbg2ci6=niQ0X11AONZbr2AOYhRbxfDH_w4N4A7dyPLw@mail.gmail.com>
@ 2015-07-15 7:15 ` Ming Lin
2015-07-15 7:39 ` Ming Lin
0 siblings, 1 reply; 36+ messages in thread
From: Ming Lin @ 2015-07-15 7:15 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache@vger.kernel.org
On Tue, 2015-07-14 at 23:58 -0700, Kent Overstreet wrote:
> Can you strace it?
Strange. Now the error message has changed.
root@afa03:~# mount /dev/sdt /mnt/
mount: mount(2) failed: No such file or directory
strace:
execve("/bin/mount", ["mount", "/dev/sdt", "/mnt/"], [/* 17 vars */]) = 0
brk(0) = 0x17c0000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe72a554000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=80188, ...}) = 0
mmap(NULL, 80188, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe72a540000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libmount.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\216\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=274504, ...}) = 0
mmap(NULL, 2373920, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe72a0ef000
mprotect(0x7fe72a130000, 2097152, PROT_NONE) = 0
mmap(0x7fe72a330000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x41000) = 0x7fe72a330000
mmap(0x7fe72a332000, 2336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fe72a332000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\v\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1869392, ...}) = 0
mmap(NULL, 3972864, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe729d25000
mprotect(0x7fe729ee5000, 2097152, PROT_NONE) = 0
mmap(0x7fe72a0e5000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c0000) = 0x7fe72a0e5000
mmap(0x7fe72a0eb000, 16128, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fe72a0eb000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libblkid.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\200\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=258376, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe72a53f000
mmap(NULL, 2357800, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe729ae5000
mprotect(0x7fe729b20000, 2097152, PROT_NONE) = 0
mmap(0x7fe729d20000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3b000) = 0x7fe729d20000
mmap(0x7fe729d24000, 2600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fe729d24000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300[\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=138400, ...}) = 0
mmap(NULL, 2242448, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe7298c1000
mprotect(0x7fe7298e2000, 2093056, PROT_NONE) = 0
mmap(0x7fe729ae1000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x20000) = 0x7fe729ae1000
mmap(0x7fe729ae3000, 6032, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fe729ae3000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libuuid.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200\25\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=19000, ...}) = 0
mmap(NULL, 2113920, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe7296bc000
mprotect(0x7fe7296c0000, 2093056, PROT_NONE) = 0
mmap(0x7fe7298bf000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7fe7298bf000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpcre.so.3", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\26\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=444344, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe72a53e000
mmap(NULL, 2539880, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe72944f000
mprotect(0x7fe7294bb000, 2093056, PROT_NONE) = 0
mmap(0x7fe7296ba000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6b000) = 0x7fe7296ba000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14592, ...}) = 0
mmap(NULL, 2109712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe72924b000
mprotect(0x7fe72924e000, 2093056, PROT_NONE) = 0
mmap(0x7fe72944d000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7fe72944d000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340`\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=142080, ...}) = 0
mmap(NULL, 2217232, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe72902d000
mprotect(0x7fe729045000, 2097152, PROT_NONE) = 0
mmap(0x7fe729245000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x7fe729245000
mmap(0x7fe729247000, 13584, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fe729247000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe72a53d000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe72a53c000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe72a53a000
arch_prctl(ARCH_SET_FS, 0x7fe72a53a840) = 0
mprotect(0x7fe72a0e5000, 16384, PROT_READ) = 0
mprotect(0x7fe729245000, 4096, PROT_READ) = 0
mprotect(0x7fe72944d000, 4096, PROT_READ) = 0
mprotect(0x7fe7296ba000, 4096, PROT_READ) = 0
mprotect(0x7fe7298bf000, 4096, PROT_READ) = 0
mprotect(0x7fe729ae1000, 4096, PROT_READ) = 0
mprotect(0x7fe729d20000, 12288, PROT_READ) = 0
mprotect(0x7fe72a330000, 4096, PROT_READ) = 0
mprotect(0x608000, 4096, PROT_READ) = 0
mprotect(0x7fe72a556000, 4096, PROT_READ) = 0
munmap(0x7fe72a540000, 80188) = 0
set_tid_address(0x7fe72a53ab10) = 3012
set_robust_list(0x7fe72a53ab20, 24) = 0
rt_sigaction(SIGRTMIN, {0x7fe729032bb0, [], SA_RESTORER|SA_SIGINFO, 0x7fe72903dd10}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7fe729032c40, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7fe72903dd10}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
statfs("/sys/fs/selinux", 0x7ffe81f62fe0) = -1 ENOENT (No such file or directory)
statfs("/selinux", 0x7ffe81f62fe0) = -1 ENOENT (No such file or directory)
brk(0) = 0x17c0000
brk(0x17e1000) = 0x17e1000
open("/proc/filesystems", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe72a553000
read(3, "nodev\tsysfs\nnodev\trootfs\nnodev\tr"..., 1024) = 351
read(3, "", 1024) = 0
close(3) = 0
munmap(0x7fe72a553000, 4096) = 0
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1852464, ...}) = 0
mmap(NULL, 1852464, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe72a375000
close(3) = 0
getuid() = 0
geteuid() = 0
lstat("/dev", {st_mode=S_IFDIR|0755, st_size=5260, ...}) = 0
lstat("/dev/sdt", {st_mode=S_IFBLK|0660, st_rdev=makedev(65, 48), ...}) = 0
stat("/dev/sdt", {st_mode=S_IFBLK|0660, st_rdev=makedev(65, 48), ...}) = 0
lstat("/mnt", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
access("/dev/sdt", F_OK) = 0
open("/dev/sdt", O_RDONLY|O_CLOEXEC) = 3
fadvise64(3, 0, 0, POSIX_FADV_RANDOM) = 0
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(65, 48), ...}) = 0
uname({sys="Linux", node="msl-lab-afa03", ...}) = 0
ioctl(3, BLKGETSIZE64, 300000000000) = 0
open("/sys/dev/block/65:48", O_RDONLY|O_CLOEXEC) = 4
openat(4, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
close(4) = 0
open("/sys/dev/block/65:48", O_RDONLY|O_CLOEXEC) = 4
newfstatat(4, "partition", 0x7ffe81f61a30, 0) = -1 ENOENT (No such file or directory)
openat(4, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
close(4) = 0
ioctl(3, CDROM_GET_CAPABILITY or SNDRV_SEQ_IOCTL_UNSUBSCRIBE_PORT, 0) = -1 EINVAL (Invalid argument)
lseek(3, 299999887360, SEEK_SET) = 299999887360
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 64) = 64
lseek(3, 299999989760, SEEK_SET) = 299999989760
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256
lseek(3, 0, SEEK_SET) = 0
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256
lseek(3, 4096, SEEK_SET) = 4096
read(3, "\341\362\332\304P'\235\256\10\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\306\205s\366N\32E\312"..., 256) = 256
lseek(3, 299999999488, SEEK_SET) = 299999999488
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
lseek(3, 299999868416, SEEK_SET) = 299999868416
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
lseek(3, 299999998976, SEEK_SET) = 299999998976
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 48) = 48
lseek(3, 299999967744, SEEK_SET) = 299999967744
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299999869440, SEEK_SET) = 299999869440
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299999868928, SEEK_SET) = 299999868928
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299999991808, SEEK_SET) = 299999991808
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299999795712, SEEK_SET) = 299999795712
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299999697408, SEEK_SET) = 299999697408
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299999654400, SEEK_SET) = 299999654400
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299999623680, SEEK_SET) = 299999623680
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299999533568, SEEK_SET) = 299999533568
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299999501312, SEEK_SET) = 299999501312
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299999492608, SEEK_SET) = 299999492608
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299999513088, SEEK_SET) = 299999513088
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299998419456, SEEK_SET) = 299998419456
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 299999994368, SEEK_SET) = 299999994368
read(3, "\0\0\0\0", 4) = 4
lseek(3, 4096, SEEK_SET) = 4096
read(3, "\341\362\332\304P'\235\256\10\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\306\205s\366N\32E\312"..., 1024) = 1024
lseek(3, 4096, SEEK_SET) = 4096
read(3, "\341\362\332\304P'\235\256\10\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\306\205s\366N\32E\312"..., 2256) = 2256
lseek(3, 299999995904, SEEK_SET) = 299999995904
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(3, 0, SEEK_SET) = 0
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
lseek(3, 1024, SEEK_SET) = 1024
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
lseek(3, 1048576, SEEK_SET) = 1048576
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
lseek(3, 3072, SEEK_SET) = 3072
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
lseek(3, 7168, SEEK_SET) = 7168
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
lseek(3, 15360, SEEK_SET) = 15360
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
lseek(3, 31744, SEEK_SET) = 31744
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
lseek(3, 64512, SEEK_SET) = 64512
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
lseek(3, 0, SEEK_SET) = 0
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe728fec000
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 262144) = 262144
lseek(3, 393216, SEEK_SET) = 393216
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 397312, SEEK_SET) = 397312
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 401408, SEEK_SET) = 401408
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 405504, SEEK_SET) = 405504
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 409600, SEEK_SET) = 409600
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 413696, SEEK_SET) = 413696
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 417792, SEEK_SET) = 417792
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 421888, SEEK_SET) = 421888
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 425984, SEEK_SET) = 425984
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 430080, SEEK_SET) = 430080
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 434176, SEEK_SET) = 434176
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 438272, SEEK_SET) = 438272
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 442368, SEEK_SET) = 442368
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 446464, SEEK_SET) = 446464
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 450560, SEEK_SET) = 450560
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 454656, SEEK_SET) = 454656
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 458752, SEEK_SET) = 458752
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 462848, SEEK_SET) = 462848
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 466944, SEEK_SET) = 466944
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 471040, SEEK_SET) = 471040
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 475136, SEEK_SET) = 475136
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 479232, SEEK_SET) = 479232
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 483328, SEEK_SET) = 483328
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 487424, SEEK_SET) = 487424
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 491520, SEEK_SET) = 491520
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 495616, SEEK_SET) = 495616
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 499712, SEEK_SET) = 499712
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 503808, SEEK_SET) = 503808
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 507904, SEEK_SET) = 507904
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 512000, SEEK_SET) = 512000
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 516096, SEEK_SET) = 516096
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 520192, SEEK_SET) = 520192
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 41) = 41
lseek(3, 262144, SEEK_SET) = 262144
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1377) = 1377
lseek(3, 2097152, SEEK_SET) = 2097152
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
ioctl(3, BLKSSZGET, 512) = 0
close(3) = 0
munmap(0x7fe728fec000, 266240) = 0
stat("/sbin/mount.bcache", 0x7ffe81f61d60) = -1 ENOENT (No such file or directory)
stat("/sbin/fs.d/mount.bcache", 0x7ffe81f61d60) = -1 ENOENT (No such file or directory)
stat("/sbin/fs/mount.bcache", 0x7ffe81f61d60) = -1 ENOENT (No such file or directory)
getuid() = 0
geteuid() = 0
getgid() = 0
getegid() = 0
prctl(PR_GET_DUMPABLE) = 1
getuid() = 0
geteuid() = 0
getgid() = 0
getegid() = 0
prctl(PR_GET_DUMPABLE) = 1
stat("/run", {st_mode=S_IFDIR|0755, st_size=760, ...}) = 0
lstat("/etc/mtab", {st_mode=S_IFLNK|0777, st_size=12, ...}) = 0
lstat("/run/mount/utab", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
open("/run/mount/utab", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 3
close(3) = 0
mount("/dev/sdt", "/mnt", "bcache", MS_MGC_VAL, NULL) = -1 ENOENT (No such file or directory)
lstat("/mnt", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/mnt", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/dev/sdt", {st_mode=S_IFBLK|0660, st_rdev=makedev(65, 48), ...}) = 0
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2570, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe72a553000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2570
read(3, "", 4096) = 0
close(3) = 0
munmap(0x7fe72a553000, 4096) = 0
open("/usr/share/locale/en_US/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/en_US/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/en/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "mount: ", 7mount: ) = 7
write(2, "mount(2) failed", 15mount(2) failed) = 15
write(2, ": ", 2: ) = 2
open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "No such file or directory\n", 26No such file or directory
) = 26
close(1) = 0
close(2) = 0
exit_group(32) = ?
+++ exited with 32 +++
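The failing call buried in the trace is mount("/dev/sdt", "/mnt", "bcache", MS_MGC_VAL, NULL) = -1 ENOENT - the error comes back from the filesystem's own mount path, not from path resolution. As a diagnostic sketch (not from the thread; mount(2) would report ENODEV rather than ENOENT for an unknown type), one cheap thing to rule out first is whether the running kernel registered the filesystem type at all:

```shell
# Diagnostic sketch (not from the thread): check whether the running kernel
# has registered the "bcache" filesystem type. /proc/filesystems lists every
# type mount(2) will accept; an unregistered type fails before any device I/O.
if grep -qw bcache /proc/filesystems 2>/dev/null; then
    echo "bcache: filesystem type registered"
else
    echo "bcache: filesystem type not registered"
fi
```

In the trace above the type was clearly found (mount(2) was invoked with "bcache" after the blkid probe), so the ENOENT here is being produced inside the bcache mount code itself.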
* Re: [ANNOUNCE] bcachefs!
2015-07-15 7:15 ` Ming Lin
@ 2015-07-15 7:39 ` Ming Lin
2015-07-17 23:17 ` Kent Overstreet
0 siblings, 1 reply; 36+ messages in thread
From: Ming Lin @ 2015-07-15 7:39 UTC (permalink / raw)
To: Ming Lin; +Cc: Kent Overstreet, linux-bcache@vger.kernel.org
On Wed, Jul 15, 2015 at 12:15 AM, Ming Lin <mlin@kernel.org> wrote:
> On Tue, 2015-07-14 at 23:58 -0700, Kent Overstreet wrote:
>> Can you strace it?
>
> Strange. Now error message changed.
I mean sometimes it showed:
mount: /dev/sdt already mounted or /mnt/ busy
>
> root@afa03:~# mount /dev/sdt /mnt/
> mount: mount(2) failed: No such file or directory
* Re: [ANNOUNCE] bcachefs!
2015-07-15 7:39 ` Ming Lin
@ 2015-07-17 23:17 ` Kent Overstreet
2015-07-17 23:35 ` Ming Lin
0 siblings, 1 reply; 36+ messages in thread
From: Kent Overstreet @ 2015-07-17 23:17 UTC (permalink / raw)
To: Ming Lin; +Cc: linux-bcache@vger.kernel.org
On Wed, Jul 15, 2015 at 12:39:36AM -0700, Ming Lin wrote:
> On Wed, Jul 15, 2015 at 12:15 AM, Ming Lin <mlin@kernel.org> wrote:
> > On Tue, 2015-07-14 at 23:58 -0700, Kent Overstreet wrote:
> >> Can you strace it?
> >
> > Strange. Now error message changed.
>
> I mean sometimes it showed:
>
> mount: /dev/sdt already mounted or /mnt/ busy
I have no idea what's going on - it works for me. Is there anything unusual
about your setup? What kind of block device is /dev/sdt? Is there any chance
there's another process that has it open? Maybe try rebooting?
* Re: [ANNOUNCE] bcachefs!
2015-07-17 23:17 ` Kent Overstreet
@ 2015-07-17 23:35 ` Ming Lin
2015-07-17 23:40 ` Kent Overstreet
0 siblings, 1 reply; 36+ messages in thread
From: Ming Lin @ 2015-07-17 23:35 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache@vger.kernel.org
On Fri, 2015-07-17 at 16:17 -0700, Kent Overstreet wrote:
> On Wed, Jul 15, 2015 at 12:39:36AM -0700, Ming Lin wrote:
> > On Wed, Jul 15, 2015 at 12:15 AM, Ming Lin <mlin@kernel.org> wrote:
> > > On Tue, 2015-07-14 at 23:58 -0700, Kent Overstreet wrote:
> > >> Can you strace it?
> > >
> > > Strange. Now error message changed.
> >
> > I mean sometimes it showed:
> >
> > mount: /dev/sdt already mounted or /mnt/ busy
>
> I have no idea what's going on, it works for me - is there anything unusual
> about your setup? what kind of block device is /dev/sdt? is there any chance
> there's another process that has it open? maybe try rebooting?
It's a regular HDD. I tried rebooting several times.
Now I'm trying in qemu-kvm. It can only be mounted the first time.
On host: qemu-img create hdd1.img 20G
On guest: it's /dev/vda
root@block:~# bcacheadm format -C /dev/vda
UUID: 4730ed95-4c57-42db-856c-dbce36085625
Set UUID: e69ef0e0-0344-40d7-a6b1-c23d14745a32
version: 6
nbuckets: 40960
block_size: 1
bucket_size: 1024
nr_in_set: 1
nr_this_dev: 0
first_bucket: 3
root@block:~# mount -t bcache /dev/vda /mnt/
root@block:~# mount |grep bcache
/dev/vda on /mnt type bcache (rw,relatime)
root@block:~# reboot
root@block:~# dmesg |grep -i bcache
[ 2.548754] bcache: bch_journal_replay() journal replay done, 1 keys in 1 entries, seq 3
[ 2.636217] bcache: register_cache() registered cache device vda
root@block:~# mount -t bcache /dev/vda /mnt/
mount: No such file or directory
Now dmesg shows:
bcache: bch_open_as_blockdevs() register_cache_set err device already registered
* Re: [ANNOUNCE] bcachefs!
2015-07-17 23:35 ` Ming Lin
@ 2015-07-17 23:40 ` Kent Overstreet
2015-07-17 23:48 ` Ming Lin
0 siblings, 1 reply; 36+ messages in thread
From: Kent Overstreet @ 2015-07-17 23:40 UTC (permalink / raw)
To: Ming Lin; +Cc: linux-bcache@vger.kernel.org
On Fri, Jul 17, 2015 at 04:35:55PM -0700, Ming Lin wrote:
>
> On Fri, 2015-07-17 at 16:17 -0700, Kent Overstreet wrote:
> > On Wed, Jul 15, 2015 at 12:39:36AM -0700, Ming Lin wrote:
> > > On Wed, Jul 15, 2015 at 12:15 AM, Ming Lin <mlin@kernel.org> wrote:
> > > > On Tue, 2015-07-14 at 23:58 -0700, Kent Overstreet wrote:
> > > >> Can you strace it?
> > > >
> > > > Strange. Now error message changed.
> > >
> > > I mean sometimes it showed:
> > >
> > > mount: /dev/sdt already mounted or /mnt/ busy
> >
> > I have no idea what's going on, it works for me - is there anything unusual
> > about your setup? what kind of block device is /dev/sdt? is there any chance
> > there's another process that has it open? maybe try rebooting?
>
> It's a regular HDD. I tried rebooting several times.
>
> Now I try in qemu-kvm. Only the first time it can be mounted.
>
> On host: qemu-img create hdd1.img 20G
> On guest: it's /dev/vda
>
> root@block:~# bcacheadm format -C /dev/vda
> UUID: 4730ed95-4c57-42db-856c-dbce36085625
> Set UUID: e69ef0e0-0344-40d7-a6b1-c23d14745a32
> version: 6
> nbuckets: 40960
> block_size: 1
> bucket_size: 1024
> nr_in_set: 1
> nr_this_dev: 0
> first_bucket: 3
>
> root@block:~# mount -t bcache /dev/vda /mnt/
>
> root@block:~# mount |grep bcache
> /dev/vda on /mnt type bcache (rw,relatime)
>
> root@block:~# reboot
>
> root@block:~# dmesg |grep -i bcache
> [ 2.548754] bcache: bch_journal_replay() journal replay done, 1 keys in 1 entries, seq 3
> [ 2.636217] bcache: register_cache() registered cache device vda
>
>
> root@block:~# mount -t bcache /dev/vda /mnt/
> mount: No such file or directory
>
> Now dmesg shows:
>
> bcache: bch_open_as_blockdevs() register_cache_set err device already registered
Ohhhh.
The cache set is getting registered by the udev hooks. We should be able to
mount it anyways - same as you can mount any other fs in multiple locations.
I won't be able to fix this for at least a couple of days, but for now just
shut it down via sysfs (echo 1 > /sys/fs/bcache/<uuid>/stop), then mount it.
* Re: [ANNOUNCE] bcachefs!
2015-07-17 23:40 ` Kent Overstreet
@ 2015-07-17 23:48 ` Ming Lin
2015-07-17 23:51 ` Kent Overstreet
0 siblings, 1 reply; 36+ messages in thread
From: Ming Lin @ 2015-07-17 23:48 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache@vger.kernel.org
On Fri, 2015-07-17 at 16:40 -0700, Kent Overstreet wrote:
> On Fri, Jul 17, 2015 at 04:35:55PM -0700, Ming Lin wrote:
> >
> > On Fri, 2015-07-17 at 16:17 -0700, Kent Overstreet wrote:
> > > On Wed, Jul 15, 2015 at 12:39:36AM -0700, Ming Lin wrote:
> > > > On Wed, Jul 15, 2015 at 12:15 AM, Ming Lin <mlin@kernel.org> wrote:
> > > > > On Tue, 2015-07-14 at 23:58 -0700, Kent Overstreet wrote:
> > > > >> Can you strace it?
> > > > >
> > > > > Strange. Now error message changed.
> > > >
> > > > I mean sometimes it showed:
> > > >
> > > > mount: /dev/sdt already mounted or /mnt/ busy
> > >
> > > I have no idea what's going on, it works for me - is there anything unusual
> > > about your setup? what kind of block device is /dev/sdt? is there any chance
> > > there's another process that has it open? maybe try rebooting?
> >
> > It's a regular HDD. I tried rebooting several times.
> >
> > Now I try in qemu-kvm. Only the first time it can be mounted.
> >
> > On host: qemu-img create hdd1.img 20G
> > On guest: it's /dev/vda
> >
> > root@block:~# bcacheadm format -C /dev/vda
> > UUID: 4730ed95-4c57-42db-856c-dbce36085625
> > Set UUID: e69ef0e0-0344-40d7-a6b1-c23d14745a32
> > version: 6
> > nbuckets: 40960
> > block_size: 1
> > bucket_size: 1024
> > nr_in_set: 1
> > nr_this_dev: 0
> > first_bucket: 3
> >
> > root@block:~# mount -t bcache /dev/vda /mnt/
> >
> > root@block:~# mount |grep bcache
> > /dev/vda on /mnt type bcache (rw,relatime)
> >
> > root@block:~# reboot
> >
> > root@block:~# dmesg |grep -i bcache
> > [ 2.548754] bcache: bch_journal_replay() journal replay done, 1 keys in 1 entries, seq 3
> > [ 2.636217] bcache: register_cache() registered cache device vda
> >
> >
> > root@block:~# mount -t bcache /dev/vda /mnt/
> > mount: No such file or directory
> >
> > Now dmesg shows:
> >
> > bcache: bch_open_as_blockdevs() register_cache_set err device already registered
>
> Ohhhh.
>
> The cache set is getting registered by the udev hooks. We should be able to
> mount it anyways - same as you can mount any other fs in multiple locations.
>
> I won't be able to fix this for at least a couple days, but for now - just
> shut it down it via sysfs (echo 1 > /sys/fs/bcache/<uuid>/stop), then mount it.
It works!
Any hint on how to fix it? In udev, bcache-tools, or the kernel?
I'd like to fix it.
root@block:~# echo 1 > /sys/fs/bcache/e69ef0e0-0344-40d7-a6b1-c23d14745a32/stop
root@block:~# mount -t bcache /dev/vda /mnt/
/dev/vda on /mnt type bcache (rw,relatime)
root@block:~# mount |grep bcache
/dev/vda on /mnt type bcache (rw,relatime)
* Re: [ANNOUNCE] bcachefs!
2015-07-17 23:48 ` Ming Lin
@ 2015-07-17 23:51 ` Kent Overstreet
2015-07-17 23:58 ` Ming Lin
0 siblings, 1 reply; 36+ messages in thread
From: Kent Overstreet @ 2015-07-17 23:51 UTC (permalink / raw)
To: Ming Lin; +Cc: linux-bcache@vger.kernel.org
On Fri, Jul 17, 2015 at 04:48:31PM -0700, Ming Lin wrote:
>
> On Fri, 2015-07-17 at 16:40 -0700, Kent Overstreet wrote:
> > On Fri, Jul 17, 2015 at 04:35:55PM -0700, Ming Lin wrote:
> > >
> > > On Fri, 2015-07-17 at 16:17 -0700, Kent Overstreet wrote:
> > > > On Wed, Jul 15, 2015 at 12:39:36AM -0700, Ming Lin wrote:
> > > > > On Wed, Jul 15, 2015 at 12:15 AM, Ming Lin <mlin@kernel.org> wrote:
> > > > > > On Tue, 2015-07-14 at 23:58 -0700, Kent Overstreet wrote:
> > > > > >> Can you strace it?
> > > > > >
> > > > > > Strange. Now error message changed.
> > > > >
> > > > > I mean sometimes it showed:
> > > > >
> > > > > mount: /dev/sdt already mounted or /mnt/ busy
> > > >
> > > > I have no idea what's going on, it works for me - is there anything unusual
> > > > about your setup? what kind of block device is /dev/sdt? is there any chance
> > > > there's another process that has it open? maybe try rebooting?
> > >
> > > It's a regular HDD. I tried rebooting several times.
> > >
> > > Now I try in qemu-kvm. Only the first time it can be mounted.
> > >
> > > On host: qemu-img create hdd1.img 20G
> > > On guest: it's /dev/vda
> > >
> > > root@block:~# bcacheadm format -C /dev/vda
> > > UUID: 4730ed95-4c57-42db-856c-dbce36085625
> > > Set UUID: e69ef0e0-0344-40d7-a6b1-c23d14745a32
> > > version: 6
> > > nbuckets: 40960
> > > block_size: 1
> > > bucket_size: 1024
> > > nr_in_set: 1
> > > nr_this_dev: 0
> > > first_bucket: 3
> > >
> > > root@block:~# mount -t bcache /dev/vda /mnt/
> > >
> > > root@block:~# mount |grep bcache
> > > /dev/vda on /mnt type bcache (rw,relatime)
> > >
> > > root@block:~# reboot
> > >
> > > root@block:~# dmesg |grep -i bcache
> > > [ 2.548754] bcache: bch_journal_replay() journal replay done, 1 keys in 1 entries, seq 3
> > > [ 2.636217] bcache: register_cache() registered cache device vda
> > >
> > >
> > > root@block:~# mount -t bcache /dev/vda /mnt/
> > > mount: No such file or directory
> > >
> > > Now dmesg shows:
> > >
> > > bcache: bch_open_as_blockdevs() register_cache_set err device already registered
> >
> > Ohhhh.
> >
> > The cache set is getting registered by the udev hooks. We should be able to
> > mount it anyways - same as you can mount any other fs in multiple locations.
> >
> > I won't be able to fix this for at least a couple days, but for now - just
> > shut it down it via sysfs (echo 1 > /sys/fs/bcache/<uuid>/stop), then mount it.
>
> It works!
> Any hint how to fix it? On udev or bcache-tool or kernel?
> I'd like to fix it.
The relevant code is in drivers/md/bcache/fs.c, bch_mount() ->
bch_open_as_blockdevs().
Part of the problem is that bcachefs isn't able to use much of the normal
generic mount path for block devices, partly because a fs can span multiple
block devices (same as btrfs).
I'm not sure of the right way to fix it - it's going to take some thought, but
we want to do something like "is it already open? just take a ref on the
existing cache set".
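For what it's worth, that "take a ref on the existing cache set" idea can be sketched in plain userspace C. Everything here is hypothetical - cache_set_get(), the list layout, and the refcount field are stand-ins, not the actual bcache code - but it shows the shape: instead of register_cache_set() failing with "device already registered", the mount path would look the set up by UUID and take a reference:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct cache_set {
	char uuid[37];			/* 36-char UUID + NUL */
	int refcount;
	struct cache_set *next;
};

/* In the kernel this list would be protected by bch_register_lock. */
static struct cache_set *registered_sets;

/* Look up an already-registered cache set by UUID and take a reference,
 * instead of failing the second mount the way the current code does. */
static struct cache_set *cache_set_get(const char *uuid)
{
	struct cache_set *c;

	for (c = registered_sets; c; c = c->next)
		if (!strcmp(c->uuid, uuid)) {
			c->refcount++;	/* second opener reuses the set */
			return c;
		}

	c = calloc(1, sizeof(*c));
	assert(c);
	strncpy(c->uuid, uuid, sizeof(c->uuid) - 1);
	c->refcount = 1;		/* first opener registers the set */
	c->next = registered_sets;
	registered_sets = c;
	return c;
}
```

With this shape, mounting the same UUID twice hands back the same object with an elevated refcount, which is the semantic "mount any other fs in multiple locations" needs.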
* Re: [ANNOUNCE] bcachefs!
2015-07-17 23:51 ` Kent Overstreet
@ 2015-07-17 23:58 ` Ming Lin
2015-07-18 2:10 ` Kent Overstreet
0 siblings, 1 reply; 36+ messages in thread
From: Ming Lin @ 2015-07-17 23:58 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache@vger.kernel.org
On Fri, 2015-07-17 at 16:51 -0700, Kent Overstreet wrote:
> On Fri, Jul 17, 2015 at 04:48:31PM -0700, Ming Lin wrote:
> >
> > On Fri, 2015-07-17 at 16:40 -0700, Kent Overstreet wrote:
> > > On Fri, Jul 17, 2015 at 04:35:55PM -0700, Ming Lin wrote:
> > > >
> > > > On Fri, 2015-07-17 at 16:17 -0700, Kent Overstreet wrote:
> > > > > On Wed, Jul 15, 2015 at 12:39:36AM -0700, Ming Lin wrote:
> > > > > > On Wed, Jul 15, 2015 at 12:15 AM, Ming Lin <mlin@kernel.org> wrote:
> > > > > > > On Tue, 2015-07-14 at 23:58 -0700, Kent Overstreet wrote:
> > > > > > >> Can you strace it?
> > > > > > >
> > > > > > > Strange. Now error message changed.
> > > > > >
> > > > > > I mean sometimes it showed:
> > > > > >
> > > > > > mount: /dev/sdt already mounted or /mnt/ busy
> > > > >
> > > > > I have no idea what's going on, it works for me - is there anything unusual
> > > > > about your setup? what kind of block device is /dev/sdt? is there any chance
> > > > > there's another process that has it open? maybe try rebooting?
> > > >
> > > > It's a regular HDD. I tried rebooting several times.
> > > >
> > > > Now I try in qemu-kvm. Only the first time it can be mounted.
> > > >
> > > > On host: qemu-img create hdd1.img 20G
> > > > On guest: it's /dev/vda
> > > >
> > > > root@block:~# bcacheadm format -C /dev/vda
> > > > UUID: 4730ed95-4c57-42db-856c-dbce36085625
> > > > Set UUID: e69ef0e0-0344-40d7-a6b1-c23d14745a32
> > > > version: 6
> > > > nbuckets: 40960
> > > > block_size: 1
> > > > bucket_size: 1024
> > > > nr_in_set: 1
> > > > nr_this_dev: 0
> > > > first_bucket: 3
> > > >
> > > > root@block:~# mount -t bcache /dev/vda /mnt/
> > > >
> > > > root@block:~# mount |grep bcache
> > > > /dev/vda on /mnt type bcache (rw,relatime)
> > > >
> > > > root@block:~# reboot
> > > >
> > > > root@block:~# dmesg |grep -i bcache
> > > > [ 2.548754] bcache: bch_journal_replay() journal replay done, 1 keys in 1 entries, seq 3
> > > > [ 2.636217] bcache: register_cache() registered cache device vda
> > > >
> > > >
> > > > root@block:~# mount -t bcache /dev/vda /mnt/
> > > > mount: No such file or directory
> > > >
> > > > Now dmesg shows:
> > > >
> > > > bcache: bch_open_as_blockdevs() register_cache_set err device already registered
> > >
> > > Ohhhh.
> > >
> > > The cache set is getting registered by the udev hooks. We should be able to
> > > mount it anyways - same as you can mount any other fs in multiple locations.
> > >
> > > I won't be able to fix this for at least a couple days, but for now - just
> > > shut it down it via sysfs (echo 1 > /sys/fs/bcache/<uuid>/stop), then mount it.
> >
> > It works!
> > Any hint how to fix it? On udev or bcache-tool or kernel?
> > I'd like to fix it.
>
> The relevant code is in drivers/md/bcache/fs.c, bch_mount() ->
> bch_open_as_blockdevs().
>
> Part of the problem is that bcachefs isn't able to use much of the normal
> generic mount path for block devices, partly because a fs can span multiple
> block devices (same as btrfs).
>
> I'm not sure the right way to fix it - it's going to take some thought, but
> we want to do something like "is it already open? just take a ref on the
> existing cache set".
I'll look into it.
Also, running echo 1 > /sys/fs/bcache/<uuid>/stop produced the lockdep
warning below. I'll try to fix that too.
[ 25.826280] ======================================================
[ 25.828038] [ INFO: possible circular locking dependency detected ]
[ 25.828587] 4.1.0-00943-g3683e624 #7 Not tainted
[ 25.828587] -------------------------------------------------------
[ 25.828587] kworker/2:1/660 is trying to acquire lock:
[ 25.828587] (s_active#31){++++.+}, at: [<ffffffff811bc5ee>] kernfs_remove+0x24/0x33
[ 25.828587]
[ 25.828587] but task is already holding lock:
[ 25.828587] (&bch_register_lock){+.+.+.}, at: [<ffffffff815c40e1>] cache_set_flush+0x46/0xa6
[ 25.828587]
[ 25.828587] which lock already depends on the new lock.
[ 25.828587]
[ 25.828587]
[ 25.828587] the existing dependency chain (in reverse order) is:
[ 25.828587]
-> #1 (&bch_register_lock){+.+.+.}:
[ 25.828587] [<ffffffff8108e179>] __lock_acquire+0x73f/0xb0f
[ 25.828587] [<ffffffff8108ecfa>] lock_acquire+0x149/0x25c
[ 25.828587] [<ffffffff816ff284>] mutex_lock_nested+0x6e/0x38f
[ 25.828587] [<ffffffff815c9d01>] bch_cache_set_store+0x2f/0x9e
[ 25.828587] [<ffffffff811bd1ca>] kernfs_fop_write+0x100/0x14a
[ 25.828587] [<ffffffff81154aa5>] __vfs_write+0x26/0xbe
[ 25.828587] [<ffffffff8115511b>] vfs_write+0xbe/0x166
[ 25.828587] [<ffffffff811558b7>] SyS_write+0x51/0x92
[ 25.828587] [<ffffffff81703817>] system_call_fastpath+0x12/0x6f
[ 25.828587]
-> #0 (s_active#31){++++.+}:
[ 25.828587] [<ffffffff8108af24>] validate_chain.isra.31+0x942/0xfc3
[ 25.828587] [<ffffffff8108e179>] __lock_acquire+0x73f/0xb0f
[ 25.828587] [<ffffffff8108ecfa>] lock_acquire+0x149/0x25c
[ 25.828587] [<ffffffff811bba0d>] __kernfs_remove+0x1d1/0x2fd
[ 25.828587] [<ffffffff811bc5ee>] kernfs_remove+0x24/0x33
[ 25.828587] [<ffffffff81402c76>] kobject_del+0x18/0x42
[ 25.828587] [<ffffffff815c40fc>] cache_set_flush+0x61/0xa6
[ 25.828587] [<ffffffff8105ca00>] process_one_work+0x2cc/0x6c4
[ 25.828587] [<ffffffff8105dd21>] worker_thread+0x27a/0x374
[ 25.828587] [<ffffffff81062798>] kthread+0xfb/0x103
[ 25.828587] [<ffffffff81703c02>] ret_from_fork+0x42/0x70
[ 25.828587]
[ 25.828587] other info that might help us debug this:
[ 25.828587]
[ 25.828587] Possible unsafe locking scenario:
[ 25.828587]
[ 25.828587] CPU0 CPU1
[ 25.828587] ---- ----
[ 25.828587] lock(&bch_register_lock);
[ 25.828587] lock(s_active#31);
[ 25.828587] lock(&bch_register_lock);
[ 25.828587] lock(s_active#31);
[ 25.828587]
[ 25.828587] *** DEADLOCK ***
[ 25.828587]
[ 25.828587] 3 locks held by kworker/2:1/660:
[ 25.828587] #0: ("events"){.+.+.+}, at: [<ffffffff8105c8d2>] process_one_work+0x19e/0x6c4
[ 25.828587] #1: ((&cl->work)#3){+.+.+.}, at: [<ffffffff8105c8d2>] process_one_work+0x19e/0x6c4
[ 25.828587] #2: (&bch_register_lock){+.+.+.}, at: [<ffffffff815c40e1>] cache_set_flush+0x46/0xa6
[ 25.828587]
[ 25.828587] stack backtrace:
[ 25.828587] CPU: 2 PID: 660 Comm: kworker/2:1 Not tainted 4.1.0-00943-g3683e624 #7
[ 25.828587] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20150306_163512-brownie 04/01/2014
[ 25.828587] Workqueue: events cache_set_flush
[ 25.828587] ffffffff827d7bd0 ffff880235937a78 ffffffff816fba3b 0000000000000002
[ 25.828587] ffffffff827f10c0 ffff880235937ac8 ffffffff81089bb8 ffff880235937b00
[ 25.828587] ffff880235a46a90 ffff880235937ac8 ffff880235a46a90 ffff880235a47390
[ 25.828587] Call Trace:
[ 25.828587] [<ffffffff816fba3b>] dump_stack+0x4f/0x7b
[ 25.828587] [<ffffffff81089bb8>] print_circular_bug+0x2b1/0x2c2
[ 25.828587] [<ffffffff8108af24>] validate_chain.isra.31+0x942/0xfc3
[ 25.828587] [<ffffffff8108e179>] __lock_acquire+0x73f/0xb0f
[ 25.828587] [<ffffffff8108ecfa>] lock_acquire+0x149/0x25c
[ 25.828587] [<ffffffff811bc5ee>] ? kernfs_remove+0x24/0x33
[ 25.828587] [<ffffffff811bba0d>] __kernfs_remove+0x1d1/0x2fd
[ 25.828587] [<ffffffff811bc5ee>] ? kernfs_remove+0x24/0x33
[ 25.828587] [<ffffffff811bc5ee>] kernfs_remove+0x24/0x33
[ 25.828587] [<ffffffff81402c76>] kobject_del+0x18/0x42
[ 25.828587] [<ffffffff815c40fc>] cache_set_flush+0x61/0xa6
[ 25.828587] [<ffffffff8105ca00>] process_one_work+0x2cc/0x6c4
[ 25.828587] [<ffffffff8105dd21>] worker_thread+0x27a/0x374
[ 25.828587] [<ffffffff8105daa7>] ? rescuer_thread+0x2a6/0x2a6
[ 25.828587] [<ffffffff81062798>] kthread+0xfb/0x103
[ 25.828587] [<ffffffff8108be8a>] ? trace_hardirqs_on_caller+0x1bb/0x1da
[ 25.828587] [<ffffffff8106269d>] ? kthread_create_on_node+0x1c0/0x1c0
[ 25.828587] [<ffffffff81703c02>] ret_from_fork+0x42/0x70
[ 25.828587] [<ffffffff8106269d>] ? kthread_create_on_node+0x1c0/0x1c0
[ 25.952174] bcache: cache_set_free() Cache set 80166ca9-ed99-4eb2-aca3-1f518531ca72 unregistered
* Re: [ANNOUNCE] bcachefs!
2015-07-14 0:58 [ANNOUNCE] bcachefs! Kent Overstreet
[not found] ` <CACaajQtwx45r8GcRmchrQwDts1GH-V8g0x1FwGfDvnfm02bq+Q@mail.gmail.com>
2015-07-15 6:11 ` Ming Lin
@ 2015-07-18 0:01 ` Denis Bychkov
2015-07-18 2:12 ` Kent Overstreet
2015-07-21 18:37 ` David Mohr
3 siblings, 1 reply; 36+ messages in thread
From: Denis Bychkov @ 2015-07-18 0:01 UTC (permalink / raw)
To: Kent Overstreet
Cc: linux-bcache, sviatoslavpestov, mrubin, adam.berkan, zab, rickyb
Hi,
I actually tried to compile that code recently (maybe 5 days ago), and
it hasn't changed since then. There were a bunch of trivial errors
that I was able to fix - files moved around without their #includes
updated, missing types. But at some point I ran into something definitely
non-trivial: there is a variable in io.c that is supposed to hold an
inode number, but the code that initializes it is commented out, and
there is no obvious way to fix it. So I gave up. I definitely tried the
branch you are talking about - bcache-dev.
Here is the code I mentioned:
static void bch_read_retry(struct bbio *bbio)
{
struct bio *bio = &bbio->bio;
struct bio *parent;
u64 inode;
trace_bcache_read_retry(bio);
/*
* This used to be a leaf bio from bch_read_fn(), but
* since we don't know what happened to the btree in
* the meantime, we have to re-submit it via the
* top-level bch_read() entry point. Before doing that,
* we have to reset the bio, preserving the biovec.
*
* The inode, offset and size come from the bbio's key,
* which was set by bch_read_fn().
*/
//inode = bbio->key.k.p.inode;
parent = bio->bi_private;
bch_bbio_reset(bbio);
bio_chain(bio, parent);
bch_read(bbio->ca->set, bio, inode);
bio_endio(parent, 0); /* for bio_chain() in bch_read_fn() */
bio_endio(bio, 0);
}
On Mon, Jul 13, 2015 at 8:58 PM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> Short announcement, because I'm in the process of moving - but I wanted to get
> this out there because the code is up and I think it's reasonably stable right
> now.
>
> Bcachefs is a posix filesystem that I've been working towards for - well, quite
> awhile now: it's intended as a competitor/replacement for ext4/xfs/btrfs.
>
> Current features
> - multiple devices
> - replication
> - tiering
> - data checksumming and compression (zlib only; also the code doesn't work with
> tiering yet)
> - most of the normal posix fs features (no fallocate or quotas yet)
>
> Planned features:
> - snapshots!
> - erasure coding
> - more
>
> There will be a longer announcement on LKML/linux-fs in the near future (after
> I'm finished moving) - but I'd like to get it a bit more testing from a wider
> audience first, if possible.
>
> You need the bcache-dev branch, and the new bcache tools - be warned, this code
> is _not_ compatible with the upstream bcache on disk format:
>
> $ git clone -b bcache-dev http://evilpiepirate.org/git/linux-bcache.git
> $ git clone -b dev http://evilpiepirate.org/git/bcache-tools.git
>
> Then do the usual compiling...
>
> # bcacheadm format -C /dev/sda1
> # mount /dev/sda1 /mnt
>
> The usual caveats apply - it might eat your data, the on disk format has _not_
> been stabilized yet, etc. But it's been reasonably stable for me, and passes all
> but 2-3 of the supported xfstests.
>
> Try it out and let me know how it goes!
>
> Also, programmers please check out the bcache guide - feedback is appreciated:
>
> http://bcache.evilpiepirate.org/BcacheGuide/
>
> Thanks!
--
Denis
* Re: [ANNOUNCE] bcachefs!
2015-07-17 23:58 ` Ming Lin
@ 2015-07-18 2:10 ` Kent Overstreet
2015-07-18 5:21 ` Ming Lin
0 siblings, 1 reply; 36+ messages in thread
From: Kent Overstreet @ 2015-07-18 2:10 UTC (permalink / raw)
To: Ming Lin; +Cc: linux-bcache@vger.kernel.org
On Fri, Jul 17, 2015 at 04:58:17PM -0700, Ming Lin wrote:
> On Fri, 2015-07-17 at 16:51 -0700, Kent Overstreet wrote:
> > On Fri, Jul 17, 2015 at 04:48:31PM -0700, Ming Lin wrote:
> > >
> > > On Fri, 2015-07-17 at 16:40 -0700, Kent Overstreet wrote:
> > > > On Fri, Jul 17, 2015 at 04:35:55PM -0700, Ming Lin wrote:
> > > > >
> > > > > On Fri, 2015-07-17 at 16:17 -0700, Kent Overstreet wrote:
> > > > > > On Wed, Jul 15, 2015 at 12:39:36AM -0700, Ming Lin wrote:
> > > > > > > On Wed, Jul 15, 2015 at 12:15 AM, Ming Lin <mlin@kernel.org> wrote:
> > > > > > > > On Tue, 2015-07-14 at 23:58 -0700, Kent Overstreet wrote:
> > > > > > > >> Can you strace it?
> > > > > > > >
> > > > > > > > Strange. Now error message changed.
> > > > > > >
> > > > > > > I mean sometimes it showed:
> > > > > > >
> > > > > > > mount: /dev/sdt already mounted or /mnt/ busy
> > > > > >
> > > > > > I have no idea what's going on, it works for me - is there anything unusual
> > > > > > about your setup? what kind of block device is /dev/sdt? is there any chance
> > > > > > there's another process that has it open? maybe try rebooting?
> > > > >
> > > > > It's a regular HDD. I tried rebooting several times.
> > > > >
> > > > > Now I try in qemu-kvm. Only the first time it can be mounted.
> > > > >
> > > > > On host: qemu-img create hdd1.img 20G
> > > > > On guest: it's /dev/vda
> > > > >
> > > > > root@block:~# bcacheadm format -C /dev/vda
> > > > > UUID: 4730ed95-4c57-42db-856c-dbce36085625
> > > > > Set UUID: e69ef0e0-0344-40d7-a6b1-c23d14745a32
> > > > > version: 6
> > > > > nbuckets: 40960
> > > > > block_size: 1
> > > > > bucket_size: 1024
> > > > > nr_in_set: 1
> > > > > nr_this_dev: 0
> > > > > first_bucket: 3
> > > > >
> > > > > root@block:~# mount -t bcache /dev/vda /mnt/
> > > > >
> > > > > root@block:~# mount |grep bcache
> > > > > /dev/vda on /mnt type bcache (rw,relatime)
> > > > >
> > > > > root@block:~# reboot
> > > > >
> > > > > root@block:~# dmesg |grep -i bcache
> > > > > [ 2.548754] bcache: bch_journal_replay() journal replay done, 1 keys in 1 entries, seq 3
> > > > > [ 2.636217] bcache: register_cache() registered cache device vda
> > > > >
> > > > >
> > > > > root@block:~# mount -t bcache /dev/vda /mnt/
> > > > > mount: No such file or directory
> > > > >
> > > > > Now dmesg shows:
> > > > >
> > > > > bcache: bch_open_as_blockdevs() register_cache_set err device already registered
> > > >
> > > > Ohhhh.
> > > >
> > > > The cache set is getting registered by the udev hooks. We should be able to
> > > > mount it anyways - same as you can mount any other fs in multiple locations.
> > > >
> > > > I won't be able to fix this for at least a couple days, but for now - just
> > > > shut it down it via sysfs (echo 1 > /sys/fs/bcache/<uuid>/stop), then mount it.
> > >
> > > It works!
> > > Any hint how to fix it? On udev or bcache-tool or kernel?
> > > I'd like to fix it.
> >
> > The relevant code is in drivers/md/bcache/fs.c, bch_mount() ->
> > bch_open_as_blockdevs().
> >
> > Part of the problem is that bcachefs isn't able to use much of the normal
> > generic mount path for block devices, partly because a fs can span multiple
> > block devices (same as btrfs).
> >
> > I'm not sure the right way to fix it - it's going to take some thought, but
> > we want to do something like "is it already open? just take a ref on the
> > existing cache set".
>
> I'll look into it.
Thanks
> And, echo 1 > /sys/fs/bcache/<uuid>/stop, got below.
> I'll also try to fix it.
>
> [ 25.826280] ======================================================
> [ 25.828038] [ INFO: possible circular locking dependency detected ]
> [ 25.828587] 4.1.0-00943-g3683e624 #7 Not tainted
I think the correct fix is to change cache_set_flush() to not hold register_lock
while it's calling into the sysfs code. Want to do that, and add a comment so it
doesn't get screwed up again? Also try and make sure we don't actually need
register_lock for what you take out from under it.
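A userspace sketch of that shape, with a plain pthread mutex standing in for bch_register_lock and a stub for the kobject_del()/kernfs_remove() call - all of these names are stand-ins for illustration, not the real kernel API:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t bch_register_lock = PTHREAD_MUTEX_INITIALIZER;
static int kobject_deleted;

/* Stand-in for kobject_del()/kernfs_remove(): internally it waits on the
 * sysfs "s_active" reference, so it must not run under bch_register_lock.
 * A concurrent sysfs write (bch_cache_set_store() in the lockdep splat)
 * holds s_active and then takes bch_register_lock - the reverse order. */
static void kobject_del_stub(void)
{
	kobject_deleted = 1;
}

static void cache_set_flush_fixed(void)
{
	pthread_mutex_lock(&bch_register_lock);
	/* ... tear down only the state that genuinely needs register_lock ... */
	pthread_mutex_unlock(&bch_register_lock);

	/*
	 * Lock ordering: never take bch_register_lock -> s_active.
	 * Dropping register_lock before calling into sysfs means both
	 * paths acquire the locks in one consistent order, so the
	 * AB-BA deadlock lockdep reported cannot happen.
	 */
	kobject_del_stub();
}
```

The key invariant is simply that the sysfs teardown call happens after the mutex is released, matching the ordering the sysfs write path already uses.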
BTW - probably the most valuable thing you could help out with is the
documentation, in particular the guide:
http://bcache.evilpiepirate.org/BcacheGuide/
Can you read through (at least some of) that, and tell me what's useful and what
needs clarifying? And tell me what you'd like to see added to the guide next -
I'll try and work on documentation over the next two weeks, since I probably
won't be able to do much real coding with my test machines offline.
* Re: [ANNOUNCE] bcachefs!
2015-07-18 0:01 ` Denis Bychkov
@ 2015-07-18 2:12 ` Kent Overstreet
2015-07-19 7:46 ` Denis Bychkov
0 siblings, 1 reply; 36+ messages in thread
From: Kent Overstreet @ 2015-07-18 2:12 UTC (permalink / raw)
To: Denis Bychkov
Cc: linux-bcache, sviatoslavpestov, mrubin, adam.berkan, zab, rickyb
On Fri, Jul 17, 2015 at 08:01:43PM -0400, Denis Bychkov wrote:
> Hi,
>
> I actually tried to compile that code recently (maybe 5 days ago) and
> it did not change since then. There was a bunch of trivial errors,
> that I was able to fix - files moved around without #include updated,
> types missing. But at some point I ran into something definitely
> non-trivial: there is a variable in io.c, which is supposed to have a
> node id in it, but the init code commented out and there is no obvious
> way to fix it. So I gave up. I definitely tried the branch you are
> talking about - bcache-dev.
It didn't build? Weird - can you post some of the compiler errors you were
seeing, as well as your architecture/gcc version? I don't know of any build
errors right now...
>
> Here is the I code I mentioned:
>
> static void bch_read_retry(struct bbio *bbio)
> {
> struct bio *bio = &bbio->bio;
> struct bio *parent;
> u64 inode;
>
> trace_bcache_read_retry(bio);
>
> /*
> * This used to be a leaf bio from bch_read_fn(), but
> * since we don't know what happened to the btree in
> * the meantime, we have to re-submit it via the
> * top-level bch_read() entry point. Before doing that,
> * we have to reset the bio, preserving the biovec.
> *
> * The inode, offset and size come from the bbio's key,
> * which was set by bch_read_fn().
> */
> //inode = bbio->key.k.p.inode;
> parent = bio->bi_private;
>
> bch_bbio_reset(bbio);
> bio_chain(bio, parent);
>
> bch_read(bbio->ca->set, bio, inode);
> bio_endio(parent, 0); /* for bio_chain() in bch_read_fn() */
> bio_endio(bio, 0);
> }
The read retry path is currently non functional, since I added
checksumming/compression support - the read path needs a fair bit more work. But
that shouldn't cause a build error - and the race the retry path is for is damn
near impossible to trigger without fault injection (I don't think I've ever seen
it happen without fault injection).
* Re: [ANNOUNCE] bcachefs!
2015-07-18 2:10 ` Kent Overstreet
@ 2015-07-18 5:21 ` Ming Lin
2015-07-22 5:11 ` Ming Lin
0 siblings, 1 reply; 36+ messages in thread
From: Ming Lin @ 2015-07-18 5:21 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache@vger.kernel.org
On Fri, 2015-07-17 at 19:10 -0700, Kent Overstreet wrote:
> BTW - probably the most valuable thing you could help out with is the
> documentation, in particular the guide:
> http://bcache.evilpiepirate.org/BcacheGuide/
>
> Can you read through (at least some of) that, and tell me what's useful and what
> needs clarifying? And tell me what you'd like to see added to the guide next -
> I'll try and work on documentation over the next two weeks, since I probably
> won't be able to do much real coding with my test machines offline.
Yes, I'll read through that.
* Re: [ANNOUNCE] bcachefs!
2015-07-18 2:12 ` Kent Overstreet
@ 2015-07-19 7:46 ` Denis Bychkov
0 siblings, 0 replies; 36+ messages in thread
From: Denis Bychkov @ 2015-07-19 7:46 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache, Slava Pestov
[-- Attachment #1: Type: text/plain, Size: 2902 bytes --]
Hi,
I managed to build it tonight, although it took some tuning. The
problem is with the bcache.h header file (the one that resides in
include/trace/events/). It includes "alloc_types.h", but there is no
alloc_types.h in that directory. I think you use a customized Makefile
with extra directories added to the -I argument. I fixed it by moving
the alloc_types and clock_types headers from the bcache module
directory to include/trace/events/. I've attached the diff in case
anyone is interested.
To anyone who wants to build this bcache version: you'll need to
either rebase the code onto the kernel version you are working with, or
extract a diff of the bcache-dev branch against the vanilla 4.1 kernel,
apply it to your kernel, and then apply the patch I attached. Or
Kent will probably fix it soon enough, so you won't need silly workarounds.
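An alternative to moving the headers - untested, and with the paths assumed from the description above - would be the usual kernel idiom of adding the module directory to the include search path for the objects that pull in the trace header, so include/trace/events/bcache.h can find alloc_types.h where it already lives:

```make
# drivers/md/bcache/Makefile (hypothetical sketch, not the actual tree):
# let translation units that include the tracepoint header resolve
# "alloc_types.h"/"clock_types.h" from the module directory.
ccflags-y += -I$(src)
```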
On Fri, Jul 17, 2015 at 10:12 PM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Fri, Jul 17, 2015 at 08:01:43PM -0400, Denis Bychkov wrote:
>> Hi,
>>
>> I actually tried to compile that code recently (maybe 5 days ago) and
>> it did not change since then. There was a bunch of trivial errors,
>> that I was able to fix - files moved around without #include updated,
>> types missing. But at some point I ran into something definitely
>> non-trivial: there is a variable in io.c, which is supposed to have a
>> node id in it, but the init code commented out and there is no obvious
>> way to fix it. So I gave up. I definitely tried the branch you are
>> talking about - bcache-dev.
>
> It didn't build? Weird - can you post some of the compiler errors you were
> seeing, as well as your archictecture/gcc version? I don't know of any build
> errors right now...
>
>>
>> Here is the I code I mentioned:
>>
>> static void bch_read_retry(struct bbio *bbio)
>> {
>> struct bio *bio = &bbio->bio;
>> struct bio *parent;
>> u64 inode;
>>
>> trace_bcache_read_retry(bio);
>>
>> /*
>> * This used to be a leaf bio from bch_read_fn(), but
>> * since we don't know what happened to the btree in
>> * the meantime, we have to re-submit it via the
>> * top-level bch_read() entry point. Before doing that,
>> * we have to reset the bio, preserving the biovec.
>> *
>> * The inode, offset and size come from the bbio's key,
>> * which was set by bch_read_fn().
>> */
>> //inode = bbio->key.k.p.inode;
>> parent = bio->bi_private;
>>
>> bch_bbio_reset(bbio);
>> bio_chain(bio, parent);
>>
>> bch_read(bbio->ca->set, bio, inode);
>> bio_endio(parent, 0); /* for bio_chain() in bch_read_fn() */
>> bio_endio(bio, 0);
>> }
>
> The read retry path is currently non functional, since I added
> checksumming/compression support - the read path needs a fair bit more work. But
> that shouldn't cause a build error - and the race the retry path is for is damn
> near impossible to trigger without fault injection (I don't think I've ever seen
> it happen without fault injection).
--
Denis
[-- Attachment #2: bcache-fix-includes.patch --]
[-- Type: application/octet-stream, Size: 7020 bytes --]
--- a/drivers/md/bcache/alloc_types.h
+++ /dev/null
@@ -1,97 +0,0 @@
-#ifndef _BCACHE_ALLOC_TYPES_H
-#define _BCACHE_ALLOC_TYPES_H
-
-#include <linux/mutex.h>
-
-#include "clock_types.h"
-
-/*
- * There's two of these clocks, one for reads and one for writes:
- *
- * All fields protected by bucket_lock
- */
-struct prio_clock {
- /*
- * "now" in (read/write) IO time - incremented whenever we do X amount
- * of reads or writes.
- *
- * Goes with the bucket read/write prios: when we read or write to a
- * bucket we reset the bucket's prio to the current hand; thus hand -
- * prio = time since bucket was last read/written.
- *
- * The units are some amount (bytes/sectors) of data read/written, and
- * the units can change on the fly if we need to rescale to fit
- * everything in a u16 - your only guarantee is that the units are
- * consistent.
- */
- u16 hand;
- u16 min_prio;
-
- int rw;
-
- struct io_timer rescale;
-};
-
-/* There is one reserve for each type of btree, one for prios and gens
- * and one for moving GC */
-enum alloc_reserve {
- RESERVE_PRIO,
- RESERVE_BTREE,
- RESERVE_METADATA_LAST = RESERVE_BTREE,
- RESERVE_MOVINGGC,
-
- RESERVE_NONE,
- RESERVE_NR,
-};
-
-static inline bool allocation_is_metadata(enum alloc_reserve id)
-{
- return id <= RESERVE_METADATA_LAST;
-}
-
-/* Enough for 16 cache devices, 2 tiers and some left over for pipelining */
-#define OPEN_BUCKETS_COUNT 256
-
-#define WRITE_POINT_COUNT 16
-
-struct open_bucket {
- struct list_head list;
- struct mutex lock;
- atomic_t pin;
- unsigned sectors_free;
- unsigned nr_ptrs;
- struct bch_extent_ptr ptrs[BKEY_EXTENT_PTRS_MAX];
-};
-
-struct write_point {
- struct open_bucket *b;
-
- /*
- * Throttle writes to this write point if tier 0 is full?
- */
- bool throttle;
-
- /*
- * If 0, use the desired replica count for the cache set.
- * Otherwise, this is the number of replicas desired (generally 1).
- */
- unsigned nr_replicas;
-
- /*
- * Bucket reserve to allocate from.
- */
- enum alloc_reserve reserve;
-
- /*
- * If not NULL, cache group for tiering, promotion and moving GC -
- * always allocates a single replica
- */
- struct cache_group *group;
-
- /*
- * Otherwise do a normal replicated bucket allocation that could come
- * from any device in tier 0 (foreground write)
- */
-};
-
-#endif /* _BCACHE_ALLOC_TYPES_H */
--- a/drivers/md/bcache/clock_types.h
+++ /dev/null
@@ -1,32 +0,0 @@
-#ifndef _BCACHE_CLOCK_TYPES_H
-#define _BCACHE_CLOCK_TYPES_H
-
-#define NR_IO_TIMERS 8
-
-/*
- * Clocks/timers in units of sectors of IO:
- *
- * Note - they use percpu batching, so they're only approximate.
- */
-
-struct io_timer;
-typedef void (*io_timer_fn)(struct io_timer *);
-
-struct io_timer {
- io_timer_fn fn;
- unsigned long expire;
-};
-
-/* Amount to buffer up on a percpu counter */
-#define IO_CLOCK_PCPU_SECTORS 128
-
-struct io_clock {
- atomic_long_t now;
- u16 __percpu *pcpu_buf;
-
- spinlock_t timer_lock;
- DECLARE_HEAP(struct io_timer *, timers);
-};
-
-#endif /* _BCACHE_CLOCK_TYPES_H */
-
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -1649,6 +1649,7 @@
* which was set by bch_read_fn().
*/
//inode = bbio->key.k.p.inode;
+ inode = 0;
parent = bio->bi_private;
bch_bbio_reset(bbio);
--- /dev/null
+++ b/include/trace/events/alloc_types.h
@@ -0,0 +1,97 @@
+#ifndef _BCACHE_ALLOC_TYPES_H
+#define _BCACHE_ALLOC_TYPES_H
+
+#include <linux/mutex.h>
+
+#include "clock_types.h"
+
+/*
+ * There's two of these clocks, one for reads and one for writes:
+ *
+ * All fields protected by bucket_lock
+ */
+struct prio_clock {
+ /*
+ * "now" in (read/write) IO time - incremented whenever we do X amount
+ * of reads or writes.
+ *
+ * Goes with the bucket read/write prios: when we read or write to a
+ * bucket we reset the bucket's prio to the current hand; thus hand -
+ * prio = time since bucket was last read/written.
+ *
+ * The units are some amount (bytes/sectors) of data read/written, and
+ * the units can change on the fly if we need to rescale to fit
+ * everything in a u16 - your only guarantee is that the units are
+ * consistent.
+ */
+ u16 hand;
+ u16 min_prio;
+
+ int rw;
+
+ struct io_timer rescale;
+};
+
+/* There is one reserve for each type of btree, one for prios and gens
+ * and one for moving GC */
+enum alloc_reserve {
+ RESERVE_PRIO,
+ RESERVE_BTREE,
+ RESERVE_METADATA_LAST = RESERVE_BTREE,
+ RESERVE_MOVINGGC,
+
+ RESERVE_NONE,
+ RESERVE_NR,
+};
+
+static inline bool allocation_is_metadata(enum alloc_reserve id)
+{
+ return id <= RESERVE_METADATA_LAST;
+}
+
+/* Enough for 16 cache devices, 2 tiers and some left over for pipelining */
+#define OPEN_BUCKETS_COUNT 256
+
+#define WRITE_POINT_COUNT 16
+
+struct open_bucket {
+ struct list_head list;
+ struct mutex lock;
+ atomic_t pin;
+ unsigned sectors_free;
+ unsigned nr_ptrs;
+ struct bch_extent_ptr ptrs[BKEY_EXTENT_PTRS_MAX];
+};
+
+struct write_point {
+ struct open_bucket *b;
+
+ /*
+ * Throttle writes to this write point if tier 0 is full?
+ */
+ bool throttle;
+
+ /*
+ * If 0, use the desired replica count for the cache set.
+ * Otherwise, this is the number of replicas desired (generally 1).
+ */
+ unsigned nr_replicas;
+
+ /*
+ * Bucket reserve to allocate from.
+ */
+ enum alloc_reserve reserve;
+
+ /*
+ * If not NULL, cache group for tiering, promotion and moving GC -
+ * always allocates a single replica
+ */
+ struct cache_group *group;
+
+ /*
+ * Otherwise do a normal replicated bucket allocation that could come
+ * from any device in tier 0 (foreground write)
+ */
+};
+
+#endif /* _BCACHE_ALLOC_TYPES_H */
--- /dev/null
+++ b/include/trace/events/clock_types.h
@@ -0,0 +1,32 @@
+#ifndef _BCACHE_CLOCK_TYPES_H
+#define _BCACHE_CLOCK_TYPES_H
+
+#define NR_IO_TIMERS 8
+
+/*
+ * Clocks/timers in units of sectors of IO:
+ *
+ * Note - they use percpu batching, so they're only approximate.
+ */
+
+struct io_timer;
+typedef void (*io_timer_fn)(struct io_timer *);
+
+struct io_timer {
+ io_timer_fn fn;
+ unsigned long expire;
+};
+
+/* Amount to buffer up on a percpu counter */
+#define IO_CLOCK_PCPU_SECTORS 128
+
+struct io_clock {
+ atomic_long_t now;
+ u16 __percpu *pcpu_buf;
+
+ spinlock_t timer_lock;
+ DECLARE_HEAP(struct io_timer *, timers);
+};
+
+#endif /* _BCACHE_CLOCK_TYPES_H */
+
--- a/drivers/md/bcache/alloc.h
+++ b/drivers/md/bcache/alloc.h
@@ -1,7 +1,7 @@
#ifndef _BCACHE_ALLOC_H
#define _BCACHE_ALLOC_H
-#include "alloc_types.h"
+#include <trace/events/alloc_types.h>
struct bkey;
struct bucket;
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -207,10 +207,10 @@
#define bch_meta_write_fault(name) \
dynamic_fault("bcache:meta:write:" name)
-#include "alloc_types.h"
+#include <trace/events/alloc_types.h>
#include "blockdev_types.h"
#include "buckets_types.h"
-#include "clock_types.h"
+#include <trace/events/clock_types.h>
#include "io_types.h"
#include "journal_types.h"
#include "keylist_types.h"
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ANNOUNCE] bcachefs!
2015-07-14 8:11 ` Kent Overstreet
@ 2015-07-20 1:11 ` Denis Bychkov
[not found] ` <CAC7rs0uWSt85F443PRw1zvybccg+EfebaSyH9EhUwHjhTGryRA@mail.gmail.com>
0 siblings, 1 reply; 36+ messages in thread
From: Denis Bychkov @ 2015-07-20 1:11 UTC (permalink / raw)
To: Kent Overstreet
Cc: Vasiliy Tolstov, zab, linux-bcache, rickyb, mrubin, Slava Pestov,
adam.berkan
On Tue, Jul 14, 2015 at 4:11 AM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Tue, Jul 14, 2015 at 09:05:39AM +0300, Vasiliy Tolstov wrote:
>> Does it support discards?
>
>> Format command have one device, how provide tiering?
>
> --tier specifies the tier of the devices that come after it, where the smaller
> index is the faster tier.
>
> Only tiers 0 and 1 are supported for now, that will be increased whenever
> someone gets around to it.
>
> If /dev/sda is your fast device and /dev/sdb is your slow device, run
>
> # bcacheadm format -C /dev/sda --tier 1 /dev/sdb
>
> bcacheadm format --help gives you the full list of options.
Ok, I am really confused right now. So, the format utility still
allows -B (backing device). Is it just an artifact left over from
block caching? Because I could not make it work. Any thread
encountering the formatted backing device just hangs inside
kernel space until reboot. Is there something wrong with my setup, or
are backing devices just an illusion, a figment of the kernel's
imagination? And, if not, how do they relate to tiers?
Could you give a simple example of a usual setup - one small and fast
SSD and huge but slow RAID-6? Should I now format both partitions as
-C and assign tier 1 to RAID-6?
>
>> On July 14, 2015 at 3:58, "Kent Overstreet" <kent.overstreet@gmail.com> wrote:
>>
>> > Short announcement, because I'm in the process of moving - but I wanted to
>> > get
>> > this out there because the code is up and I think it's reasonably stable
>> > right
>> > now.
>> >
>> > Bcachefs is a posix filesystem that I've been working towards for - well,
>> > quite
>> > awhile now: it's intended as a competitor/replacement for ext4/xfs/btrfs.
>> >
>> > Current features
>> > - multiple devices
>> > - replication
>> > - tiering
>> > - data checksumming and compression (zlib only; also the code doesn't
>> > work with
>> > tiering yet)
>> > - most of the normal posix fs features (no fallocate or quotas yet)
>> >
>> > Planned features:
>> > - snapshots!
>> > - erasure coding
>> > - more
>> >
>> > There will be a longer announcement on LKML/linux-fs in the near future
>> > (after
>> > I'm finished moving) - but I'd like to get it a bit more testing from a
>> > wider
>> > audience first, if possible.
>> >
>> > You need the bcache-dev branch, and the new bcache tools - be warned, this
>> > code
>> > is _not_ compatible with the upstream bcache on disk format:
>> >
>> > $ git clone -b bcache-dev http://evilpiepirate.org/git/linux-bcache.git
>> > $ git clone -b dev http://evilpiepirate.org/git/bcache-tools.git
>> >
>> > Then do the usual compiling...
>> >
>> > # bcacheadm format -C /dev/sda1
>> > # mount /dev/sda1 /mnt
>> >
>> > The usual caveats apply - it might eat your data, the on disk format has
>> > _not_
>> > been stabilized yet, etc. But it's been reasonably stable for me, and
>> > passes all
>> > but 2-3 of the supported xfstests.
>> >
>> > Try it out and let me know how it goes!
>> >
>> > Also, programmers please check out the bcache guide - feedback is
>> > appreciated:
>> >
>> > http://bcache.evilpiepirate.org/BcacheGuide/
>> >
>> > Thanks!
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >
--
Denis
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ANNOUNCE] bcachefs!
[not found] ` <CAC7rs0upqkuH1CPd-OAmrpQ=8PmaDpzHYY1MaBDpAL6TS_iKyw@mail.gmail.com>
@ 2015-07-20 2:52 ` Denis Bychkov
2015-07-24 19:25 ` Kent Overstreet
0 siblings, 1 reply; 36+ messages in thread
From: Denis Bychkov @ 2015-07-20 2:52 UTC (permalink / raw)
To: Kent Overstreet
Cc: Adam Berkan, linux-bcache, Vasiliy Tolstov, Michael Rubin,
Slava Pestov, zab, Ricky Benitez
I don't think I found anything in the design description or anywhere
else explaining how tiering works and what data, when and why ends up
on the next tier. And how to control this. The old bcache has a pretty
advanced set of knobs allowing you to fine-tune this behavior
(read-ahead limit, sequential cutoff, congestion thresholds, etc.) If
I overlooked, please point me to the right direction.
On Sun, Jul 19, 2015 at 9:44 PM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> And yes, format all drives as -C and use --tier for the slower drives.
>
> -B means an externally managed device - bcache doesn't remap anything to it,
> it exposes a passthrough block device just so it can snoop on accesses to it
> and cache them. The functionality still exists in the bcache codebase, but
> has no meaning in bcachefs land
>
> On Jul 19, 2015 6:41 PM, kent.overstreet@gmail.com wrote:
>>
>> Probably broken because I haven't tested them recently. Once I've unpacked
>> my computers (in a few weeks) I'll debug it
>>
>> On Jul 19, 2015 6:11 PM, "Denis Bychkov" <manover@gmail.com> wrote:
>>>
>>> On Tue, Jul 14, 2015 at 4:11 AM, Kent Overstreet
>>> <kent.overstreet@gmail.com> wrote:
>>> > On Tue, Jul 14, 2015 at 09:05:39AM +0300, Vasiliy Tolstov wrote:
>>> >> Does it support discards?
>>> >
>>> >> Format command have one device, how provide tiering?
>>> >
>>> > --tier specifies the tier of the devices that come after it, where the
>>> > smaller
>>> > index is the faster tier.
>>> >
>>> > Only tiers 0 and 1 are supported for now, that will be increased
>>> > whenever
>>> > someone gets around to it.
>>> >
>>> > If /dev/sda is your fast device and /dev/sdb is your slow device, run
>>> >
>>> > # bcacheadm format -C /dev/sda --tier 1 /dev/sdb
>>> >
>>> > bcacheadm format --help gives you the full list of options.
>>>
>>> Ok, I am really confused right now. So, the format utility still
>>> allows -B (backing device). Is it just an artifact left over after
>>> block caching? Because I could not make it work. Any thread
>>> encountering the formatted backing device just hangs inside the
>>> kernel space until reboot. Is there something wrong with my setup or
>>> backing devices are just an illusion, a figment of kernel's
>>> imagination? And, if not, how do they relate to tiers?
>>> Could you give a simple example of a usual setup - one small and fast
>>> SSD and huge but slow RAID-6? Should I now format both partitions as
>>> -C and assign tier 1 to RAID-6?
>>>
>>> >
>>> >> On July 14, 2015 at 3:58, "Kent Overstreet" <kent.overstreet@gmail.com> wrote:
>>> >> > [...]
>>>
>>>
>>>
>>> --
>>>
>>> Denis
--
Denis
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ANNOUNCE] bcachefs!
2015-07-14 0:58 [ANNOUNCE] bcachefs! Kent Overstreet
` (2 preceding siblings ...)
2015-07-18 0:01 ` Denis Bychkov
@ 2015-07-21 18:37 ` David Mohr
2015-07-21 21:53 ` Jason Warr
2015-07-22 7:19 ` Killian De Volder
3 siblings, 2 replies; 36+ messages in thread
From: David Mohr @ 2015-07-21 18:37 UTC (permalink / raw)
To: Kent Overstreet
Cc: linux-bcache, sviatoslavpestov, mrubin, adam.berkan, zab, rickyb
On 2015-07-13 18:58, Kent Overstreet wrote:
> Short announcement, because I'm in the process of moving - but I wanted
> to get
> this out there because the code is up and I think it's reasonably
> stable right
> now.
>
> Bcachefs is a posix filesystem that I've been working towards for -
> well, quite
> awhile now: it's intended as a competitor/replacement for
> ext4/xfs/btrfs.
>
> Current features
> - multiple devices
> - replication
> - tiering
> - data checksumming and compression (zlib only; also the code doesn't
> work with
> tiering yet)
> - most of the normal posix fs features (no fallocate or quotas yet)
>
> Planned features:
> - snapshots!
> - erasure coding
> - more
>
> There will be a longer announcement on LKML/linux-fs in the near future
> (after
> I'm finished moving) - but I'd like to get it a bit more testing from a
> wider
> audience first, if possible.
Hi Kent,
one quick question about the roadmap at this point: As far as I
understand bcachefs basically integrates bcache features directly in the
filesystem. So does this deprecate bcache itself in your opinion? Bcache
is obviously still useful for other FS, but I just want to know how
things will get maintained in the future.
I wanted to suggest / possibly start implementing bcache support for the
debian installer - obviously that only makes sense if I can expect it to
be in the mainline kernel for the foreseeable future :-).
Thanks,
~David
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ANNOUNCE] bcachefs!
2015-07-21 18:37 ` David Mohr
@ 2015-07-21 21:53 ` Jason Warr
2015-07-24 19:32 ` Kent Overstreet
2015-07-22 7:19 ` Killian De Volder
1 sibling, 1 reply; 36+ messages in thread
From: Jason Warr @ 2015-07-21 21:53 UTC (permalink / raw)
To: David Mohr, Kent Overstreet
Cc: linux-bcache, sviatoslavpestov, mrubin, adam.berkan, zab, rickyb
On 7/21/2015 1:37 PM, David Mohr wrote:
> On 2015-07-13 18:58, Kent Overstreet wrote:
>> [...]
>
> Hi Kent,
>
> one quick question about the roadmap at this point: As far as I
> understand bcachefs basically integrates bcache features directly in
> the filesystem. So does this deprecate bcache itself in your opinion?
> Bcache is obviously still useful for other FS, but I just want to know
> how things will get maintained in the future.
>
It would be rather disappointing if this were the case. bcache is quite
useful for backing block devices that have no local filesystem, such as
devices exported via iSCSI, devices used directly by VMs, etc...
> I wanted to suggest / possibly start implementing bcache support for
> the debian installer - obviously that only makes sense if I can expect
> it to be in the mainline kernel for the foreseeable future :-).
>
> Thanks,
> ~David
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ANNOUNCE] bcachefs!
2015-07-18 5:21 ` Ming Lin
@ 2015-07-22 5:11 ` Ming Lin
2015-07-22 5:15 ` Ming Lin
2015-07-24 19:15 ` Kent Overstreet
0 siblings, 2 replies; 36+ messages in thread
From: Ming Lin @ 2015-07-22 5:11 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache@vger.kernel.org
On Fri, Jul 17, 2015 at 10:21 PM, Ming Lin <mlin@kernel.org> wrote:
> On Fri, 2015-07-17 at 19:10 -0700, Kent Overstreet wrote:
>> BTW - probably the most valuable thing you could help out with is the
>> documentation, in particular the guide:
>> http://bcache.evilpiepirate.org/BcacheGuide/
>>
>> Can you read through (at least some of) that, and tell me what's useful and what
>> needs clarifying? And tell me what you'd like to see added to the guide next -
>> I'll try and work on documentation over the next two weeks, since I probably
>> won't be able to do much real coding with my test machines offline.
>
> Yes, I'll read through that.
Would you add some examples to explain how the extents/inodes/dirents are stored
in the btree on disk?
I'm reading the debug code in drivers/md/bcache/debug.c.
It seems helpful to learn about the internal btree structure.
Thanks.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ANNOUNCE] bcachefs!
2015-07-22 5:11 ` Ming Lin
@ 2015-07-22 5:15 ` Ming Lin
2015-07-24 19:15 ` Kent Overstreet
1 sibling, 0 replies; 36+ messages in thread
From: Ming Lin @ 2015-07-22 5:15 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache@vger.kernel.org
On Tue, Jul 21, 2015 at 10:11 PM, Ming Lin <mlin@kernel.org> wrote:
> On Fri, Jul 17, 2015 at 10:21 PM, Ming Lin <mlin@kernel.org> wrote:
>> On Fri, 2015-07-17 at 19:10 -0700, Kent Overstreet wrote:
>>> BTW - probably the most valuable thing you could help out with is the
>>> documentation, in particular the guide:
>>> http://bcache.evilpiepirate.org/BcacheGuide/
>>>
>>> Can you read through (at least some of) that, and tell me what's useful and what
>>> needs clarifying? And tell me what you'd like to see added to the guide next -
>>> I'll try and work on documentation over the next two weeks, since I probably
>>> won't be able to do much real coding with my test machines offline.
>>
>> Yes, I'll read through that.
>
> Would you add some example to explain how the extents/inodes/dirents are stored
> in the btree on disk?
>
> I'm reading the debug code in drivers/md/bcache/debug.c.
> It seems helpful to learn about the internal btree structure.
For those interested:
"cat" these files to dump the btree:
root@bee:/sys/kernel/debug/bcache/07170fb5-b9bd-4f44-a7e3-c657a367a960# ls
dirents dirents-formats extents extents-formats inodes
inodes-formats xattrs xattrs-formats
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ANNOUNCE] bcachefs!
2015-07-21 18:37 ` David Mohr
2015-07-21 21:53 ` Jason Warr
@ 2015-07-22 7:19 ` Killian De Volder
1 sibling, 0 replies; 36+ messages in thread
From: Killian De Volder @ 2015-07-22 7:19 UTC (permalink / raw)
To: linux-bcache; +Cc: sviatoslavpestov, mrubin, adam.berkan, zab, rickyb
On 21-07-15 20:37, David Mohr wrote:
> one quick question about the roadmap at this point: As far as I understand bcachefs basically integrates bcache features directly in the filesystem.
> So does this deprecate bcache itself in your opinion? Bcache is obviously still useful for other FS, but I just want to know how things will get maintained in the future.
If they remove bcache from the kernel, a lot of people are going to have serious trouble, as it's not easy to remove from an existing setup.
But to quote the developer:
"no don't worry. it's not going to be deleted from upstream"
> I wanted to suggest / possibly start implementing bcache support for the debian installer - obviously that only makes sense if I can expect it to be in the mainline kernel for the foreseeable future :-).
I can also make a quote on this question:
"the btree code is also hugely improved over what's in mainline, i'd like to get the improvements backported but i think it's just way way too much work"
"bcache will be deprecated when a stable bcachefs is upstream (but it's going to be awhile before the on disk format is stable again)"
More info on what bcachefs actually is:
You initialize some fast storage as a caching device. It stores a btree/journal (or whatever is actually used internally) key-value storage system on disk.
Next you can either put a filesystem on top of this btree caching device OR use it to store the cache data for a caching block device.
(Not sure if you can combine a caching device and a backing device into the same FS, but you will probably be able to.)
--
Killian De Volder
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ANNOUNCE] bcachefs!
2015-07-22 5:11 ` Ming Lin
2015-07-22 5:15 ` Ming Lin
@ 2015-07-24 19:15 ` Kent Overstreet
2015-07-24 20:47 ` Ming Lin
1 sibling, 1 reply; 36+ messages in thread
From: Kent Overstreet @ 2015-07-24 19:15 UTC (permalink / raw)
To: Ming Lin; +Cc: linux-bcache@vger.kernel.org
On Tue, Jul 21, 2015 at 10:11:11PM -0700, Ming Lin wrote:
> On Fri, Jul 17, 2015 at 10:21 PM, Ming Lin <mlin@kernel.org> wrote:
> > On Fri, 2015-07-17 at 19:10 -0700, Kent Overstreet wrote:
> >> BTW - probably the most valuable thing you could help out with is the
> >> documentation, in particular the guide:
> >> http://bcache.evilpiepirate.org/BcacheGuide/
> >>
> >> Can you read through (at least some of) that, and tell me what's useful and what
> >> needs clarifying? And tell me what you'd like to see added to the guide next -
> >> I'll try and work on documentation over the next two weeks, since I probably
> >> won't be able to do much real coding with my test machines offline.
> >
> > Yes, I'll read through that.
>
> Would you add some example to explain how the extents/inodes/dirents are stored
> in the btree on disk?
Can you be more specific? Like how inodes/dirents map to keys in the btree, or
how it all ends up on disk?
The inodes/dirents code is pretty short, I'd look at inode.c and dirent.c
> I'm reading the debug code in drivers/md/bcache/debug.c.
> It seems helpful to learn about the internal btree structure.
Are you interested in more the format of the btree node itself, on disk? Like
struct btree_node, struct btree_node_entry, struct bset, and the packing?
I could try and elaborate on that in the guide, give me some specific questions
to cover if you've got any
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ANNOUNCE] bcachefs!
2015-07-20 2:52 ` Denis Bychkov
@ 2015-07-24 19:25 ` Kent Overstreet
0 siblings, 0 replies; 36+ messages in thread
From: Kent Overstreet @ 2015-07-24 19:25 UTC (permalink / raw)
To: Denis Bychkov
Cc: Adam Berkan, linux-bcache, Vasiliy Tolstov, Michael Rubin,
Slava Pestov, zab, Ricky Benitez
On Sun, Jul 19, 2015 at 10:52:09PM -0400, Denis Bychkov wrote:
> I don't think I found anything in the design description or anywhere
> else explaining how tiering works and what data, when and why ends up
> on the next tier. And how to control this. The old bcache has a pretty
> advanced set of knobs allowing you to fine-tune this behavior
> (read-ahead limit, sequential cutoff, congestion thresholds, etc.) If
> I overlooked, please point me to the right direction.
All those additional knobs don't exist yet in bcachefs/tiering land - I want to
rethink all of that, and also wait until there's actual users/use cases that
need that stuff so we have some idea of what we're trying to accomplish.
The way it works right now is:
- Foreground writes always go to tier 0
If tier 0 is full, they wait - there's code to slowly throttle foreground
writes if tier 0 is getting close to full and give tiering/copygc a chance to
catch up, so they hopefully don't get stuck waiting nearly forever when tier
0 gets completely full
- Tiering scans the extents btree looking for data that is present on tier 0
but not tier 1, and then writes an additional copy of that data on tier 1
- Extra replicas are considered cached, so the copy on tier 0 will no longer be
considered dirty and can be reclaimed
- On the read side, if we read from tier 1 the cache_promote() path tries to
write another copy to tier 0
No fancy knobs yet. In the future (a ways off), if we want to re-add fancy
knobs/behaviour we should try and rethink this stuff in the context of a
filesystem - like we could potentially have persistent inode flags for "this
file should always live on the slow tier", and also if we want to send
particular IOs to the slow tier possibly try and do that from the code that
interacts with the pagecache, where we've got more information about how much
data we're going to be reading/writing.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [ANNOUNCE] bcachefs!
2015-07-21 21:53 ` Jason Warr
@ 2015-07-24 19:32 ` Kent Overstreet
2015-07-24 19:42 ` Jason Warr
0 siblings, 1 reply; 36+ messages in thread
From: Kent Overstreet @ 2015-07-24 19:32 UTC (permalink / raw)
To: Jason Warr
Cc: David Mohr, linux-bcache, sviatoslavpestov, mrubin, adam.berkan,
zab, rickyb
On Tue, Jul 21, 2015 at 04:53:44PM -0500, Jason Warr wrote:
>
>
> On 7/21/2015 1:37 PM, David Mohr wrote:
> >On 2015-07-13 18:58, Kent Overstreet wrote:
> >>Short announcement, because I'm in the process of moving - but I wanted
> >>to get
> >>this out there because the code is up and I think it's reasonably stable
> >>right
> >>now.
> >>
> >>Bcachefs is a posix filesystem that I've been working towards for -
> >>well, quite
> >>awhile now: it's intended as a competitor/replacement for
> >>ext4/xfs/btrfs.
> >>
> >>Current features
> >> - multiple devices
> >> - replication
> >> - tiering
> >> - data checksumming and compression (zlib only; also the code doesn't
> >>work with
> >> tiering yet)
> >> - most of the normal posix fs features (no fallocate or quotas yet)
> >>
> >>Planned features:
> >> - snapshots!
> >> - erasure coding
> >> - more
> >>
> >>There will be a longer announcement on LKML/linux-fs in the near future
> >>(after
> >>I'm finished moving) - but I'd like to get it a bit more testing from a
> >>wider
> >>audience first, if possible.
> >
> >Hi Kent,
> >
> >one quick question about the roadmap at this point: As far as I understand
> >bcachefs basically integrates bcache features directly in the filesystem.
> >So does this deprecate bcache itself in your opinion? Bcache is obviously
> >still useful for other FS, but I just want to know how things will get
> >maintained in the future.
> >
> It would be rather disappointing if this were the case. bcache is quite
> useful for backing block devices that have no local filesystem, such as
> devices exported via iSCSI, devices used directly by VMs, etc...
- bcachefs/bcache2 getting merged is a _long_ way off, and when it does it's
going to be more of an ext2/ext3 thing - the existing upstream bcache version
will stay there for the foreseeable future.
- bcachefs/bcache2 still has all the same functionality bcache has for caching
other block devices, and exporting thin provisioned block devices - that
functionality won't be going away any time soon, if ever - so you'll be able
to migrate to the new bcache code and on disk format without changing
anything about how you use it.
The "backing device/cached dev" path _might_ eventually get deprecated in
favor of having bcache manage all the block devices directly and export thin
provisioned block devices - this is the existing "flash_vol_create"
functionality.
The reason is that the thin provisioned/fully bcache managed block devices path is
quite a bit simpler and diverges less from the functionality bcachefs uses -
and cache coherency is fundamentally easier with bcache managing all the
storage, so performance should be better too.
However, if the backing device functionality ever gets removed, it's a _long_
way off, and I'll be asking for user feedback and making sure the thin
provisioned/bcache managed block devices functionality works for everyone
first.
* Re: [ANNOUNCE] bcachefs!
2015-07-24 19:32 ` Kent Overstreet
@ 2015-07-24 19:42 ` Jason Warr
0 siblings, 0 replies; 36+ messages in thread
From: Jason Warr @ 2015-07-24 19:42 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache
On 7/24/2015 2:32 PM, Kent Overstreet wrote:
> On Tue, Jul 21, 2015 at 04:53:44PM -0500, Jason Warr wrote:
>>
>> On 7/21/2015 1:37 PM, David Mohr wrote:
>>> On 2015-07-13 18:58, Kent Overstreet wrote:
>>>> Short announcement, because I'm in the process of moving - but I wanted
>>>> to get
>>>> this out there because the code is up and I think it's reasonably stable
>>>> right
>>>> now.
>>>>
>>>> Bcachefs is a posix filesystem that I've been working towards for -
>>>> well, quite
>>>> awhile now: it's intended as a competitor/replacement for
>>>> ext4/xfs/btrfs.
>>>>
>>>> Current features
>>>> - multiple devices
>>>> - replication
>>>> - tiering
>>>> - data checksumming and compression (zlib only; also the code doesn't
>>>> work with
>>>> tiering yet)
>>>> - most of the normal posix fs features (no fallocate or quotas yet)
>>>>
>>>> Planned features:
>>>> - snapshots!
>>>> - erasure coding
>>>> - more
>>>>
>>>> There will be a longer announcement on LKML/linux-fs in the near future
>>>> (after
>>>> I'm finished moving) - but I'd like to get it a bit more testing from a
>>>> wider
>>>> audience first, if possible.
>>> Hi Kent,
>>>
>>> one quick question about the roadmap at this point: As far as I understand
>>> bcachefs basically integrates bcache features directly in the filesystem.
>>> So does this deprecate bcache itself in your opinion? Bcache is obviously
>>> still useful for other FS, but I just want to know how things will get
>>> maintained in the future.
>>>
>> It would be rather disappointing if this were the case. bcache is quite
>> useful for backing block devices that have no local filesystem, such as
>> devices exported via iSCSI, devices used directly by VMs, etc...
> - bcachefs/bcache2 getting merged is a _long_ way off, and when it does it's
> going to be more of an ext2/ext3 thing - the existing upstream bcache version
> will stay there for the foreseeable future.
>
> - bcachefs/bcache2 still has all the same functionality bcache has for caching
> other block devices, and exporting thin provisioned block devices - that
> functionality won't be going away any time soon, if ever - so you'll be able
> to migrate to the new bcache code and on disk format without changing
> anything about how you use it.
>
> The "backing device/cached dev" path _might_ eventually get deprecated in
> favor of having bcache manage all the block devices directly and export thin
> provisioned block devices - this is the existing "flash_vol_create"
> functionality.
>
> Reason being the thin provisioned/fully bcache managed block devices path is
> quite a bit simpler and diverges less from the functionality bcachefs uses -
> and also cache coherency is fundamentally easier with bcache managing all the
> storage so performance should be better too.
>
> However, if the backing device functionality ever gets removed it's a _long_
> ways off, and I'll be asking for user feedback and making sure the thin
> provisioned/bcache managed block devices functionality works for everyone
> first.
As long as we get the same basic functionality in addition to the
filesystem layer (and it sounds like you plan on that), I'm happy.
Thank you for addressing this.
* Re: [ANNOUNCE] bcachefs!
2015-07-24 19:15 ` Kent Overstreet
@ 2015-07-24 20:47 ` Ming Lin
2015-07-28 18:41 ` Ming Lin
0 siblings, 1 reply; 36+ messages in thread
From: Ming Lin @ 2015-07-24 20:47 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache@vger.kernel.org
On Fri, 2015-07-24 at 12:15 -0700, Kent Overstreet wrote:
> On Tue, Jul 21, 2015 at 10:11:11PM -0700, Ming Lin wrote:
> > On Fri, Jul 17, 2015 at 10:21 PM, Ming Lin <mlin@kernel.org> wrote:
> > > On Fri, 2015-07-17 at 19:10 -0700, Kent Overstreet wrote:
> > >> BTW - probably the most valuable thing you could help out with is the
> > >> documentation, in particular the guide:
> > >> http://bcache.evilpiepirate.org/BcacheGuide/
> > >>
> > >> Can you read through (at least some of) that, and tell me what's useful and what
> > >> needs clarifying? And tell me what you'd like to see added to the guide next -
> > >> I'll try and work on documentation over the next two weeks, since I probably
> > >> won't be able to do much real coding with my test machines offline.
> > >
> > > Yes, I'll read through that.
> >
> > Would you add some example to explain how the extents/inodes/dirents are stored
> > in the btree on disk?
>
> Can you be more specific? Like how inodes/dirents map to keys in the btree, or
> how it all ends up on disk?
How it all ends up on disk.
>
> The inodes/dirents code is pretty short, I'd look at inode.c and dirent.c
>
> > I'm reading the debug code in drivers/md/bcache/debug.c.
> > It seems helpful to learn about the internal btree structure.
>
> Are you interested in more the format of the btree node itself, on disk? Like
> struct btree_node, struct btree_node_entry, struct bset, and the packing?
>
> I could try and elaborate on that in the guide, give me some specific questions
> to cover if you've got any
I only have textbook knowledge of B+trees.
Yes, I'm interested in the on-disk btree format.
A diagram like this one could help:
https://btrfs.wiki.kernel.org/index.php/Btrfs_design
And I want to learn how btree node insert/delete/update happens on
disk. These may be too detailed. I'm going to write a small tool to dump
the file system; then I can better understand the on-disk btree
format.
Thanks.
* Re: [ANNOUNCE] bcachefs!
2015-07-24 20:47 ` Ming Lin
@ 2015-07-28 18:41 ` Ming Lin
2015-07-28 18:45 ` Ming Lin
2015-08-06 22:58 ` Kent Overstreet
0 siblings, 2 replies; 36+ messages in thread
From: Ming Lin @ 2015-07-28 18:41 UTC (permalink / raw)
To: Ming Lin; +Cc: Kent Overstreet, linux-bcache@vger.kernel.org
On Fri, Jul 24, 2015 at 1:47 PM, Ming Lin <mlin@kernel.org> wrote:
>
> And I want to learn how the btree node insert/delete/update happens on
> disk. These maybe too detail. I'm going to write a small tool to dump
> the file system. Then I could understand better the on disk btree
> format.
Here is my simple tool to dump parts of the on-disk format.
http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=deb258e2
It's not in good shape, but simple enough to learn the on-disk format.
* Re: [ANNOUNCE] bcachefs!
2015-07-28 18:41 ` Ming Lin
@ 2015-07-28 18:45 ` Ming Lin
2015-08-06 6:40 ` Ming Lin
2015-08-06 22:58 ` Kent Overstreet
1 sibling, 1 reply; 36+ messages in thread
From: Ming Lin @ 2015-07-28 18:45 UTC (permalink / raw)
To: Ming Lin; +Cc: Kent Overstreet, linux-bcache@vger.kernel.org
On Tue, Jul 28, 2015 at 11:41 AM, Ming Lin <mlin@kernel.org> wrote:
> On Fri, Jul 24, 2015 at 1:47 PM, Ming Lin <mlin@kernel.org> wrote:
>>
>> And I want to learn how the btree node insert/delete/update happens on
>> disk. These maybe too detail. I'm going to write a small tool to dump
>> the file system. Then I could understand better the on disk btree
>> format.
>
> Here is my simple tool to dump parts of the on-disk format.
> http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=deb258e2
Actually: http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=3121eec
>
> It's not in good shape, but simple enough to learn the on-disk format.
* Re: [ANNOUNCE] bcachefs!
2015-07-28 18:45 ` Ming Lin
@ 2015-08-06 6:40 ` Ming Lin
2015-08-06 23:11 ` Kent Overstreet
0 siblings, 1 reply; 36+ messages in thread
From: Ming Lin @ 2015-08-06 6:40 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache@vger.kernel.org
On Tue, 2015-07-28 at 11:45 -0700, Ming Lin wrote:
> On Tue, Jul 28, 2015 at 11:41 AM, Ming Lin <mlin@kernel.org> wrote:
> > On Fri, Jul 24, 2015 at 1:47 PM, Ming Lin <mlin@kernel.org> wrote:
> >>
> >> And I want to learn how the btree node insert/delete/update happens on
> >> disk. These maybe too detail. I'm going to write a small tool to dump
> >> the file system. Then I could understand better the on disk btree
> >> format.
> >
> > Here is my simple tool to dump parts of the on-disk format.
> > http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=deb258e2
>
> Actually: http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=3121eec
>
> >
> > It's not in good shape, but simple enough to learn the on-disk format.
Hi Kent,
I'm trying to understand how the root inode is stored in the inode
btree.
dd if=/dev/zero of=fs.img bs=10M count=1
bcacheadm format -C fs.img
mount -t bcache -o loop fs.img /mnt
umount /mnt
hexdump -C fs.img > fs.hex
From my simple tool, I know that the inode btree starts at offset
0xec000:
000ec000 43 ef f3 df ff ff ff ff 86 c1 47 1e 99 25 51 35 |C.........G..%Q5|
000ec010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ec020 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff |................|
000ec030 ff ff ff ff ff ff ff ff 01 05 00 00 00 00 00 00 |................|
000ec040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ec070 88 b5 38 e2 45 36 eb f6 00 00 00 00 00 00 00 00 |..8.E6..........|
000ec080 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
000ec090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ed000 31 66 fd 31 ff ff ff ff 88 b5 38 e2 45 36 eb f6 |1f.1......8.E6..|
000ed010 02 00 00 00 00 00 00 00 01 00 00 00 03 00 0b 00 |................|
000ed020 0b 01 80 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ed030 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
000ed040 ed 41 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.A..............|
000ed050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ed070 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ed080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
btree_node (0xec000)
bset (0xed008) ---> bset->u64s = 0x0b = 11
bkey_packed (0xed020)
bkey (0xed020)
bch_inode (0xed040 to 0xed077) ---> root inode
Is the decoding above correct?
I found the root inode manually, but how is it actually found by the code?
Could you explain what's at 0xec070 through 0xed007?
Are those also bsets?
Thanks,
Ming
* Re: [ANNOUNCE] bcachefs!
2015-07-28 18:41 ` Ming Lin
2015-07-28 18:45 ` Ming Lin
@ 2015-08-06 22:58 ` Kent Overstreet
2015-08-06 23:27 ` Ming Lin
1 sibling, 1 reply; 36+ messages in thread
From: Kent Overstreet @ 2015-08-06 22:58 UTC (permalink / raw)
To: Ming Lin; +Cc: linux-bcache@vger.kernel.org
On Tue, Jul 28, 2015 at 11:41:52AM -0700, Ming Lin wrote:
> On Fri, Jul 24, 2015 at 1:47 PM, Ming Lin <mlin@kernel.org> wrote:
> >
> > And I want to learn how the btree node insert/delete/update happens on
> > disk. These maybe too detail. I'm going to write a small tool to dump
> > the file system. Then I could understand better the on disk btree
> > format.
>
> Here is my simple tool to dump parts of the on-disk format.
> http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=deb258e2
>
> It's not in good shape, but simple enough to learn the on-disk format.
Hey! Sorry for taking so long to respond, just got my computer set up back in
Alaska.
If you want to keep going with your tool, this might be a starting point for a
debugfs tool - which bcache definitely needs at some point.
* Re: [ANNOUNCE] bcachefs!
2015-08-06 6:40 ` Ming Lin
@ 2015-08-06 23:11 ` Kent Overstreet
2015-08-07 5:21 ` Ming Lin
0 siblings, 1 reply; 36+ messages in thread
From: Kent Overstreet @ 2015-08-06 23:11 UTC (permalink / raw)
To: Ming Lin; +Cc: linux-bcache@vger.kernel.org
On Wed, Aug 05, 2015 at 11:40:06PM -0700, Ming Lin wrote:
> On Tue, 2015-07-28 at 11:45 -0700, Ming Lin wrote:
> > On Tue, Jul 28, 2015 at 11:41 AM, Ming Lin <mlin@kernel.org> wrote:
> > > On Fri, Jul 24, 2015 at 1:47 PM, Ming Lin <mlin@kernel.org> wrote:
> > >>
> > >> And I want to learn how the btree node insert/delete/update happens on
> > >> disk. These maybe too detail. I'm going to write a small tool to dump
> > >> the file system. Then I could understand better the on disk btree
> > >> format.
> > >
> > > Here is my simple tool to dump parts of the on-disk format.
> > > http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=deb258e2
> >
> > Actually: http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=3121eec
> >
> > >
> > > It's not in good shape, but simple enough to learn the on-disk format.
>
> Hi Kent,
>
> I'm trying to understand how the root inode is stored in the inode
> btree.
>
> dd if=/dev/zero of=fs.img bs=10M count=1
> bcacheadm format -C fs.img
> mount -t bcache -o loop fs.img /mnt
> umount /mnt
> hexdump -C fs.img > fs.hex
>
> From my simple tool, I know that the inode btree starts from offset
> 0xec000
The root node of the inode btree? Are you handling trees with multiple nodes
yet?
>
> 000ec000 43 ef f3 df ff ff ff ff 86 c1 47 1e 99 25 51 35 |C.........G..%Q5|
> 000ec010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> 000ec020 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff |................|
> 000ec030 ff ff ff ff ff ff ff ff 01 05 00 00 00 00 00 00 |................|
> 000ec040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
> 000ec070 88 b5 38 e2 45 36 eb f6 00 00 00 00 00 00 00 00 |..8.E6..........|
> 000ec080 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
> 000ec090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
> 000ed000 31 66 fd 31 ff ff ff ff 88 b5 38 e2 45 36 eb f6 |1f.1......8.E6..|
> 000ed010 02 00 00 00 00 00 00 00 01 00 00 00 03 00 0b 00 |................|
> 000ed020 0b 01 80 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> 000ed030 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
> 000ed040 ed 41 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.A..............|
> 000ed050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
> 000ed070 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> 000ed080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
>
> btree_node (0xec000)
> bset (0xed008) ---> bset->u64s = 0x0b = 11
> bkey_packed (0xed020)
> bkey (0xed020)
> bch_inode (0xed040 to 0xed077) ---> root inode
>
> Is the decode above correct?
I think so. The code that deals with reading in a btree node from disk and
interpreting the contents is mainly in bch_btree_node_read_done() in btree_io.c -
it looks like you found that?
> I found the root inode manually. But how is it actually found by code?
The root inode is the inode with inode number BCACHE_ROOT_INO (4096) -
http://evilpiepirate.org/git/linux-bcache.git/tree/drivers/md/bcache/fs.c?h=bcache-dev&id=5cf7fb11d124839eea2191fd7e8eddecb296d67d#n2285
So to do it correctly, you'll need the bkey packing code in order to unpack the
key (if it was packed) so that you can get the actual inode number of the key.
You'll also need to do something like the mergesort algorithm (or something
equivalent; you don't need to do the actual mergesort if you're just doing a
linear search for one key). That is - if there are multiple bsets, they will
likely contain duplicates, and keys in newer bsets overwrite keys in older bsets.
> Could you help to explain what it is from 0xec070 to 0xed007?
> Are they also bsets?
Without knowing your block size and spending a fair amount of time staring at
the hexdump, I don't know what starts there - but quite possibly yes; bsets that
aren't at the start of the btree node are embedded in a struct
btree_node_entry, not a struct btree_node.
To tell if it's a valid bset, you compare bset->seq against the seq in the first
bset - it's a random number generated for each new btree node; if they match
then the bset there goes with that btree node.
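[A rough userspace sketch of both ideas - the seq validity check and the
newest-wins lookup across bsets. The struct layout here is a deliberately
simplified stand-in (seq: u64, u64s: u16, pad: u16), not the real struct bset,
so treat the field offsets as assumptions for illustration:]

```python
import struct

# Simplified, hypothetical bset header for illustration only; the real
# struct bset has more fields and different offsets.
BSET_HDR = struct.Struct("<QHH")

def iter_bsets(buf, block_size):
    """Yield (offset, seq, u64s) for each block-aligned bset whose seq
    matches the first bset's seq - Kent's validity check: a mismatched
    seq means the block is stale and doesn't belong to this btree node."""
    first_seq = None
    off = 0
    while off + BSET_HDR.size <= len(buf):
        seq, u64s, _ = BSET_HDR.unpack_from(buf, off)
        if first_seq is None:
            first_seq = seq
        if seq != first_seq:
            break  # stale block: stop here
        yield off, seq, u64s
        # advance to the next block boundary past this bset's keys
        size = BSET_HDR.size + u64s * 8
        off += (size + block_size - 1) // block_size * block_size

def lookup(bsets, want):
    """Newest-wins linear search: bsets are ordered oldest -> newest,
    and a key in a newer bset overwrites the same key in an older one."""
    found = None
    for keys in bsets:
        for k, v in keys:
            if k == want:
                found = v  # keep overwriting; the last (newest) match wins
    return found
```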
* Re: [ANNOUNCE] bcachefs!
2015-08-06 22:58 ` Kent Overstreet
@ 2015-08-06 23:27 ` Ming Lin
2015-08-06 23:59 ` Kent Overstreet
0 siblings, 1 reply; 36+ messages in thread
From: Ming Lin @ 2015-08-06 23:27 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache@vger.kernel.org
On Thu, Aug 6, 2015 at 3:58 PM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Tue, Jul 28, 2015 at 11:41:52AM -0700, Ming Lin wrote:
>> On Fri, Jul 24, 2015 at 1:47 PM, Ming Lin <mlin@kernel.org> wrote:
>> >
>> > And I want to learn how the btree node insert/delete/update happens on
>> > disk. These maybe too detail. I'm going to write a small tool to dump
>> > the file system. Then I could understand better the on disk btree
>> > format.
>>
>> Here is my simple tool to dump parts of the on-disk format.
>> http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=deb258e2
>>
>> It's not in good shape, but simple enough to learn the on-disk format.
>
> Hey! Sorry for taking so long to respond, just got my computer set up back in
> Alaska.
>
> If you want to keep going with your tool, this might be a starting point for a
> debugfs tool - which bcache definitely needs at some point.
Yes, that's my goal.
I'll improve it once I get more familiar with bcachefs on-disk format.
* Re: [ANNOUNCE] bcachefs!
2015-08-06 23:27 ` Ming Lin
@ 2015-08-06 23:59 ` Kent Overstreet
0 siblings, 0 replies; 36+ messages in thread
From: Kent Overstreet @ 2015-08-06 23:59 UTC (permalink / raw)
To: Ming Lin; +Cc: linux-bcache@vger.kernel.org
On Thu, Aug 06, 2015 at 04:27:51PM -0700, Ming Lin wrote:
> On Thu, Aug 6, 2015 at 3:58 PM, Kent Overstreet
> <kent.overstreet@gmail.com> wrote:
> > On Tue, Jul 28, 2015 at 11:41:52AM -0700, Ming Lin wrote:
> >> On Fri, Jul 24, 2015 at 1:47 PM, Ming Lin <mlin@kernel.org> wrote:
> >> >
> >> > And I want to learn how the btree node insert/delete/update happens on
> >> > disk. These maybe too detail. I'm going to write a small tool to dump
> >> > the file system. Then I could understand better the on disk btree
> >> > format.
> >>
> >> Here is my simple tool to dump parts of the on-disk format.
> >> http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=deb258e2
> >>
> >> It's not in good shape, but simple enough to learn the on-disk format.
> >
> > Hey! Sorry for taking so long to respond, just got my computer set up back in
> > Alaska.
> >
> > If you want to keep going with your tool, this might be a starting point for a
> > debugfs tool - which bcache definitely needs at some point.
>
> Yes, that's my goal.
> I'll improve it once I get more familiar with bcachefs on-disk format.
I imagine the sanest thing to do will be to reuse some of the kernel side code -
at the very least, the bkey packing code. That code is already pretty self
contained, and it's very algorithmic - no point in redoing it, and no real
reason to do it differently.
If it makes things easier, we could probably shuffle code around a bit so that
perhaps bkey.c contains only code that can be easily compiled in userspace.
I'm not sure if there's any other significant code that you'd want to use in
userspace - possibly the mergesort code (i.e.
bch_extent_sort_fix_overlapping()), but that code is going to be harder to lift
out and compile in userspace without changes.
Journal replay is going to be another major issue... the problem is, the btree
isn't up to date until you do journal replay, and the way bcache does journal
replay is with the same index update path that it uses at runtime - which
modifies the btree, i.e. it can't do journal replay without modifying what's on
disk.
We don't want the userspace debugfs tool to be modifying the disk image, so the
method bcache uses is right out.
The method I had in mind was that when you read the journal, you keep that list
of index updates to do around, in memory - then, when you read or are looking at
any given btree node, you iterate over all the keys in the journal replay list
and apply only the ones that apply to the current node. If the insertions don't
fit into the current node (i.e. if we would have to split the node if we were
doing a normal index update) - just grow the node in memory, since we're just
going to be tossing it out when we're done instead of writing out our changes.
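[A minimal sketch of that overlay approach, using a toy key/value model
rather than real bkeys - the names and the flat dict representation are
assumptions for illustration, not bcache's actual data structures:]

```python
def overlay_journal(node_keys, node_min, node_max, journal_entries):
    """Apply journal-replay index updates that fall inside this node's key
    range [node_min, node_max) to an in-memory copy, never touching the
    disk image. journal_entries is an ordered list of (key, value) updates
    read from the journal; node_keys is what was read from disk."""
    merged = dict(node_keys)          # in-memory copy; the disk stays read-only
    for k, v in journal_entries:
        if node_min <= k < node_max:  # only updates that land in this node
            merged[k] = v             # "grow" freely; we never write it back
    return merged
```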
* Re: [ANNOUNCE] bcachefs!
2015-08-06 23:11 ` Kent Overstreet
@ 2015-08-07 5:21 ` Ming Lin
0 siblings, 0 replies; 36+ messages in thread
From: Ming Lin @ 2015-08-07 5:21 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache@vger.kernel.org
On Thu, 2015-08-06 at 16:11 -0700, Kent Overstreet wrote:
> On Wed, Aug 05, 2015 at 11:40:06PM -0700, Ming Lin wrote:
> > On Tue, 2015-07-28 at 11:45 -0700, Ming Lin wrote:
> > > On Tue, Jul 28, 2015 at 11:41 AM, Ming Lin <mlin@kernel.org> wrote:
> > > > On Fri, Jul 24, 2015 at 1:47 PM, Ming Lin <mlin@kernel.org> wrote:
> > > >>
> > > >> And I want to learn how the btree node insert/delete/update happens on
> > > >> disk. These maybe too detail. I'm going to write a small tool to dump
> > > >> the file system. Then I could understand better the on disk btree
> > > >> format.
> > > >
> > > > Here is my simple tool to dump parts of the on-disk format.
> > > > http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=deb258e2
> > >
> > > Actually: http://www.minggr.net/cgit/cgit.cgi/bcache-tools/commit/?id=3121eec
> > >
> > > >
> > > > It's not in good shape, but simple enough to learn the on-disk format.
> >
> > Hi Kent,
> >
> > I'm trying to understand how the root inode is stored in the inode
> > btree.
> >
> > dd if=/dev/zero of=fs.img bs=10M count=1
> > bcacheadm format -C fs.img
> > mount -t bcache -o loop fs.img /mnt
> > umount /mnt
> > hexdump -C fs.img > fs.hex
> >
> > From my simple tool, I know that the inode btree starts from offset
> > 0xec000
>
> The root node of the inode btree? Are you handling trees with multiple nodes
> yet?
Yes and no.
>
> >
> > 000ec000 43 ef f3 df ff ff ff ff 86 c1 47 1e 99 25 51 35 |C.........G..%Q5|
> > 000ec010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 000ec020 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff |................|
> > 000ec030 ff ff ff ff ff ff ff ff 01 05 00 00 00 00 00 00 |................|
> > 000ec040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 000ec070 88 b5 38 e2 45 36 eb f6 00 00 00 00 00 00 00 00 |..8.E6..........|
> > 000ec080 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 000ec090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 000ed000 31 66 fd 31 ff ff ff ff 88 b5 38 e2 45 36 eb f6 |1f.1......8.E6..|
> > 000ed010 02 00 00 00 00 00 00 00 01 00 00 00 03 00 0b 00 |................|
> > 000ed020 0b 01 80 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 000ed030 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
> > 000ed040 ed 41 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.A..............|
> > 000ed050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 000ed070 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 000ed080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> >
> > btree_node (0xec000)
> > bset (0xed008) ---> bset->u64s = 0x0b = 11
> > bkey_packed (0xed020)
> > bkey (0xed020)
> > bch_inode (0xed040 to 0xed077) ---> root inode
> >
> > Is the decode above correct?
>
> I think so. The code that deals with reading in a btree node disk and
> interpreting the contents is mainly in bch_btree_node_read_done(), btree_io.c -
> it looks like you found that?
I haven't dug into the code yet.
First I want to understand the on-disk structure via hexdump.
>
> > I found the root inode manually. But how is it actually found by code?
>
> The root inode is the inode with inode number BCACHE_ROOT_INO (4096) -
> http://evilpiepirate.org/git/linux-bcache.git/tree/drivers/md/bcache/fs.c?h=bcache-dev&id=5cf7fb11d124839eea2191fd7e8eddecb296d67d#n2285
>
> So to do it correctly, you'll need the bkey packing code in order to unpack the
> key (if it was packed) so that you can get the actual inode number of the key.
>
> You'll also need to do something like the mergesort algorithm (or something
> equivalent; you don't need to do the actual mergesort if you're just doing a
> linear search for one key). That is - if there's multiple bsets, they will
> likely contain duplicates and keys in newer bsets overwrite keys in older bsets.
I don't understand this part yet; I'll learn it.
>
> > Could you help to explain what it is from 0xec070 to 0xed007?
> > Are they also bsets?
>
> Without knowing your block size and spending a fair amount of time staring at
> the hexdump, I don't know what starts there - but quite possibly yes; bsets that
> aren't at the start of the btree node are embeddedd in a struct
> btree_node_entry, not a struct btree_node.
>
> To tell if it's a valid bset, you compare bset->seq against the seq in the first
> bset - it's a random number generated for each new btree node; if they match
> then the bset there goes with that btree node.
The block size is 4K.
OK, now I can interpret the hexdump.
000ec000 43 ef f3 df ff ff ff ff 86 c1 47 1e 99 25 51 35 |C.........G..%Q5|
000ec010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ec020 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff |................|
000ec030 ff ff ff ff ff ff ff ff 01 05 00 00 00 00 00 00 |................|
000ec040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ec070 88 b5 38 e2 45 36 eb f6 00 00 00 00 00 00 00 00 |..8.E6..........|
000ec080 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
000ec090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ed000 31 66 fd 31 ff ff ff ff 88 b5 38 e2 45 36 eb f6 |1f.1......8.E6..|
000ed010 02 00 00 00 00 00 00 00 01 00 00 00 03 00 0b 00 |................|
000ed020 0b 01 80 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ed030 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |................|
000ed040 ed 41 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.A..............|
000ed050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ed070 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ed080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000ee000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
There are 2 bsets, both with bset->seq "88 b5 38 e2 45 36 eb f6":
btree_node (0xec000)
bset_1 (0xec070) ---> bset->u64s = 0 (an empty bset?)
btree_node_entry (0xed000)
bset_2 (0xed008) ---> bset->u64s = 0x0b = 11
bkey_packed (0xed020)
bkey (0xed020)
bch_inode (0xed040 to 0xed077) ---> root inode
Why is there an empty bset at the start of the btree node?
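[For what it's worth, the offsets in the decode above are self-consistent.
A sketch of the layout arithmetic, with header sizes inferred from the dump
itself (0x18-byte bset header, as seen between 0xed008 and the first key at
0xed020) rather than from the on-disk headers, so treat them as assumptions:]

```python
# Writes are block-granular, so the btree_node_entry holding the next bset
# lands at the next block boundary after the end of the first write.
BLOCK = 0x1000                 # Ming's stated 4K block size
node_start = 0xec000           # struct btree_node header
first_bset = 0xec070           # bset embedded at the end of struct btree_node
first_bset_u64s = 0            # empty: no keys were written with the header
BSET_HDR_SIZE = 0x18           # inferred from the dump (0xed020 - 0xed008)

end_of_first_write = first_bset + BSET_HDR_SIZE + first_bset_u64s * 8
# round up to the next block boundary
next_entry = (end_of_first_write + BLOCK - 1) // BLOCK * BLOCK
```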
end of thread, other threads:[~2015-08-07 5:21 UTC | newest]
Thread overview: 36+ messages
-- links below jump to the message on this page --
2015-07-14 0:58 [ANNOUNCE] bcachefs! Kent Overstreet
[not found] ` <CACaajQtwx45r8GcRmchrQwDts1GH-V8g0x1FwGfDvnfm02bq+Q@mail.gmail.com>
2015-07-14 8:11 ` Kent Overstreet
2015-07-20 1:11 ` Denis Bychkov
[not found] ` <CAC7rs0uWSt85F443PRw1zvybccg+EfebaSyH9EhUwHjhTGryRA@mail.gmail.com>
[not found] ` <CAC7rs0upqkuH1CPd-OAmrpQ=8PmaDpzHYY1MaBDpAL6TS_iKyw@mail.gmail.com>
2015-07-20 2:52 ` Denis Bychkov
2015-07-24 19:25 ` Kent Overstreet
2015-07-15 6:11 ` Ming Lin
[not found] ` <CAC7rs0sbg2ci6=niQ0X11AONZbr2AOYhRbxfDH_w4N4A7dyPLw@mail.gmail.com>
2015-07-15 7:15 ` Ming Lin
2015-07-15 7:39 ` Ming Lin
2015-07-17 23:17 ` Kent Overstreet
2015-07-17 23:35 ` Ming Lin
2015-07-17 23:40 ` Kent Overstreet
2015-07-17 23:48 ` Ming Lin
2015-07-17 23:51 ` Kent Overstreet
2015-07-17 23:58 ` Ming Lin
2015-07-18 2:10 ` Kent Overstreet
2015-07-18 5:21 ` Ming Lin
2015-07-22 5:11 ` Ming Lin
2015-07-22 5:15 ` Ming Lin
2015-07-24 19:15 ` Kent Overstreet
2015-07-24 20:47 ` Ming Lin
2015-07-28 18:41 ` Ming Lin
2015-07-28 18:45 ` Ming Lin
2015-08-06 6:40 ` Ming Lin
2015-08-06 23:11 ` Kent Overstreet
2015-08-07 5:21 ` Ming Lin
2015-08-06 22:58 ` Kent Overstreet
2015-08-06 23:27 ` Ming Lin
2015-08-06 23:59 ` Kent Overstreet
2015-07-18 0:01 ` Denis Bychkov
2015-07-18 2:12 ` Kent Overstreet
2015-07-19 7:46 ` Denis Bychkov
2015-07-21 18:37 ` David Mohr
2015-07-21 21:53 ` Jason Warr
2015-07-24 19:32 ` Kent Overstreet
2015-07-24 19:42 ` Jason Warr
2015-07-22 7:19 ` Killian De Volder