* stat(2) returning device ID not existing in mountinfo
@ 2016-09-16 20:28 Tomasz Sterna
  2016-09-20 13:15 ` Jeff Mahoney
  0 siblings, 1 reply; 5+ messages in thread
From: Tomasz Sterna @ 2016-09-16 20:28 UTC (permalink / raw)
  To: linux-btrfs


Hi.

I have spotted an issue with the stat(2) call on files on btrfs.
It returns a dev_t st_dev number that does not correspond to any
mounted filesystem in /proc/self/mountinfo.

A quick example:

$ grep btrfs /proc/self/mountinfo 
61 0 0:36 /root / rw,relatime shared:1 - btrfs /dev/bcache0 rw,ssd,space_cache,subvolid=535,subvol=/root
75 61 0:36 /home /home rw,relatime shared:30 - btrfs /dev/bcache0 rw,ssd,space_cache,subvolid=258,subvol=/home

As you can see, both btrfs subvolumes are listed as 0:36, but files on them report:

$ stat -c "%d" /etc/passwd
38
$ stat -c "%d" /home/smoku/test.txt
44

Passing these through major(3)/minor(3) gives 0:38 and 0:44.
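
The decoding step above can be sketched in Python, whose os.major() and os.minor() wrap the same major(3)/minor(3) macros. The helper names format_dev and file_dev are made up for this illustration; this is a minimal sketch, not part of the original report.

```python
import os


def format_dev(st_dev: int) -> str:
    """Render a dev_t the way mountinfo prints it: "major:minor"."""
    return f"{os.major(st_dev)}:{os.minor(st_dev)}"


def file_dev(path: str) -> str:
    """Return the major:minor pair that stat(2) reports for `path`."""
    return format_dev(os.stat(path).st_dev)


if __name__ == "__main__":
    import sys

    # Print the decoded device ID for every path given on the command line.
    for path in sys.argv[1:]:
        print(path, file_dev(path))
```

Run against /etc/passwd and /home/smoku/test.txt on the system described here, this should print the same 0:38 and 0:44 pairs as the manual conversion.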

There is clearly something fishy going on. :-)
A simple one-liner shows that only btrfs and autofs misbehave like this:

$ </proc/self/mountinfo cut -d' ' -f3,5,9 | while read DEV PATH FS; do echo $DEV $(/usr/bin/stat -c "%d" $PATH) $FS; done
0:16 16 sysfs
0:4 4 proc
0:6 6 devtmpfs
0:17 17 securityfs
0:18 18 tmpfs
0:19 19 devpts
0:20 20 tmpfs
0:21 21 tmpfs
0:22 22 cgroup
0:23 23 pstore
0:24 24 efivarfs
0:25 25 cgroup
0:26 26 cgroup
0:27 27 cgroup
0:28 28 cgroup
0:29 29 cgroup
0:30 30 cgroup
0:31 31 cgroup
0:32 32 cgroup
0:33 33 cgroup
0:34 34 cgroup
0:35 35 configfs
0:36 38 btrfs
0:15 15 hugetlbfs
0:39 68 autofs
0:14 14 mqueue
0:40 40 tmpfs
0:7 7 debugfs
0:42 42 nfsd
0:36 44 btrfs
8:2 2050 ext3
8:1 2049 vfat
0:47 47 rpc_pipefs
0:50 50 fusectl
0:51 51 tmpfs
0:49 49 fuse.gvfsd-fuse
0:68 68 binfmt_misc
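
The same comparison can be sketched in Python. The one-liner above relies on the filesystem type being the ninth field, which only holds when each line has exactly one optional field (such as "shared:1"); the sketch below instead splits on the " - " separator that mountinfo places before the filesystem type. The function names parse_mountinfo_line and mismatches are hypothetical, and the code is only an illustration of the check, not a tested diagnostic tool.

```python
import os
import re


def parse_mountinfo_line(line: str):
    """Return (major:minor, mount point, fs type) for one mountinfo line.

    The number of optional fields varies, so the fs type is located via
    the " - " separator rather than a fixed column.
    """
    left, _, right = line.rstrip("\n").partition(" - ")
    fields = left.split(" ")
    dev, mount_point = fields[2], fields[4]
    # The kernel writes space, tab, newline and backslash as octal escapes
    # such as \040; decode them back into the real characters.
    mount_point = re.sub(
        r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), mount_point
    )
    return dev, mount_point, right.split(" ")[0]


def mismatches(path="/proc/self/mountinfo"):
    """Yield mounts whose mountinfo device ID differs from stat(2)'s st_dev."""
    with open(path) as f:
        for line in f:
            dev, mount_point, fstype = parse_mountinfo_line(line)
            try:
                st = os.stat(mount_point)
            except OSError:
                continue  # mount point not reachable from this process
            seen = f"{os.major(st.st_dev)}:{os.minor(st.st_dev)}"
            if seen != dev:
                yield dev, seen, mount_point, fstype
```

Iterating over mismatches() should list the two btrfs mounts on a system like this one. The autofs entry may be a separate effect: its stat value of 68 equals the device ID of the binfmt_misc mount in the same listing, which suggests (as a guess from the output only) that binfmt_misc is mounted on top of the autofs mount point and stat simply sees the upper mount.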


I have already attempted an ill-informed patch in fs/btrfs/super.c:

@@ -1127,6 +1127,7 @@ static int btrfs_fill_super(struct super_block *sb,
 		goto fail_close;
 	}
 
+	sb->s_dev = inode->i_sb->s_dev;
 	sb->s_root = d_make_root(inode);
 	if (!sb->s_root) {
 		err = -ENOMEM;

but it didn't help.

I would like to dig deeper and fix it, but first I have to ask:
- Which number is wrong?
  The one returned by stat() or the one in mountinfo?

I am running:

$ uname -a
Linux lair.home.lan 4.7.3-200.pf3.fc24.x86_64 #1 SMP Tue Sep 13 12:34:03 CEST 2016 x86_64 x86_64 x86_64 GNU/Linux



-- 
smoku @ http://abadcafe.pl/ @ http://xiaoka.com/


* Re: stat(2) returning device ID not existing in mountinfo
  2016-09-16 20:28 stat(2) returning device ID not existing in mountinfo Tomasz Sterna
@ 2016-09-20 13:15 ` Jeff Mahoney
  2017-02-13 19:21   ` Goffredo Baroncelli
  2017-02-15 18:25   ` Goffredo Baroncelli
  0 siblings, 2 replies; 5+ messages in thread
From: Jeff Mahoney @ 2016-09-20 13:15 UTC (permalink / raw)
  To: Tomasz Sterna, linux-btrfs



On 9/16/16 4:28 PM, Tomasz Sterna wrote:
> Hi.
> 
> I have spotted an issue with the stat(2) call on files on btrfs.
> It returns a dev_t st_dev number that does not correspond to any
> mounted filesystem in /proc/self/mountinfo.

That's by design.  Your particular file system may only use one device
but, internally, btrfs uses virtualized storage that may be spread
across multiple devices.  To make things more complicated, snapshots
mean that:

sled1a:/mnt # btrfs sub list .
ID 257 gen 14 top level 5 path a
ID 258 gen 14 top level 5 path b

sled1a:/mnt # ls -laRi
.:
total 16
256 drwxr-xr-x 1 root root   4 Sep 20 09:08 .
256 drwxr-xr-x 1 root root 220 Sep 16 09:49 ..
256 drwxr-xr-x 1 root root   8 Sep 14 10:24 a
256 drwxr-xr-x 1 root root   8 Sep 14 10:24 b

./a:
total 4112
256 drwxr-xr-x 1 root root       8 Sep 14 10:24 .
256 drwxr-xr-x 1 root root       4 Sep 20 09:08 ..
257 -rw-r--r-- 1 root root 4194304 Sep 14 10:24 file

./b:
total 4112
256 drwxr-xr-x 1 root root       8 Sep 14 10:24 .
256 drwxr-xr-x 1 root root       4 Sep 20 09:08 ..
257 -rw-r--r-- 1 root root 4194304 Sep 14 10:24 file

Under normal circumstances those are two files with the same st_dev and
the same inode number.  That would normally correspond to a hard link,
but the files do not (necessarily) correspond to the same file.

... but because we use anonymous device numbers for each subvolume, we
have different device numbers for each one.

sled1a:/mnt # stat --format "%n st_dev=%d" {a,b}/file
a/file st_dev=69
b/file st_dev=70
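
Why the per-subvolume device number matters can be sketched from the userspace side: tools conventionally treat the (st_dev, st_ino) pair as a file's identity, which is also what Python's os.path.samefile() compares. The helpers file_identity and same_file below are hypothetical names for this illustration, not code from btrfs or any particular tool.

```python
import os


def file_identity(path: str):
    """Return the (st_dev, st_ino) pair conventionally used as file identity."""
    st = os.stat(path)
    return st.st_dev, st.st_ino


def same_file(a: str, b: str) -> bool:
    """True if both paths resolve to the same (st_dev, st_ino) pair."""
    return file_identity(a) == file_identity(b)
```

With the listing above, a/file and b/file share inode number 257, so same_file("a/file", "b/file") would return True if both subvolumes reported one st_dev. Because stat reports 69 and 70, the pairs differ and the files are not mistaken for hard links.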

It's a pretty big usability wart that we don't consistently report the
device number.  We do it correctly in stat() but there are other places
in the code that assume that inode->i_sb->s_dev will work.  In the SUSE
kernels, we have patches that add a super_operation to report the
correct device number everywhere, but even that is a hack.

> I have already attempted an ill-informed patch in fs/btrfs/super.c:
> 
> @@ -1127,6 +1127,7 @@ static int btrfs_fill_super(struct super_block *sb,
>  		goto fail_close;
>  	}
>  
> +	sb->s_dev = inode->i_sb->s_dev;
>  	sb->s_root = d_make_root(inode);
>  	if (!sb->s_root) {
>  		err = -ENOMEM;
> 
> but it didn't help.

It wouldn't.  That is assigning a variable to itself.

> I would like to dig deeper and fix it, but first I have to ask:
> - Which number is wrong?
>   The one returned by stat() or the one in mountinfo?

The one in mountinfo, but then that means that the user only sees the
anonymous devices in mount(8), which isn't what we want either.

I'm afraid the correct fix is very involved and requires non-trivial
changes in the VFS layer as well.  It's on my long-term TODO list.  I
currently have some patches that do the magic with vfsmounts but it's
far from being usable.

-Jeff

-- 
Jeff Mahoney
SUSE Labs



* Re: stat(2) returning device ID not existing in mountinfo
  2016-09-20 13:15 ` Jeff Mahoney
@ 2017-02-13 19:21   ` Goffredo Baroncelli
  2017-02-15 18:25   ` Goffredo Baroncelli
  1 sibling, 0 replies; 5+ messages in thread
From: Goffredo Baroncelli @ 2017-02-13 19:21 UTC (permalink / raw)
  To: Jeff Mahoney, linux-btrfs; +Cc: Tomasz Sterna, Bernhard Voelker

Hi,

I want to highlight this bug once more.

I ran into this bug while looking into a problem with find. On my machine, find used a huge amount of memory (up to 3GB) when run by updatedb.

	http://lists.gnu.org/archive/html/findutils-patches/2016-12/msg00000.html

The root of the problem was a memory leak in find. What was strange, however, is that the findutils developers were unable to reproduce the bug. They found the leak, but not such high memory usage.

After some emails, it turned out that find (when used by updatedb) checks whether the filesystem has changed during the tree walk. It compares the device ID returned by stat against the one listed in mountinfo.

For btrfs these differ, so the check is repeated every time. Because the memory was leaked once per "filesystem check", running find on a btrfs filesystem caused a huge leak.

I hope that a btrfs developer can address this, because I suspect that a lot of tools compare the device ID listed in /proc/self/mountinfo against the one returned by stat(2).
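
The failure mode described above can be modelled with a small sketch. This is NOT the findutils code; the class MountTable and its loader callback are hypothetical and only illustrate a cache that re-reads the mount table whenever stat returns a device ID it does not know.

```python
class MountTable:
    """Cache of device ID -> fs type that reloads on an unknown device ID."""

    def __init__(self, load):
        self._load = load        # callable returning {device_id: fstype}
        self._table = load()     # initial read of the mount table
        self.reloads = 0         # how many times the table was re-read

    def fstype(self, dev_id):
        if dev_id not in self._table:
            # Unknown device: assume a filesystem was mounted meanwhile
            # and re-read the whole table.
            self._table = self._load()
            self.reloads += 1
        return self._table.get(dev_id, "unknown")


if __name__ == "__main__":
    # mountinfo lists the btrfs mount as 0:36, but stat reports 0:38.
    table = MountTable(lambda: {"0:36": "btrfs"})
    for _ in range(1000):        # 1000 files visited during the tree walk
        table.fstype("0:38")
    print(table.reloads)
```

In this model every file on the subvolume triggers a reload, because the ID reported by stat never appears in the table. Any cost attached to a reload (in the report above, leaked memory) then grows with the number of files visited.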


BR
G.Baroncelli






-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: stat(2) returning device ID not existing in mountinfo
  2016-09-20 13:15 ` Jeff Mahoney
  2017-02-13 19:21   ` Goffredo Baroncelli
@ 2017-02-15 18:25   ` Goffredo Baroncelli
  2017-02-17  8:13     ` Duncan
  1 sibling, 1 reply; 5+ messages in thread
From: Goffredo Baroncelli @ 2017-02-15 18:25 UTC (permalink / raw)
  To: Jeff Mahoney, linux-btrfs
  Cc: Tomasz Sterna, Bernhard Voelker, findutils-patches

(Resent because I did not see it on the mailing lists)



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5



* Re: stat(2) returning device ID not existing in mountinfo
  2017-02-15 18:25   ` Goffredo Baroncelli
@ 2017-02-17  8:13     ` Duncan
  0 siblings, 0 replies; 5+ messages in thread
From: Duncan @ 2017-02-17  8:13 UTC (permalink / raw)
  To: linux-btrfs; +Cc: findutils-patches

Goffredo Baroncelli posted on Wed, 15 Feb 2017 19:25:57 +0100 as
excerpted:

> (Resent because I did not see it on the mailing lists)
> 
> -----
> Hi,
> 
> I want to highlight this bug once more.
> 
> I ran into this bug while looking into a problem with find. On my
> machine, find used a huge amount of memory (up to 3GB) when run by
> updatedb.

FWIW, I saw it (via news.gmane.org list2news service) here.  It's well 
over my head, tho, so I didn't reply, but I found it interesting.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


