public inbox for linux-ext4@vger.kernel.org
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Elana Hashman <Elana.Hashman@twosigma.com>
Cc: "'tytso@mit.edu'" <tytso@mit.edu>,
	"'linux-ext4@vger.kernel.org'" <linux-ext4@vger.kernel.org>
Subject: Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
Date: Thu, 8 Nov 2018 10:47:22 -0800	[thread overview]
Message-ID: <20181108184722.GB27852@magnolia> (raw)
In-Reply-To: <9abbdde6145a4887a8d32c65974f7832@exmbdft5.ad.twosigma.com>

On Thu, Nov 08, 2018 at 05:59:18PM +0000, Elana Hashman wrote:
> Hi Ted,
> 
> We've run into a mysterious "phantom" full filesystem issue on our Kubernetes fleet. We initially encountered this issue on kernel 4.1.35, but are still experiencing the problem after upgrading to 4.14.67. Essentially, `df` reports our root filesystems as full and they behave as though they are full, but the "used" space cannot be accounted for. Rebooting the system, remounting the root filesystem read-only and then remounting as read-write, or booting into single-user mode all free up the "used" space. The disk slowly fills up over time, suggesting that there might be some kind of leak; we previously saw this affecting hosts with ~200 days of uptime on the 4.1 kernel, but are now seeing it affect a 4.14 host with only ~70 days of uptime.
> 
> Here is some data from an example host, running the 4.14.67 kernel. The root disk is ext4.
> 
> $ uname -a
> Linux <hostname> 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux
> $ grep ' / ' /proc/mounts
> /dev/disk/by-uuid/<some-uuid> / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
> 
> `df` reports 0 bytes free:
> 
> $ df -h /
> Filesystem                                              Size  Used Avail Use% Mounted on
> /dev/disk/by-uuid/<some-uuid>   50G   48G     0 100% /
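
(One mundane contributor worth ruling out: ext4 reserves 5% of blocks for the superuser by default, so "Used" plus "Avail" falls short of "Size" even on a healthy filesystem. Assuming the default reserve, that accounts for roughly 2.5G of the 50G here, nowhere near the tens of gigabytes that go missing below; `tune2fs -l <device>` reports the exact "Reserved block count".)

```shell
# Back-of-the-envelope only: 5% is mke2fs's default reserve,
# not something read from this host.
awk 'BEGIN { printf "%.1fG reserved\n", 50 * 0.05 }'   # -> 2.5G reserved
```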

This is very odd.  I wonder, how many of those overlayfses are still
mounted on the system at this point?  Over in xfs land we've discovered
that overlayfs subtly changes the lifetime behavior of incore inodes,
maybe that's what's going on here?  (Pure speculation on my part...)
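
A quick way to answer that question on the affected host is to count overlay entries in the mount table. The live check would be `grep -c ' overlay ' /proc/mounts`; below, the same check is run against a made-up sample so the pattern can be seen (the docker paths are hypothetical):

```shell
# Third field of /proc/mounts is the filesystem type, so match " overlay ".
cat > mounts.sample <<'EOF'
/dev/disk/by-uuid/some-uuid / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
overlay /var/lib/docker/overlay2/aaa/merged overlay rw,lowerdir=/l,upperdir=/u,workdir=/w 0 0
overlay /var/lib/docker/overlay2/bbb/merged overlay rw,lowerdir=/l,upperdir=/u,workdir=/w 0 0
EOF
grep -c ' overlay ' mounts.sample   # -> 2
```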

> Deleted, open files account for almost no disk capacity:
> 
> $ sudo lsof -a +L1 /
> COMMAND    PID   USER   FD   TYPE DEVICE SIZE/OFF NLINK    NODE NAME
> java      5313 user    3r   REG    8,3  6806312     0 1315847 /var/lib/sss/mc/passwd (deleted)
> java      5313 user   11u   REG    8,3    55185     0 2494654 /tmp/classpath.1668Gp (deleted)
> system_ar 5333 user    3r   REG    8,3  6806312     0 1315847 /var/lib/sss/mc/passwd (deleted)
> java      5421 user    3r   REG    8,3  6806312     0 1315847 /var/lib/sss/mc/passwd (deleted)
> java      5421 user   11u   REG    8,3   149313     0 2494486 /tmp/java.fzTwWp (deleted)
> java      5421 tsdist   12u   REG    8,3    55185     0 2500513 /tmp/classpath.7AmxHO (deleted)
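
For reference, summing the SIZE/OFF column above confirms that deleted-but-open files hold only about 20 MB. The listing is reproduced here so the arithmetic can be checked; on a live host you would pipe `sudo lsof -a +L1 /` through the same awk program:

```shell
# Same rows as the lsof listing above; field 7 is SIZE/OFF in bytes.
cat > lsof.sample <<'EOF'
java      5313 user    3r   REG    8,3  6806312     0 1315847 /var/lib/sss/mc/passwd (deleted)
java      5313 user   11u   REG    8,3    55185     0 2494654 /tmp/classpath.1668Gp (deleted)
system_ar 5333 user    3r   REG    8,3  6806312     0 1315847 /var/lib/sss/mc/passwd (deleted)
java      5421 user    3r   REG    8,3  6806312     0 1315847 /var/lib/sss/mc/passwd (deleted)
java      5421 user   11u   REG    8,3   149313     0 2494486 /tmp/java.fzTwWp (deleted)
java      5421 tsdist  12u   REG    8,3    55185     0 2500513 /tmp/classpath.7AmxHO (deleted)
EOF
awk '{sum += $7} END {print sum}' lsof.sample   # -> 20678619 (about 20 MB)
```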
> 
> `du` can only account for 16GB of file usage:
> 
> $ sudo du -hxs /
> 16G     /
> 
> But what is most puzzling is the numbers reported by e2freefrag, which don't add up:
> 
> $ sudo e2freefrag /dev/disk/by-uuid/<some-uuid>
> Device: /dev/disk/by-uuid/<some-uuid>
> Blocksize: 4096 bytes
> Total blocks: 13107200
> Free blocks: 7778076 (59.3%)
> 
> Min. free extent: 4 KB
> Max. free extent: 8876 KB
> Avg. free extent: 224 KB
> Num. free extent: 6098
> 
> HISTOGRAM OF FREE EXTENT SIZES:
> Extent Size Range :  Free extents   Free Blocks  Percent
>     4K...    8K-  :          1205          1205    0.02%
>     8K...   16K-  :           980          2265    0.03%
>    16K...   32K-  :           653          3419    0.04%
>    32K...   64K-  :          1337         15374    0.20%
>    64K...  128K-  :           631         14151    0.18%
>   128K...  256K-  :           224         10205    0.13%
>   256K...  512K-  :           261         23818    0.31%
>   512K... 1024K-  :           303         56801    0.73%
>     1M...    2M-  :           387        135907    1.75%
>     2M...    4M-  :           103         64740    0.83%
>     4M...    8M-  :            12         15005    0.19%
>     8M...   16M-  :             2          4267    0.05%

Clearly a bug in e2freefrag; the percentages are supposed to sum to 100.
Patches soon.

--D

> 
> This looks like a bug to me; the histogram in the manpage example has percentages that add up to 100% but this doesn't even add up to 5%.
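
For what it's worth, the same mismatch shows up if you total the "Free Blocks" column directly: the extents in the histogram cover only ~347k of the 7778076 blocks e2freefrag says are free (numbers transcribed from the output above):

```shell
# Free Blocks column from the histogram, one bucket per value.
printf '%s\n' 1205 2265 3419 15374 14151 10205 23818 56801 135907 64740 15005 4267 |
  awk '{s += $1} END {printf "%d blocks, %.1f%% of 7778076 free\n", s, 100 * s / 7778076}'
# -> 347157 blocks, 4.5% of 7778076 free
```

So roughly 7.4M "free" blocks are not in any free extent the tool found.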
> 
> After a reboot, `df` reflects real utilization:
> 
> $ df -h /
> Filesystem                                              Size  Used Avail Use% Mounted on
> /dev/disk/by-uuid/<some-uuid>   50G   16G   31G  34% /
> 
> We are using overlay2fs for Docker, as well as rbd mounts; I'm not sure how they might interact.
> 
> Thanks for your help,
> 
> --
> Elana Hashman
> ehashman@twosigma.com


Thread overview: 21+ messages
2018-11-08 17:59 Phantom full ext4 root filesystems on 4.1 through 4.14 kernels Elana Hashman
2018-11-08 18:13 ` Reindl Harald
2018-11-08 18:20   ` Elana Hashman
2018-11-08 18:47 ` Darrick J. Wong [this message]
2018-12-05 16:26   ` Elana Hashman
2019-01-23 19:59     ` Thomas Walker
2019-06-26 15:17       ` Thomas Walker
2019-07-11  9:23         ` Jan Kara
2019-07-11 14:40           ` Geoffrey Thomas
2019-07-11 15:23             ` Jan Kara
2019-07-11 17:10             ` Theodore Ts'o
2019-07-12 19:19               ` Thomas Walker
2019-07-12 20:28                 ` Theodore Ts'o
2019-07-12 21:47                   ` Geoffrey Thomas
2019-07-25 21:22                     ` Geoffrey Thomas
2019-07-29 10:09                       ` Jan Kara
2019-07-29 11:18                         ` ext4 file system is constantly writing to the block device with no activity from the applications, is it a bug? Dmitrij Gusev
2019-07-29 12:55                           ` Theodore Y. Ts'o
2019-07-29 21:12                             ` Dmitrij Gusev
2019-01-24  1:54 ` Phantom full ext4 root filesystems on 4.1 through 4.14 kernels Liu Bo
2019-01-24 14:40   ` Elana Hashman
