From: Elana Hashman
To: tytso@mit.edu
CC: linux-ext4@vger.kernel.org
Subject: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
Date: Thu, 8 Nov 2018 17:59:18 +0000

Hi Ted,

We've run into a mysterious "phantom" full filesystem issue on our Kubernetes fleet. We first encountered it on kernel 4.1.35, and we are still seeing it after upgrading to 4.14.67.

Essentially, `df` reports our root filesystems as full and they behave as though they are full, but the "used" space cannot be accounted for. Any of the following frees up the "used" space: rebooting the system, remounting the root filesystem read-only and then read-write, or booting into single-user mode.

The disk slowly fills up over time, which suggests some kind of leak. On the 4.1 kernel we saw this affect hosts with ~200 days of uptime; on 4.14 it has already affected a host with only ~70 days of uptime.

Here is some data from an example host running the 4.14.67 kernel. The root disk is ext4.
$ uname -a
Linux 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux
$ grep ' / ' /proc/mounts
/dev/disk/by-uuid/ / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0

`df` reports 0 bytes free:

$ df -h /
Filesystem         Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/  50G   48G     0 100% /

Deleted but still-open files account for almost no disk capacity (roughly 7 MB across four distinct inodes):

$ sudo lsof -a +L1 /
COMMAND     PID   USER   FD TYPE DEVICE SIZE/OFF NLINK    NODE NAME
java       5313   user   3r REG     8,3  6806312     0 1315847 /var/lib/sss/mc/passwd (deleted)
java       5313   user  11u REG     8,3    55185     0 2494654 /tmp/classpath.1668Gp (deleted)
system_ar  5333   user   3r REG     8,3  6806312     0 1315847 /var/lib/sss/mc/passwd (deleted)
java       5421   user   3r REG     8,3  6806312     0 1315847 /var/lib/sss/mc/passwd (deleted)
java       5421   user  11u REG     8,3   149313     0 2494486 /tmp/java.fzTwWp (deleted)
java       5421 tsdist  12u REG     8,3    55185     0 2500513 /tmp/classpath.7AmxHO (deleted)

`du` can only account for 16GB of file usage:

$ sudo du -hxs /
16G     /

But most puzzling are the numbers reported by e2freefrag, which don't add up:

$ sudo e2freefrag /dev/disk/by-uuid/
Device: /dev/disk/by-uuid/
Blocksize: 4096 bytes
Total blocks: 13107200
Free blocks: 7778076 (59.3%)

Min. free extent: 4 KB
Max. free extent: 8876 KB
Avg. free extent: 224 KB
Num. free extent: 6098

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
    4K...    8K-  :          1205          1205    0.02%
    8K...   16K-  :           980          2265    0.03%
   16K...   32K-  :           653          3419    0.04%
   32K...   64K-  :          1337         15374    0.20%
   64K...  128K-  :           631         14151    0.18%
  128K...  256K-  :           224         10205    0.13%
  256K...  512K-  :           261         23818    0.31%
  512K... 1024K-  :           303         56801    0.73%
    1M...    2M-  :           387        135907    1.75%
    2M...    4M-  :           103         64740    0.83%
    4M...    8M-  :            12         15005    0.19%
    8M...   16M-  :             2          4267    0.05%

This looks like a bug to me: the percentages in the manpage's example histogram add up to 100%, but these don't even add up to 5%.

After a reboot, `df` reflects real utilization:

$ df -h /
Filesystem         Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/  50G   16G   31G  34% /

We are using the overlay2 storage driver for Docker, as well as rbd mounts; I'm not sure whether or how either might interact with this issue.

Thanks for your help,

-- 
Elana Hashman
ehashman@twosigma.com
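The e2freefrag observation above can be double-checked by summing the histogram columns. The sketch below assumes a POSIX shell with awk; `hist_totals` is a hypothetical helper name introduced only for this illustration (it is not part of e2fsprogs), and the numbers are pasted by hand from the output above rather than read from the device.

```shell
#!/bin/sh
# Sketch: total the "Free extents" and "Free Blocks" columns of an
# e2freefrag histogram and express the block total as a percentage of
# the "Free blocks" value from the report header.
# hist_totals is a hypothetical helper, not part of e2fsprogs.
# Input on stdin: one "<free extents> <free blocks>" pair per line.
# $1: the "Free blocks" count from the e2freefrag header.
hist_totals() {
    awk -v reported="$1" '
        { extents += $1; blocks += $2 }
        END {
            printf "extents=%d blocks=%d share=%.2f%%\n",
                extents, blocks, blocks * 100 / reported
        }'
}

# Columns copied by hand from the histogram above;
# 7778076 is the "Free blocks" header value from the same run.
hist_totals 7778076 <<'EOF'
1205 1205
980 2265
653 3419
1337 15374
631 14151
224 10205
261 23818
303 56801
387 135907
103 64740
12 15005
2 4267
EOF
```

With these inputs it prints `extents=6098 blocks=347157 share=4.46%`. The extent total matches the "Num. free extent" line, but the histogram's blocks (about 1.3 GiB at 4 KiB per block) are only about 4.46% of the 7778076 free blocks (about 29.7 GiB) in the header. Each Percent value appears to be that row's blocks divided by the header's free-block count (for example, 135907 / 7778076 ≈ 1.75%), which is why the column sums to well under 100%.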