* ext3_dx_add_entry complains about Directory index full
From: Olaf Hering @ 2015-02-04  9:04 UTC
To: linux-ext4

To reduce Jan's load I am sending this here for advice. Today I got these
warnings for the backup partition:

[    0.000000] Linux version 3.18.5 (abuild@build23) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Mon Jan 19 09:08:56 UTC 2015

[102565.308869] kjournald starting. Commit interval 5 seconds
[102565.315974] EXT3-fs (dm-5): using internal journal
[102565.315980] EXT3-fs (dm-5): mounted filesystem with ordered data mode
[104406.015708] EXT3-fs (dm-5): warning: ext3_dx_add_entry: Directory index full!
[104406.239904] EXT3-fs (dm-5): warning: ext3_dx_add_entry: Directory index full!
[104406.254162] EXT3-fs (dm-5): warning: ext3_dx_add_entry: Directory index full!
[104406.270793] EXT3-fs (dm-5): warning: ext3_dx_add_entry: Directory index full!
[104406.287443] EXT3-fs (dm-5): warning: ext3_dx_add_entry: Directory index full!

According to Google this indicates that the filesystem has more than 32k
subdirectories. According to Wikipedia this limit can be avoided by enabling
the dir_index feature. According to dumpe2fs that feature is enabled already.
Does the warning above mean something else?

Jan suggested creating a debug image with "e2image -r /dev/dm-5 - |
xz > ext3-image.e2i.xz", but this would produce more than 250G of private
data.

I wonder if the math within the kernel is done correctly. If so, I will move
the data to another drive and reformat the thing with another filesystem.
If however the math is wrong somewhere, I'm willing to keep it for a while
until the issue is understood.
# dumpe2fs -h /dev/dm-5
dumpe2fs 1.41.14 (22-Dec-2010)
Filesystem volume name:   BACKUP_OLH_500G
Last mounted on:          /run/media/olaf/BACKUP_OLH_500G
Filesystem UUID:          f0d41610-a993-4b77-8845-f0f07e37f61d
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              26214400
Block count:              419430400
Reserved block count:     419430
Free blocks:              75040285
Free inodes:              24328812
First block:              1
Block size:               1024
Fragment size:            1024
Reserved GDT blocks:      256
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         512
Inode blocks per group:   128
Filesystem created:       Tue Feb 12 18:24:13 2013
Last mount time:          Thu Jan 29 09:15:28 2015
Last write time:          Thu Jan 29 09:15:28 2015
Mount count:              161
Maximum mount count:      -1
Last checked:             Mon May 26 10:09:36 2014
Check interval:           0 (<none>)
Lifetime writes:          299 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      55aeb7a2-43ca-4104-ad21-56d7a523dc8f
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             32M
Journal length:           32768
Journal sequence:         0x000a2725
Journal start:            17366

The backup is done with rsnapshot, which uses hardlinks and rsync to create
a new subdir with just the changed files.
# for t in d f l ; do echo "type $t: `find /media/BACKUP_OLH_500G/ -xdev -type $t | wc -l`" ; done
type d: 1051396
type f: 20824894
type l: 6876

With the hack below I got this output:

[14161.626156] scsi 4:0:0:0: Direct-Access     ATA      ST3500418AS      CC45 PQ: 0 ANSI: 5
[14161.626671] sd 4:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[14161.626762] sd 4:0:0:0: [sdb] Write Protect is off
[14161.626769] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[14161.626810] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[14161.628058] sd 4:0:0:0: Attached scsi generic sg1 type 0
[14161.651340] sdb: sdb1
[14161.651978] sd 4:0:0:0: [sdb] Attached SCSI disk
[14176.784403] kjournald starting. Commit interval 5 seconds
[14176.790307] EXT3-fs (dm-5): using internal journal
[14176.790316] EXT3-fs (dm-5): mounted filesystem with ordered data mode
[14596.410693] EXT3-fs (dm-5): warning: ext3_dx_add_entry: Directory index full! /hourly.0 localhost/olh/maildir/olh-maildir Maildir/old/xen-devel.old/cur 1422000479.29469_1.probook.fritz.box:2,S
[15335.342389] EXT3-fs (dm-5): warning: ext3_dx_add_entry: Directory index full!
    /hourly.0 localhost/olh/maildir/olh-maildir Maildir/old/xen-devel.old/cur 1422000479.29469_1.probook.fritz.box:2,S

diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index f197736..5022eda 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -1525,11 +1525,20 @@ static int ext3_dx_add_entry(handle_t *handle, struct dentry *dentry,
 	struct dx_entry *entries2;
 	struct dx_node *node2;
 	struct buffer_head *bh2;
+	struct dentry *parents = dentry->d_parent;
+	struct dentry *parents2;
+	unsigned int i = 4;
 
 	if (levels && (dx_get_count(frames->entries) ==
 		       dx_get_limit(frames->entries))) {
+		while (parents && i > 0 && parents->d_parent)
+			i--, parents = parents->d_parent;
+		parents2 = parents;
+		i = 4;
+		while (parents2 && i > 0 && parents2->d_parent)
+			i--, parents2 = parents2->d_parent;
 		ext3_warning(sb, __func__,
-			     "Directory index full!");
+			     "Directory index full! %pd4 %pd4 %pd4 %pd", parents2, parents, dentry->d_parent, dentry);
 		err = -ENOSPC;
 		goto cleanup;
 	}

This does not dump the inode yet. I suspect it will point to other hardlinks
of the dentry above.

Thanks for reading,

Olaf
* Re: ext3_dx_add_entry complains about Directory index full
From: Andreas Dilger @ 2015-02-04 10:52 UTC
To: Olaf Hering; Cc: linux-ext4

On Feb 4, 2015, at 2:04 AM, Olaf Hering <olaf@aepfle.de> wrote:
> Today I got these warnings for the backup partition:
>
> [102565.308869] kjournald starting. Commit interval 5 seconds
> [102565.315974] EXT3-fs (dm-5): using internal journal
> [102565.315980] EXT3-fs (dm-5): mounted filesystem with ordered data mode
> [104406.015708] EXT3-fs (dm-5): warning: ext3_dx_add_entry: Directory index full!
> [...]
>
> According to Google this indicates that the filesystem has more than 32k
> subdirectories. According to Wikipedia this limit can be avoided by
> enabling the dir_index feature. According to dumpe2fs that feature is
> enabled already. Does the warning above mean something else?

How many files/subdirs are in this directory? The old ext3 limit was 32000
subdirs, which dir_index fixed; without "dir_index" enabled the newer limit
is 65000 subdirs.

The 65000 subdir limit can be exceeded by turning on the "dir_nlink" feature
of the filesystem with "tune2fs -O dir_nlink", which allows an "unlimited"
number of subdirs (subject to other directory limits, about 10-12M entries
for 16-char filenames).
The other potential problem is that if you create and delete a large number
of files in this directory, the hash tables can become full and the leaf
blocks become imbalanced: some fill up even while many others do not (htree
leaves are only about 3/4 full on average). This could well happen with more
than 5M files in a long-lived directory in your backup fs. It can be fixed
(for some time at least) via "e2fsck -fD" on the unmounted filesystem, which
compacts the directories.

We do have patches to allow 3-level hash tables for htree directories in
Lustre, instead of the current 2-level maximum. They also increase the
maximum directory size beyond 2GB. The last time I brought this up it didn't
seem to be of interest to others, but maybe opinions have changed.

http://git.hpdd.intel.com/fs/lustre-release.git/blob/HEAD:/ldiskfs/kernel_patches/patches/sles11sp2/ext4-pdirop.patch

It's tangled together with another feature that allows (for Lustre at least)
concurrent create/lookup/unlink in a single directory, but there was no
interest in getting support for that into the VFS, so we only use it when
multiple clients are accessing the directory concurrently.

Cheers, Andreas

> Jan suggested creating a debug image with "e2image -r /dev/dm-5 - |
> xz > ext3-image.e2i.xz", but this would produce more than 250G of private
> data.
>
> I wonder if the math within the kernel is done correctly. If so, I will
> move the data to another drive and reformat the thing with another
> filesystem. If however the math is wrong somewhere, I'm willing to keep it
> for a while until the issue is understood.
>
> # dumpe2fs -h /dev/dm-5
> dumpe2fs 1.41.14 (22-Dec-2010)
> Filesystem volume name:   BACKUP_OLH_500G
> Block size:               1024
> [...]
>
> The backup is done with rsnapshot, which uses hardlinks and rsync to
> create a new subdir with just the changed files.
>
> # for t in d f l ; do echo "type $t: `find /media/BACKUP_OLH_500G/ -xdev -type $t | wc -l`" ; done
> type d: 1051396
> type f: 20824894
> type l: 6876
>
> With the hack below I got this output:
>
> [...]
> [14596.410693] EXT3-fs (dm-5): warning: ext3_dx_add_entry: Directory index full! /hourly.0 localhost/olh/maildir/olh-maildir Maildir/old/xen-devel.old/cur 1422000479.29469_1.probook.fritz.box:2,S
> [15335.342389] EXT3-fs (dm-5): warning: ext3_dx_add_entry: Directory index full!
>     /hourly.0 localhost/olh/maildir/olh-maildir Maildir/old/xen-devel.old/cur 1422000479.29469_1.probook.fritz.box:2,S
>
> [...]
>
> This does not dump the inode yet. I suspect it will point to other
> hardlinks of the dentry above.
>
> Thanks for reading,
>
> Olaf

Cheers, Andreas
* Re: ext3_dx_add_entry complains about Directory index full
From: Olaf Hering @ 2015-02-04 13:52 UTC
To: Andreas Dilger; Cc: linux-ext4

On Wed, Feb 04, Andreas Dilger wrote:

> How many files/subdirs are in this directory? The old ext3 limit was 32000
> subdirs, which dir_index fixed; without "dir_index" enabled the newer
> limit is 65000 subdirs.

See below:

> > # for t in d f l ; do echo "type $t: `find /media/BACKUP_OLH_500G/ -xdev -type $t | wc -l`" ; done
> > type d: 1051396
> > type f: 20824894
> > type l: 6876

> The 65000 subdir limit can be exceeded by turning on the "dir_nlink"
> feature of the filesystem with "tune2fs -O dir_nlink", which allows an
> "unlimited" number of subdirs (subject to other directory limits, about
> 10-12M entries for 16-char filenames).

I enabled this using another box, which turned the thing into an ext4
filesystem. Now ext4_dx_add_entry complains.

> The other potential problem is that if you create and delete a large
> number of files in this directory, the hash tables can become full and
> the leaf blocks become imbalanced: some fill up even while many others do
> not (htree leaves are only about 3/4 full on average). This could well
> happen with more than 5M files in a long-lived directory in your backup
> fs. It can be fixed (for some time at least) via "e2fsck -fD" on the
> unmounted filesystem, which compacts the directories.

Ok, will try that. Thanks.

Olaf
* Re: ext3_dx_add_entry complains about Directory index full
From: Olaf Hering @ 2015-02-04 16:30 UTC
To: Andreas Dilger; Cc: linux-ext4

On Wed, Feb 04, Olaf Hering wrote:
> Ok, will try that. Thanks.

root@linux-fceg:~ # time env -i /sbin/e2fsck -fDvv /dev/mapper/luks-861f1f73-7037-486a-9a8a-8588367fcf33
e2fsck 1.42.12 (29-Aug-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information

BACKUP_OLH_500G: ***** FILE SYSTEM WAS MODIFIED *****

     1886589 inodes used (7.20%, out of 26214400)
       38925 non-contiguous files (2.1%)
       28851 non-contiguous directories (1.5%)
             # of inodes with ind/dind/tind blocks: 163156/45817/319
   344807093 blocks used (82.21%, out of 419430400)
           0 bad blocks
           8 large files
      859307 regular files
     1026949 directories
           0 character device files
           0 block device files
           0 fifos
    19504583 links
         322 symbolic links (316 fast symbolic links)
           2 sockets
------------
    21391163 files

real    78m31.853s
user    3m24.616s
sys     1m20.599s

root@probook:~ # dumpe2fs -h /dev/dm-5
dumpe2fs 1.41.14 (22-Dec-2010)
Filesystem volume name:   BACKUP_OLH_500G
Last mounted on:          /media/BACKUP_OLH_500G
Filesystem UUID:          f0d41610-a993-4b77-8845-f0f07e37f61d
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file dir_nlink
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              26214400
Block count:              419430400
Reserved block count:     419430
Free blocks:              74623307
Free inodes:              24327811
First block:              1
Block size:               1024
Fragment size:            1024
Reserved GDT blocks:      256
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         512
Inode blocks per group:   128
Filesystem created:       Tue Feb 12 18:24:13 2013
Last mount time:          Wed Feb  4 17:02:29 2015
Last write time:          Wed Feb  4 17:02:29 2015
Mount count:              1
Maximum mount count:      -1
Last checked:             Wed Feb  4 15:08:34 2015
Check interval:           0 (<none>)
Lifetime writes:          5039 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      55aeb7a2-43ca-4104-ad21-56d7a523dc8f
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             32M
Journal length:           32768
Journal sequence:         0x000a3a58
Journal start:            1

But still:

[44220.530001] scsi 4:0:0:0: Direct-Access     ATA      ST3500418AS      CC45 PQ: 0 ANSI: 5
[44220.530455] sd 4:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[44220.530548] sd 4:0:0:0: [sdb] Write Protect is off
[44220.530557] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[44220.530596] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[44220.534670] sd 4:0:0:0: Attached scsi generic sg1 type 0
[44220.549014] sdb: sdb1
[44220.549721] sd 4:0:0:0: [sdb] Attached SCSI disk
[44238.004550] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[45191.549831] EXT4-fs warning (device dm-5): ext4_dx_add_entry:1990: Directory index full!

Guess it's time to wipe it and go with something else.

Olaf
* Re: ext3_dx_add_entry complains about Directory index full
From: Andreas Dilger @ 2015-02-04 21:32 UTC
To: Olaf Hering; Cc: linux-ext4

On Feb 4, 2015, at 6:52 AM, Olaf Hering <olaf@aepfle.de> wrote:
> On Wed, Feb 04, Andreas Dilger wrote:
>
>> How many files/subdirs are in this directory? The old ext3 limit was
>> 32000 subdirs, which dir_index fixed; without "dir_index" enabled the
>> newer limit is 65000 subdirs.
>
> See below:
>
>>> # for t in d f l ; do echo "type $t: `find /media/BACKUP_OLH_500G/ -xdev -type $t | wc -l`" ; done
>>> type d: 1051396
>>> type f: 20824894
>>> type l: 6876

Is "BACKUP_OLH_500G" a single large directory with 1M directories and 20M
files in it? In that case you are hitting the limits of the current ext4
directory size with 20M+ entries. Otherwise I would expect you have
subdirectories, and the link/count limits are per directory, so the numbers
for the affected directory are what matter.

Running something like http://www.pdsi-scidac.org/fsstats/ can give you a
good idea of the file/directory size/age/count min/max/avg distributions on
your filesystem. Finding the largest directories with something like:

find /media/BACKUP_OLH_500G -type d -size +10M -ls

would tell us how big your directories actually are. The fsstats data will
also tell you the min/max/avg filename length, which may also be a factor.

It would be surprising for a single backup to contain such a large
directory. We typically test up to 10M files in a single directory.
> root@linux-fceg:~ # time env -i /sbin/e2fsck -fDvv /dev/mapper/luks-861f1f73-7037-486a-9a8a-8588367fcf33
> e2fsck 1.42.12 (29-Aug-2014)
> [...]
>       859307 regular files
>      1026949 directories
>     19504583 links

This implies that you have only 1.8M in-use files, while the find above
reports 20M filenames, almost all of them hard links (about 23 links per
file). That said, the error being reported is on the name insert and not on
the link counts, so either some directories have huge numbers of files, or
the file names are so long that the directory leaves fill up very quickly.

> Block size:               1024

AH! This is the root of your problem. Formatting with 1024-byte blocks means
that the two-level directory hash tree can only hold about
128^2 * (1024 / filename_length * 3 / 4) entries, maybe 500k entries or less
if the names are long.

This wouldn't be the default for a 500GB filesystem, but maybe you picked it
to optimize space usage for small files a bit? Definitely, a 1KB blocksize
is not optimal for performance; 4KB is much better. Unfortunately, you need
to reformat to get 4KB blocks.

Cheers, Andreas
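Andreas' capacity formula lends itself to a quick back-of-the-envelope
check. The constants below (8 bytes per dx_entry index entry, an 8-byte
dirent header with the name padded to 4 bytes, leaves averaging 3/4 full)
are rough figures taken from this thread, not exact kernel values:

```python
def htree_capacity(block_size, name_len, levels=2):
    """Rough upper bound on entries in a 2-level ext3/ext4 hashed directory."""
    dirent = 8 + ((name_len + 3) // 4) * 4  # 8-byte dirent header, name padded to 4 bytes
    index_fanout = block_size // 8          # ~8 bytes per dx_entry in an index block
    leaf_entries = (block_size // dirent) * 3 // 4  # leaves average ~3/4 full
    return index_fanout ** levels * leaf_entries

# 1 KB blocks with ~24-char Maildir-style names: a few hundred thousand,
# consistent with the "maybe 500k entries or less" estimate above
print(htree_capacity(1024, 24))
# 4 KB blocks with the same names: tens of millions
print(htree_capacity(4096, 24))
```

With these assumed constants the 1 KB case lands well under a million
entries, which matches the observed failure once a Maildir accumulates a few
hundred thousand messages.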
* Re: ext3_dx_add_entry complains about Directory index full
From: Olaf Hering @ 2015-02-05 9:19 UTC
To: Andreas Dilger; Cc: linux-ext4

On Wed, Feb 04, Andreas Dilger wrote:

> Is "BACKUP_OLH_500G" a single large directory with 1M directories and 20M
> files in it? In that case you are hitting the limits of the current ext4
> directory size with 20M+ entries.

It's organized in subdirs named hourly.{0..23}, daily.{0..6}, weekly.{0..3}
and monthly.{0..11}.

> Finding the largest directories with something like:
>
> find /media/BACKUP_OLH_500G -type d -size +10M -ls
>
> would tell us how big your directories actually are. The fsstats data
> will also tell you the min/max/avg filename length, which may also be a
> factor.

This find command produces no output, so there are no large directories.

>> Block size:               1024
>
> AH! This is the root of your problem. Formatting with 1024-byte blocks
> means that the two-level directory hash tree can only hold about
> 128^2 * (1024 / filename_length * 3 / 4) entries, maybe 500k entries or
> less if the names are long.
>
> This wouldn't be the default for a 500GB filesystem, but maybe you picked
> it to optimize space usage for small files a bit? Definitely, a 1KB
> blocksize is not optimal for performance; 4KB is much better.

Yes, I used a 1024-byte blocksize to not waste space on the many small
files.
I wonder what other filesystem would be able to cope. Would xfs or btrfs do
any better with this kind of data?

Thanks for the feedback!

Olaf
* Re: ext3_dx_add_entry complains about Directory index full
From: Andreas Dilger @ 2015-02-06 6:52 UTC
To: Olaf Hering; Cc: linux-ext4

On Feb 5, 2015, at 2:19 AM, Olaf Hering <olaf@aepfle.de> wrote:
> On Wed, Feb 04, Andreas Dilger wrote:
>>
>> Finding the largest directories with something like:
>>
>> find /media/BACKUP_OLH_500G -type d -size +10M -ls
>>
>> would tell us how big your directories actually are. The fsstats data
>> will also tell you the min/max/avg filename length, which may also be a
>> factor.
>
> This find command produces no output, so there are no large directories.

I tested a 1KB blocksize filesystem, and the actual directory size was only
about 1.8MB when it ran out of space in the htree. That worked out to about
250k 12-character filenames in a single directory.

Even doubling the blocksize to 2KB would give you 2^3 = 8x as many entries
in the directory (twice the fanout in each of the two htree levels, and leaf
blocks twice as large). That would allow about 2M entries in a single
directory, and I doubt it would significantly increase space usage unless
you are mostly backing up small files.

>>> Block size:               1024
>>
>> AH! This is the root of your problem. [...]
>
> Yes, I used a 1024-byte blocksize to not waste space on the many small
> files.
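The 2^3 = 8x factor above decomposes into three independent doublings, which
can be written out as a trivial worked check (the 250k figure is the
measured count from the 1KB test above):

```python
# Doubling the block size from 1 KB to 2 KB multiplies htree capacity by:
#   2x fanout in the first index level
# * 2x fanout in the second index level
# * 2x entries per leaf block
scale = 2 * 2 * 2          # = 2**3 = 8

# Applied to the ~250k 12-character names that fit at 1 KB blocksize:
entries_1k = 250_000       # measured figure quoted above
entries_2k = entries_1k * scale
print(scale, entries_2k)   # 8 2000000
```

So even the smallest possible blocksize bump already moves the per-directory
limit from hundreds of thousands into the millions.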
>>> Inode count:              26214400
>>> Block count:              419430400
>>> Reserved block count:     419430
>>> Free blocks:              75040285
>>> Free inodes:              24328812

You are using (419430400 - 75040285 - 419430) = 343970685 blocks for
(26214400 - 24328812) = 1885588 files, which is an average file size of
182KB. You currently "waste" about half a block per file (0.5KB/file), so
1885588 * 0.5KB = 920MB, about 1/500 = 0.2% of your filesystem, due to
partially-used blocks at the end of every file. With a 2KB blocksize this
would increase to about 1840MB or 0.4%, which really isn't very much space
on a modern drive.

>>> # of inodes with ind/dind/tind blocks: 163156/45817/319

However, there would also be increased efficiency from fewer index blocks.
These indirect blocks currently occupy at least
163156 + 45817 * (1024 / 4 / 2 + 1) + 319 * (1024 / 4 + 1) = 6155532 KB, or
6011MB of space, which is much more than you have saved by using the small
blocksize.

If you formatted the filesystem with "-t ext4" (which enables "extents"
among other things) there would likely be no indirect/index blocks at all,
since extent-mapped inodes can directly address 256MB (assuming
fragmentation is not too bad) on a 2KB blocksize filesystem. You also get
other benefits from reformatting with "-t ext4", like flex_bg and uninit_bg,
which can speed up e2fsck times significantly.

> I wonder what other filesystem would be able to cope. Would xfs or btrfs
> do any better with this kind of data?

I can't really say, since I've never used those filesystems. I suspect you
could do much better by increasing the blocksize on ext4 than what you have
now.

Cheers, Andreas
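As a sketch, the space accounting above can be re-derived from the dumpe2fs
and e2fsck figures quoted earlier in the thread, assuming 1KB blocks and
4-byte block pointers, and mirroring Andreas' "at least" estimate of
half-full double-indirect trees:

```python
# Figures from "dumpe2fs -h /dev/dm-5" and "e2fsck -fDvv" earlier in the thread.
block_count, free_blocks, reserved = 419430400, 75040285, 419430
inode_count, free_inodes = 26214400, 24328812

used_blocks = block_count - free_blocks - reserved  # data + metadata blocks in use
used_inodes = inode_count - free_inodes             # in-use files
avg_file_kb = used_blocks // used_inodes            # 1 block = 1 KB here

# ~0.5 KB wasted per file in the final partially-used block
tail_waste_mb = used_inodes * 0.5 / 1024

# Indirect-block overhead: 256 4-byte pointers fit in a 1 KB block;
# double-indirect trees are counted as at least half full (hence the /2).
ind, dind, tind = 163156, 45817, 319
overhead_kb = ind + dind * (1024 // 4 // 2 + 1) + tind * (1024 // 4 + 1)

print(avg_file_kb, int(tail_waste_mb), overhead_kb // 1024)  # 182 920 6011
```

The indirect-block overhead (~6GB) dwarfs the ~920MB saved in file tails,
which is the crux of the argument for a larger blocksize with extents.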