From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>, Theodore Ts'o <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: block allocator issue with ext4+DAX
Date: Wed, 30 Mar 2016 16:01:29 -0600
Message-ID: <20160330220129.GA9101@linux.intel.com>
I've hit an issue in my testing which I believe to be related to the ext4
block allocator when using the DAX mount option. I originally found this
issue with the generic/102 xfstest, but have reduced it to the minimal
reproducer at the bottom of this email. I've been able to reproduce this with
both BRD and with PMEM as the underlying block device.
For this test we're running in a very small filesystem, only 512 MiB. We
fallocate() 400 MiB of that space, unlink the file, then try to rewrite that
400 MiB file one 1 MiB chunk at a time.
What actually happens is that during the rewrite we run out of space: the
DAX call to get_block() in dax_io() fails with -ENOSPC, and the test sees a
short write().
Here are the steps to reproduce this issue:
# fdisk -l /dev/ram0
Disk /dev/ram0: 1 GiB, 1073741824 bytes, 2097152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
# mkfs.ext4 /dev/ram0 512M
# mount /dev/ram0 /mnt
# gcc -o test test.c
# ./test # success!
# umount /mnt
# mount -o dax /dev/ram0 /mnt # requires CONFIG_BLK_DEV_RAM_DAX
# ./test # failure
Partial write - only 577536 written
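For reference, 577536 bytes is exactly 141 * 4096, so the short write stopped
on a 4 KiB boundary. Because the test exits on the first short write, the
error itself never reaches userspace as an errno. A minimal sketch of a
hypothetical write_full() helper (not part of the original test) that keeps
calling write() after a short count, so the errno from the failing call
(presumably ENOSPC here) becomes visible:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: call write() until all len bytes are written or
 * write() reports an error. Returns the number of bytes written; if that
 * is less than len, errno holds the error from the failing write().
 */
static size_t write_full(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;

	while (done < len) {
		ssize_t ret = write(fd, p + done, len - done);

		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data was written: retry */
			break;			/* real error; errno is left as write() set it */
		}
		if (ret == 0) {
			/* no progress and no error: arbitrary choice to report EIO */
			errno = EIO;
			break;
		}
		done += (size_t)ret;
	}
	return done;
}
```

In the test's loop, the write()/partial-write checks could then collapse into
a single `if (write_full(fd, buffer, MB(1)) != MB(1))` followed by
`perror("write")`.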
This test succeeds with xfs, ext2, and with ext4 without the DAX mount option.
I've also tried it with O_DIRECT, and that has the same behavior - we succeed
without DAX and fail with DAX.
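For the O_DIRECT comparison, the test needs a suitably aligned buffer:
malloc() makes no page-alignment guarantee, and misaligned O_DIRECT I/O may
fail with EINVAL. A minimal sketch of the two changes, assuming 4096-byte
alignment (the physical sector size fdisk reports above); alloc_dio_buffer()
and open_test_file() are hypothetical helper names, not part of the original
test:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for O_DIRECT */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Assumed alignment: the 4096-byte physical sector size shown by fdisk. */
#define DIO_ALIGN 4096

/* Hypothetical helper: zero-filled buffer aligned for O_DIRECT I/O. */
void *alloc_dio_buffer(size_t len)
{
	void *buf = NULL;

	if (posix_memalign(&buf, DIO_ALIGN, len) != 0)
		return NULL;
	memset(buf, 0, len);
	return buf;
}

/* Hypothetical helper: open the test file, optionally with O_DIRECT. */
int open_test_file(const char *path, int use_direct)
{
	int flags = O_RDWR | O_CREAT;

	if (use_direct)
		flags |= O_DIRECT;
	return open(path, flags, S_IRUSR | S_IWUSR);
}
```

These would stand in for malloc(MB(1)) and for the open() before the write
loop; the fallocate() step does not need O_DIRECT.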
Another clue is that a sync() call in the middle of the test between the
unlink and the following writes clears up the issue.
Something that might be related is the output in
/proc/fs/ext4/ram0/mb_groups. Here is that output when we're in a good
state, and the writes will succeed:
#group: free  frags first [ 2^0 2^1 2^2 2^3 2^4 2^5 2^6 2^7 2^8 2^9 2^10 2^11 2^12 2^13 ]
#0    : 30673 1     2095  [ 1   0   0   0   1   0   1   1   1   1   1    0    1    3    ]
#1    : 32735 1     33    [ 1   1   1   1   1   0   1   1   1   1   1    1    1    3    ]
#2    : 28672 1     4096  [ 0   0   0   0   0   0   0   0   0   0   0    0    1    3    ]
#3    : 32735 1     33    [ 1   1   1   1   1   0   1   1   1   1   1    1    1    3    ]
Here is the output in that file when we're in a bad state, and our writes are
about to fail:
#group: free  frags first [ 2^0 2^1 2^2 2^3 2^4 2^5 2^6 2^7 2^8 2^9 2^10 2^11 2^12 2^13 ]
#0    : 18385 1     14383 [ 1   0   0   0   1   0   1   1   1   1   1    0    0    2    ]
#1    : 2015  1     33    [ 1   1   1   1   1   0   1   1   1   1   1    0    0    0    ]
#2    : 0     0     32768 [ 0   0   0   0   0   0   0   0   0   0   0    0    0    0    ]
#3    : 2015  1     33    [ 1   1   1   1   1   0   1   1   1   1   1    0    0    0    ]
It appears as though we've exhausted group #2. Interestingly, running sync()
at this point takes us from the bad output back to the good one, which leads
me to believe that the blocks released by the unlink in group #2 only become
available to the allocator again once sync() runs. (I've clearly reached the
limits of my ext4-fu. :) )
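The free column of the two excerpts can be totalled to check the block
accounting. A minimal sketch, assuming the line layout shown above
("#<group> : <free> <frags> <first> [ ... ]") and a 4 KiB block size, which
is consistent with four 32768-block groups making up the 512 MiB filesystem
(4 * 32768 * 4096 bytes = 512 MiB); total_free_blocks() is a hypothetical
helper:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: sum the "free" column of text in the format of
 * /proc/fs/ext4/<dev>/mb_groups. Lines that do not start with
 * "#<number> :" (such as the "#group:" header) are skipped.
 */
static unsigned long total_free_blocks(const char *text)
{
	unsigned long total = 0;
	const char *line = text;

	while (line != NULL && *line != '\0') {
		unsigned int group, free_blocks;

		/* spaces in the format match any padding around the colon */
		if (sscanf(line, "#%u : %u", &group, &free_blocks) == 2)
			total += free_blocks;

		/* advance to the start of the next line, if any */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return total;
}
```

Fed the good excerpt, this returns 124815 free blocks; fed the bad one,
22415. The difference is 102400 blocks, which at 4 KiB per block is exactly
400 MiB, the size of the file that was fallocate()d and then unlinked.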
I'm happy to help test proposed fixes.
Thanks,
- Ross
---
#define _GNU_SOURCE		/* for fallocate() */
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MB(a) ((a)*1024ULL*1024)

int main(void)
{
	int i, fd, ret;
	ssize_t written;
	void *buffer;

	/* 1 MiB of zeroes, written 400 times during the rewrite */
	buffer = calloc(1, MB(1));
	if (!buffer) {
		perror("calloc");
		return 1;
	}

	/* allocate 400 MiB of the 512 MiB filesystem... */
	fd = open("/mnt/file", O_RDWR|O_CREAT, S_IRUSR|S_IWUSR);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	ret = fallocate(fd, 0, 0, MB(400));
	if (ret) {
		perror("fallocate");
		return 1;
	}
	close(fd);

	/* ...and release it again */
	unlink("/mnt/file");

	/* a sync() call here makes the DAX case of this test pass */
	// sync();

	/* rewrite a new 400 MiB file, 1 MiB at a time */
	fd = open("/mnt/file", O_RDWR|O_CREAT, S_IRUSR|S_IWUSR);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (i = 0; i < 400; i++) {
		written = write(fd, buffer, MB(1));
		if (written < 0) {
			perror("write");
			return 1;
		} else if ((unsigned long long)written != MB(1)) {
			fprintf(stderr, "Partial write - only %zd written\n",
				written);
			return 1;
		}
	}
	close(fd);
	free(buffer);
	return 0;
}