From: Zach Brown <zab@redhat.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 2/6] btrfs: fix readdir hang with offsets past INT_MAX
Date: Tue, 4 Jun 2013 15:17:56 -0700 [thread overview]
Message-ID: <1370384280-28652-3-git-send-email-zab@redhat.com> (raw)
In-Reply-To: <1370384280-28652-1-git-send-email-zab@redhat.com>
To work around bugs in userspace btrfs_real_readdir() sets f_pos to an
offset that will prevent any future entries from being returned once the
last entry is hit. Over time this supposedly impossible offset was
decreased from the initial U64_MAX to INT_MAX to appease 32bit
userspace.
https://oss.oracle.com/pipermail/btrfs-devel/2008-January/000437.html
commit c2a8b6e11009398ca9363d8ba8d4e7e40fb897fd
commit 89f135d8b53bcccafd91a075366d2704ba257cf3
commit 406266ab9ac8ed8b085c58aacd9e3161480dc5d5
The remaining problem is that resetting f_pos to some impossible offset
causes userspace to spin when it's, well, possible for an entry to have
that offset. It takes a single thread on a modern cpu about nine hours
of constant file creation and removal to hit an offset past INT_MAX on a
single spindle.
Instead of trying to find an impossible f_pos that doesn't break various
layers of the stack, let's use f_version to indicate that readdir should
stop returning entries until seek changes f_pos and clears f_version.
Signed-off-by: Zach Brown <zab@redhat.com>
---
fs/btrfs/inode.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6a5784b..e6e2b86 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4983,6 +4983,16 @@ unsigned char btrfs_filetype_table[] = {
DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
};
+/*
+ * There have been buggy applications that can't handle one readdir pass
+ * returning the same name for different inodes that are unlinked and
+ * re-created during the readdir pass. This was partially worked around
+ * by trying to set f_pos to magic values that broke either 32bit userspace
+ * or entries with huge offsets. Now we set f_version to a magic value
+ * which prevents readdir results until seek resets f_pos and f_version.
+ */
+#define BTRFS_READDIR_EOF ~0ULL
+
static int btrfs_real_readdir(struct file *filp, void *dirent,
filldir_t filldir)
{
@@ -5008,6 +5018,9 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
char *name_ptr;
int name_len;
+ if (filp->f_version == BTRFS_READDIR_EOF)
+ return 0;
+
/* FIXME, use a real flag for deciding about the key type */
if (root->fs_info->tree_root == root)
key_type = BTRFS_DIR_ITEM_KEY;
@@ -5145,14 +5158,9 @@ next:
goto nopos;
}
- /* Reached end of directory/root */
- if (key_type == BTRFS_DIR_INDEX_KEY) {
- /*
- * 32-bit glibc will use getdents64, but then strtol -
- * so the last number we can serve is this.
- */
- filp->f_pos = 0x7fffffff;
- }
+ /* prevent further readdir results without seeking once we hit EOF */
+ if (key_type == BTRFS_DIR_INDEX_KEY)
+ filp->f_version = BTRFS_READDIR_EOF;
nopos:
ret = 0;
--
1.7.11.7
next prev parent reply other threads:[~2013-06-04 22:18 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-04 22:17 [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups Zach Brown
2013-06-04 22:17 ` [PATCH 1/6] btrfs: set readdir f_pos only after filldir Zach Brown
2013-06-05 1:19 ` Miao Xie
2013-06-04 22:17 ` Zach Brown [this message]
2013-06-04 22:17 ` [PATCH 3/6] btrfs: trivial delayed item readdir list cleanups Zach Brown
2013-06-04 22:17 ` [PATCH 4/6] btrfs: simplify finding next/prev delayed items Zach Brown
2013-06-04 22:17 ` [PATCH 5/6] btrfs: add helper to get delayed item root Zach Brown
2013-06-04 22:18 ` [PATCH 6/6] btrfs: get fewer delayed item refs during readdir Zach Brown
2013-06-04 23:16 ` [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups Chris Mason
2013-06-04 23:26 ` Zach Brown
2013-06-05 2:34 ` Miao Xie
2013-06-05 13:36 ` David Sterba
2013-06-06 1:35 ` Miao Xie
2013-06-06 13:55 ` David Sterba
2013-06-06 14:32 ` Chris Mason
2013-06-10 22:39 ` Zach Brown
2013-06-12 12:59 ` Chris Mason
2013-07-01 12:54 ` Josef Bacik
2013-07-01 13:18 ` Chris Mason
2013-07-01 16:10 ` Zach Brown
2013-07-01 17:18 ` Chris Mason
2013-07-11 23:19 ` Zach Brown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1370384280-28652-3-git-send-email-zab@redhat.com \
--to=zab@redhat.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).