* [PATCH] vfs: clear remainder of 'full_fds_bits' in dup_fd()
@ 2015-11-06 3:42 Eric Biggers
2015-11-06 4:15 ` Linus Torvalds
0 siblings, 1 reply; 5+ messages in thread
From: Eric Biggers @ 2015-11-06 3:42 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-kernel, viro, edumazet, torvalds, Eric Biggers
This fixes a bug from commit f3f86e33dc3d ("vfs: Fix pathological
performance case for __alloc_fd()").
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
---
fs/file.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index 6f6eb2b..36e5103 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -276,7 +276,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
{
struct files_struct *newf;
struct file **old_fds, **new_fds;
- int open_files, size, i;
+ int open_files, i;
struct fdtable *old_fdt, *new_fdt;
*errorp = -ENOMEM;
@@ -357,18 +357,16 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
}
spin_unlock(&oldf->file_lock);
- /* compute the remainder to be cleared */
- size = (new_fdt->max_fds - open_files) * sizeof(struct file *);
-
- /* This is long word aligned thus could use a optimized version */
- memset(new_fds, 0, size);
-
+ /* clear the remainder if needed */
if (new_fdt->max_fds > open_files) {
- int left = (new_fdt->max_fds - open_files) / 8;
+ int left = new_fdt->max_fds - open_files;
int start = open_files / BITS_PER_LONG;
- memset(&new_fdt->open_fds[start], 0, left);
- memset(&new_fdt->close_on_exec[start], 0, left);
+ memset(new_fds, 0, left * sizeof(struct file *));
+ memset(&new_fdt->open_fds[start], 0, left / 8);
+ memset(&new_fdt->close_on_exec[start], 0, left / 8);
+ memset(&new_fdt->full_fds_bits[BITBIT_NR(open_files)], 0,
+ BITBIT_SIZE(new_fdt->max_fds) - BITBIT_SIZE(open_files));
}
rcu_assign_pointer(newf->fdt, new_fdt);
--
2.6.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] vfs: clear remainder of 'full_fds_bits' in dup_fd()
2015-11-06 3:42 [PATCH] vfs: clear remainder of 'full_fds_bits' in dup_fd() Eric Biggers
@ 2015-11-06 4:15 ` Linus Torvalds
2015-11-06 4:33 ` Eric Biggers
2015-11-06 6:32 ` Eric Biggers
0 siblings, 2 replies; 5+ messages in thread
From: Linus Torvalds @ 2015-11-06 4:15 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-fsdevel, Linux Kernel Mailing List, Al Viro, Eric Dumazet
On Thu, Nov 5, 2015 at 7:42 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
> This fixes a bug from commit f3f86e33dc3d ("vfs: Fix pathological
> performance case for __alloc_fd()").
Gaah. Good catch. The first version of that patch allocated the
full_fds_bits array separately and always cleared it (using kzalloc),
but then people hated on that.. So I did the "obvious" thing to just
allocate it as part of the same allocation as open_fds. And didn't
think about that thay does to the dup_fd() code.
That said, the more I look at your patch, the more I hate it. Not
because your patch is wrong, but because dup_fd() is such a nasty
horrid mess, and your patch looks bad just because that function is
bad.
Just look at what "copy_fdtable()" does: it's the exact same bitmap
copy, but it actually makes more sense there. Well, at least it's more
compact without the very odd "let's drop the spinlock in the middle
and clear the end later. That optimization just doesn't seem to be
worth it. Especially since we don't do it for the expend_fdtable()
case.
So what I would do is to just extract out the bitmap copying from
"copy_fdtable()" (call it "copy_fd_bitmaps()"?) and then use that for
both copy_fdtable() and for dup_fd(). And then I guess you could leave
the
memset(new_fds, 0, size);
outside the spinlock still, but at least have the bitmap copying in
one sane place rather than spread out oddly like that.
Would you mind doing that version of the patch instead? I can do it
too, but I'd rather give authorship to you, since you found the stupid
issue, which is actually the much bigger deal.
Linus
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] vfs: clear remainder of 'full_fds_bits' in dup_fd()
2015-11-06 4:15 ` Linus Torvalds
@ 2015-11-06 4:33 ` Eric Biggers
2015-11-06 6:32 ` Eric Biggers
1 sibling, 0 replies; 5+ messages in thread
From: Eric Biggers @ 2015-11-06 4:33 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-fsdevel, Linux Kernel Mailing List, Al Viro, Eric Dumazet
On Thu, Nov 05, 2015 at 08:15:49PM -0800, Linus Torvalds wrote:
> Would you mind doing that version of the patch instead? I can do it
> too, but I'd rather give authorship to you, since you found the stupid
> issue, which is actually the much bigger deal.
Sure, I'll try it that way and see what it looks like.
Eric
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] vfs: clear remainder of 'full_fds_bits' in dup_fd()
2015-11-06 4:15 ` Linus Torvalds
2015-11-06 4:33 ` Eric Biggers
@ 2015-11-06 6:32 ` Eric Biggers
2015-11-06 7:07 ` Linus Torvalds
1 sibling, 1 reply; 5+ messages in thread
From: Eric Biggers @ 2015-11-06 6:32 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-fsdevel, Linux Kernel Mailing List, Al Viro, Eric Dumazet
Here's the revised patch:
vfs: clear remainder of 'full_fds_bits' in dup_fd()
This fixes a bug from commit f3f86e33dc3d ("vfs: Fix pathological
performance case for __alloc_fd()").
v2: refactor to share fd bitmap copying code
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
---
fs/file.c | 64 +++++++++++++++++++++++++++++++--------------------------------
1 file changed, 32 insertions(+), 32 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index 6f6eb2b..a6ff7c3 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -60,8 +60,31 @@ static void free_fdtable_rcu(struct rcu_head *rcu)
#define BITBIT_SIZE(nr) (BITBIT_NR(nr) * sizeof(long))
/*
- * Expand the fdset in the files_struct. Called with the files spinlock
- * held for write.
+ * Copy 'count' fd bits from the old table to the new table and clear the extra
+ * space if any. This does not copy the file pointers. Called with the files
+ * spinlock held for write.
+ */
+static void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt,
+ unsigned int count)
+{
+ unsigned int cpy, set;
+
+ cpy = count / BITS_PER_BYTE;
+ set = (nfdt->max_fds - count) / BITS_PER_BYTE;
+ memcpy(nfdt->open_fds, ofdt->open_fds, cpy);
+ memset((char *)nfdt->open_fds + cpy, 0, set);
+ memcpy(nfdt->close_on_exec, ofdt->close_on_exec, cpy);
+ memset((char *)nfdt->close_on_exec + cpy, 0, set);
+
+ cpy = BITBIT_SIZE(count);
+ set = BITBIT_SIZE(nfdt->max_fds) - cpy;
+ memcpy(nfdt->full_fds_bits, ofdt->full_fds_bits, cpy);
+ memset((char *)nfdt->full_fds_bits + cpy, 0, set);
+}
+
+/*
+ * Copy all file descriptors from the old table to the new, expanded table and
+ * clear the extra space. Called with the files spinlock held for write.
*/
static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
{
@@ -72,19 +95,9 @@ static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
cpy = ofdt->max_fds * sizeof(struct file *);
set = (nfdt->max_fds - ofdt->max_fds) * sizeof(struct file *);
memcpy(nfdt->fd, ofdt->fd, cpy);
- memset((char *)(nfdt->fd) + cpy, 0, set);
+ memset((char *)nfdt->fd + cpy, 0, set);
- cpy = ofdt->max_fds / BITS_PER_BYTE;
- set = (nfdt->max_fds - ofdt->max_fds) / BITS_PER_BYTE;
- memcpy(nfdt->open_fds, ofdt->open_fds, cpy);
- memset((char *)(nfdt->open_fds) + cpy, 0, set);
- memcpy(nfdt->close_on_exec, ofdt->close_on_exec, cpy);
- memset((char *)(nfdt->close_on_exec) + cpy, 0, set);
-
- cpy = BITBIT_SIZE(ofdt->max_fds);
- set = BITBIT_SIZE(nfdt->max_fds) - cpy;
- memcpy(nfdt->full_fds_bits, ofdt->full_fds_bits, cpy);
- memset(cpy+(char *)nfdt->full_fds_bits, 0, set);
+ copy_fd_bitmaps(nfdt, ofdt, ofdt->max_fds);
}
static struct fdtable * alloc_fdtable(unsigned int nr)
@@ -276,7 +289,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
{
struct files_struct *newf;
struct file **old_fds, **new_fds;
- int open_files, size, i;
+ int open_files, i;
struct fdtable *old_fdt, *new_fdt;
*errorp = -ENOMEM;
@@ -333,13 +346,11 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
open_files = count_open_files(old_fdt);
}
+ copy_fd_bitmaps(new_fdt, old_fdt, open_files);
+
old_fds = old_fdt->fd;
new_fds = new_fdt->fd;
- memcpy(new_fdt->open_fds, old_fdt->open_fds, open_files / 8);
- memcpy(new_fdt->close_on_exec, old_fdt->close_on_exec, open_files / 8);
- memcpy(new_fdt->full_fds_bits, old_fdt->full_fds_bits, BITBIT_SIZE(open_files));
-
for (i = open_files; i != 0; i--) {
struct file *f = *old_fds++;
if (f) {
@@ -357,19 +368,8 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
}
spin_unlock(&oldf->file_lock);
- /* compute the remainder to be cleared */
- size = (new_fdt->max_fds - open_files) * sizeof(struct file *);
-
- /* This is long word aligned thus could use a optimized version */
- memset(new_fds, 0, size);
-
- if (new_fdt->max_fds > open_files) {
- int left = (new_fdt->max_fds - open_files) / 8;
- int start = open_files / BITS_PER_LONG;
-
- memset(&new_fdt->open_fds[start], 0, left);
- memset(&new_fdt->close_on_exec[start], 0, left);
- }
+ /* clear the remainder */
+ memset(new_fds, 0, (new_fdt->max_fds - open_files) * sizeof(struct file *));
rcu_assign_pointer(newf->fdt, new_fdt);
--
2.6.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] vfs: clear remainder of 'full_fds_bits' in dup_fd()
2015-11-06 6:32 ` Eric Biggers
@ 2015-11-06 7:07 ` Linus Torvalds
0 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2015-11-06 7:07 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-fsdevel, Linux Kernel Mailing List, Al Viro, Eric Dumazet
On Thu, Nov 5, 2015 at 10:32 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
> Here's the revised patch:
Thanks, applied.
Linus
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-11-06 7:07 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-06 3:42 [PATCH] vfs: clear remainder of 'full_fds_bits' in dup_fd() Eric Biggers
2015-11-06 4:15 ` Linus Torvalds
2015-11-06 4:33 ` Eric Biggers
2015-11-06 6:32 ` Eric Biggers
2015-11-06 7:07 ` Linus Torvalds
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox