From: Andrew Morton <akpm@osdl.org>
To: Chuck Lever <cel@citi.umich.edu>
Cc: cel@netapp.com, linux-kernel@vger.kernel.org, trond.myklebust@fys.uio.no
Subject: Re: [PATCH 3/6] nfs: Eliminate nfs_get_user_pages()
Date: Fri, 19 May 2006 11:17:34 -0700
Message-ID: <20060519111734.523232b4.akpm@osdl.org>
In-Reply-To: <20060519180028.3244.7809.stgit@brahms.dsl.sfldmi.ameritech.net>
Chuck Lever <cel@netapp.com> wrote:
>
> Neil Brown observed that the kmalloc() in nfs_get_user_pages() is more
> likely to fail if the I/O is large enough to require the allocation of more
> than a single page to keep track of all the pinned pages in the user's
> buffer.
>
> Instead of tracking one large page array per dreq/iocb, track pages per
> nfs_read/write_data, just like the cached I/O path does. An array for
> pages is already allocated for us by nfs_readdata_alloc() (and the write
> and commit equivalents).
>
> This is also required for adding support for vectored I/O to the NFS direct
> I/O path.
>
> The original reason to pin the user buffer and allocate all the NFS data
> structures before trying to schedule I/O was to ensure all needed resources
> are allocated on the client before starting to send requests. This reduces
> the chance that resource exhaustion on the client will cause a short read
> or write.
>
> On the other hand, for an application making very large I/O requests,
> this means that it will be nearly impossible for the application to make
> forward progress on a resource-limited client.
>
> Thus, moving the buffer pinning functionality into the I/O scheduling
> loops should be good for scalability. The next patch will do the same for
> NFS data structure allocation.
>
> +static void nfs_release_user_pages(struct page **pages, int npages)
> {
> - int result = -ENOMEM;
> - unsigned long page_count;
> - size_t array_size;
> -
> - page_count = (user_addr + size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> - page_count -= user_addr >> PAGE_SHIFT;
> -
> - array_size = (page_count * sizeof(struct page *));
> - *pages = kmalloc(array_size, GFP_KERNEL);
> - if (*pages) {
> -	down_read(&current->mm->mmap_sem);
> -	result = get_user_pages(current, current->mm, user_addr,
> -				page_count, (rw == READ), 0,
> -				*pages, NULL);
> -	up_read(&current->mm->mmap_sem);
> - if (result != page_count) {
> - /*
> - * If we got fewer pages than expected from
> - * get_user_pages(), the user buffer runs off the
> - * end of a mapping; return EFAULT.
> - */
> - if (result >= 0) {
> - nfs_free_user_pages(*pages, result, 0);
> - result = -EFAULT;
> - } else
> - kfree(*pages);
> - *pages = NULL;
> - }
> - }
> - return result;
> + int i;
> + for (i = 0; i < npages; i++)
> + page_cache_release(pages[i]);
> }
If `npages' is negative, this does the right thing.
> + result = get_user_pages(current, current->mm, user_addr,
> + data->npages, 1, 0, data->pagevec, NULL);
> +	up_read(&current->mm->mmap_sem);
> + if (unlikely(result < data->npages))
> + goto out_err;
> ...
> +out_err:
> + nfs_release_user_pages(data->pagevec, result);
And `npages' can indeed be negative.
So. No bug there, but the code is a little unobvious and fragile - if
someone were later to change `npages' or `result' to an unsigned type,
subtle bugs would happen.
Perhaps
if (result > 0)
nfs_release_user_pages(...);
would be cleaner. Or at least a loud comment in nfs_release_user_pages().