From: Chuck Lever <cel@citi.umich.edu>
To: Andrew Morton <akpm@osdl.org>
Cc: cel@netapp.com, linux-kernel@vger.kernel.org, trond.myklebust@fys.uio.no
Subject: Re: [PATCH 3/6] nfs: Eliminate nfs_get_user_pages()
Date: Fri, 19 May 2006 15:18:04 -0400
Message-ID: <446E19EC.1070902@citi.umich.edu>
In-Reply-To: <20060519111734.523232b4.akpm@osdl.org>

Andrew Morton wrote:
> Chuck Lever <cel@netapp.com> wrote:
>> Neil Brown observed that the kmalloc() in nfs_get_user_pages() is more
>> likely to fail if the I/O is large enough to require the allocation of more
>> than a single page to keep track of all the pinned pages in the user's
>> buffer.
>>
>> Instead of tracking one large page array per dreq/iocb, track pages per
>> nfs_read/write_data, just like the cached I/O path does. An array for
>> pages is already allocated for us by nfs_readdata_alloc() (and the write
>> and commit equivalents).
>>
>> This is also required for adding support for vectored I/O to the NFS direct
>> I/O path.
>>
>> The original reason to pin the user buffer and allocate all the NFS data
>> structures before trying to schedule I/O was to ensure all needed resources
>> are allocated on the client before starting to send requests. This reduces
>> the chance that resource exhaustion on the client will cause a short read
>> or write.
>>
>> On the other hand, for an application making very large I/O requests,
>> this means that it will be nearly impossible for the application to
>> make forward progress on a resource-limited client.
>>
>> Thus, moving the buffer pinning functionality into the I/O scheduling
>> loops should be good for scalability. The next patch will do the same for
>> NFS data structure allocation.
>>
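Neil Brown's observation is easier to see with numbers. The following hedged user-space sketch repeats the page-count arithmetic from the removed nfs_get_user_pages() and computes the size of the pointer array it passed to kmalloc(). It assumes 4 KiB pages; the SKETCH_* macros and sketch_* helpers are hypothetical names, and sizeof(void *) stands in for sizeof(struct page *). With 8-byte pointers one page holds 512 pointers, so any buffer spanning more than 512 pages needs a multi-page, physically contiguous kmalloc() allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for illustration: 4 KiB pages (PAGE_SHIFT == 12). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Pages spanned by [user_addr, user_addr + size), computed the same way
 * as the removed nfs_get_user_pages(). */
static unsigned long sketch_page_count(unsigned long user_addr, size_t size)
{
	unsigned long page_count;

	assert(user_addr + size >= user_addr);	/* no address wraparound */
	page_count = (user_addr + size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	page_count -= user_addr >> SKETCH_PAGE_SHIFT;
	return page_count;
}

/* Bytes the old code asked kmalloc() for; sizeof(void *) stands in for
 * sizeof(struct page *). */
static size_t sketch_array_size(unsigned long user_addr, size_t size)
{
	return sketch_page_count(user_addr, size) * sizeof(void *);
}

/* Nonzero when the pointer array no longer fits in a single page. */
static int sketch_needs_multipage_alloc(unsigned long user_addr, size_t size)
{
	return sketch_array_size(user_addr, size) > SKETCH_PAGE_SIZE;
}
```

Under these assumptions, a page-aligned 2 MiB buffer needs exactly one page of pointers on a 64-bit host; shifting the same buffer by one byte makes it span 513 pages and tips the array over a page.
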
>> +static void nfs_release_user_pages(struct page **pages, int npages)
>>  {
>> -	int result = -ENOMEM;
>> -	unsigned long page_count;
>> -	size_t array_size;
>> -
>> -	page_count = (user_addr + size + PAGE_SIZE - 1) >> PAGE_SHIFT;
>> -	page_count -= user_addr >> PAGE_SHIFT;
>> -
>> -	array_size = (page_count * sizeof(struct page *));
>> -	*pages = kmalloc(array_size, GFP_KERNEL);
>> -	if (*pages) {
>> -		down_read(&current->mm->mmap_sem);
>> -		result = get_user_pages(current, current->mm, user_addr,
>> -					page_count, (rw == READ), 0,
>> -					*pages, NULL);
>> -		up_read(&current->mm->mmap_sem);
>> -		if (result != page_count) {
>> -			/*
>> -			 * If we got fewer pages than expected from
>> -			 * get_user_pages(), the user buffer runs off the
>> -			 * end of a mapping; return EFAULT.
>> -			 */
>> -			if (result >= 0) {
>> -				nfs_free_user_pages(*pages, result, 0);
>> -				result = -EFAULT;
>> -			} else
>> -				kfree(*pages);
>> -			*pages = NULL;
>> -		}
>> -	}
>> -	return result;
>> +	int i;
>> +	for (i = 0; i < npages; i++)
>> +		page_cache_release(pages[i]);
>>  }
>
> If `npages' is negative, this does the right thing.
>
>> +	result = get_user_pages(current, current->mm, user_addr,
>> +				data->npages, 1, 0, data->pagevec, NULL);
>> +	up_read(&current->mm->mmap_sem);
>> +	if (unlikely(result < data->npages))
>> +		goto out_err;
>> ...
>> +out_err:
>> +	nfs_release_user_pages(data->pagevec, result);
>
> And `npages' can indeed be negative.

I fixed this by declaring all of these as "unsigned long".
get_user_pages() returns an unsigned long result, so all of these
comparisons should work correctly.

nfs_count_pages() now also returns an unsigned long, but I don't see how
it could compute a negative value.

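Andrew's fragility concern can be shown in a few lines. In this hedged user-space sketch, sketch_pin_pages() is a hypothetical stand-in for a pinning call that reports failure as a negative int (the removed code's `if (result >= 0)` branch implies such results occur). Storing that result in an unsigned long turns -EFAULT into a "count" just below ULONG_MAX. SKETCH_MAX_ERRNO is an assumed bound for the sketch, not a kernel constant.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Assumed bound on errno magnitudes for this sketch (not a kernel constant). */
#define SKETCH_MAX_ERRNO 4095UL

/* Hypothetical pinning call: returns a page count on success or a
 * negative errno as an int on failure. */
static int sketch_pin_pages(int want, int fail)
{
	assert(want >= 0);
	return fail ? -EFAULT : want;
}

/* What happens if that int result is stored in an unsigned long:
 * a negative value wraps modulo ULONG_MAX + 1 into a huge "count". */
static unsigned long sketch_store_unsigned(int result)
{
	unsigned long npages = (unsigned long)result;

	return npages;
}

/* Nonzero if an unsigned count is really a wrapped negative errno. */
static int sketch_is_wrapped_errno(unsigned long npages)
{
	return npages > ULONG_MAX - SKETCH_MAX_ERRNO;
}
```

A release loop bounded by such a wrapped value would walk far past the end of the page array, which is the kind of subtle bug a type change could introduce.
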
> So. No bug there, but the code is a little unobvious and fragile - if
> someone were to alter a type then subtle bugs would happen.
>
> Perhaps
>
> if (result > 0)
> nfs_release_user_pages(...);
>
> would be cleaner. Or at least a loud comment in nfs_release_user_pages().
--
corporate: cel at netapp dot com
personal: chucklever at bigfoot dot com