* get_user_pages() on an mmap()ed file allowed? What to do if 0 < get_user_pages(..., nr_pages, ...) < nr_pages?
@ 2009-08-03 15:23 Leon Woestenberg
2009-08-03 16:30 ` Hugh Dickins
0 siblings, 1 reply; 11+ messages in thread
From: Leon Woestenberg @ 2009-08-03 15:23 UTC (permalink / raw)
To: linux-kernel
Hello,
I have a PCI device driver performing DMA to a scattered user-space buffer.
Given a malloc()ed buffer, get_user_pages(..., buffer, nr_pages, ...)
always returns to requested number of pages and everything works as
expected.
So far so good.
Since that I changed userspace to mmap() a file, instead of
malloc()ing a buffer.
The mmap() in userspace works.
However, in the driver get_user_pages() starts to return less pages
than I requested, in an undeterministic fashion (most of the times I
get the expected number,
sometimes I get only a part of the requested pages).
Reading the get_user_pages() implementation dazzles me too much,
still. I wonder if I am violating the kernel API?
- is it allowed to have a PCI device DMA-read from memory pages, that
belong to a file mmap()'d by userspace?
- what are valid reasons for get_user_pages() to fail?
- what should a driver do when get_user_pages() returns less pages
than requested?
A snippet of the code:
my user space does:
int fd = open(filename, O_RDONLY);
assert(fd >= 0);
/* map the file in memory */
char *buffer = mmap(0, buffersize, PROT_READ, MAP_SHARED, fd, 0);
assert(buffer != MAP_FAILED);
/* advice sequential access */
int rc = madvise(buffer, buffersize, MADV_SEQUENTIAL);
assert(rc == 0);
my driver does:
const unsigned long first = (boe & PAGE_MASK) >> PAGE_SHIFT;
const unsigned long last = ((boe + count - 1) & PAGE_MASK) >> PAGE_SHIFT;
const int nr_pages = last - first + 1;
...
down_read(¤t->mm->mmap_sem);
rc = get_user_pages(current, current->mm, start & PAGE_MASK,
nr_pages, 0 /* do not write*/, 1 /* do force */, pages, NULL);
up_read(¤t->mm->mmap_sem);
BUG_ON(rc < nr_pages);
Thanks,
--
Leon
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: get_user_pages() on an mmap()ed file allowed? What to do if 0 < get_user_pages(..., nr_pages, ...) < nr_pages?
2009-08-03 15:23 get_user_pages() on an mmap()ed file allowed? What to do if 0 < get_user_pages(..., nr_pages, ...) < nr_pages? Leon Woestenberg
@ 2009-08-03 16:30 ` Hugh Dickins
2009-08-04 8:57 ` Leon Woestenberg
2009-08-04 9:18 ` Brice Goglin
0 siblings, 2 replies; 11+ messages in thread
From: Hugh Dickins @ 2009-08-03 16:30 UTC (permalink / raw)
To: Leon Woestenberg; +Cc: linux-kernel
On Mon, 3 Aug 2009, Leon Woestenberg wrote:
>
> I have a PCI device driver performing DMA to a scattered user-space buffer.
> Given a malloc()ed buffer, get_user_pages(..., buffer, nr_pages, ...)
> always returns to requested number of pages and everything works as
> expected.
> So far so good.
>
> Since that I changed userspace to mmap() a file, instead of
> malloc()ing a buffer.
> The mmap() in userspace works.
>
> However, in the driver get_user_pages() starts to return less pages
> than I requested, in an undeterministic fashion (most of the times I
> get the expected number,
> sometimes I get only a part of the requested pages).
>
> Reading the get_user_pages() implementation dazzles me too much,
> still. I wonder if I am violating the kernel API?
>
> - is it allowed to have a PCI device DMA-read from memory pages, that
> belong to a file mmap()'d by userspace?
Yes.
> - what are valid reasons for get_user_pages() to fail?
I'd hesitate to give a complete answer to that: but main reasons
would be SIGKILL, or running out of memory (-ENOMEM), or running
off the end of a mapping or mapped object, or no permission to it
(-EFAULT): with a short page count returned instead of error if
some pages were successfully gotten before hitting the error.
> - what should a driver do when get_user_pages() returns less pages
> than requested?
Probably put_page the pages gotten then report the surprise;
perhaps, before putting the pages gotten, try get_user_pages
on the next alone, to see what error code is returned for that.
Unless it's happy to work with fewer pages than requested,
in which case work with them and ignore the surprise.
>
>
> A snippet of the code:
>
>
> my user space does:
>
> int fd = open(filename, O_RDONLY);
> assert(fd >= 0);
>
> /* map the file in memory */
> char *buffer = mmap(0, buffersize, PROT_READ, MAP_SHARED, fd, 0);
> assert(buffer != MAP_FAILED);
>
> /* advice sequential access */
> int rc = madvise(buffer, buffersize, MADV_SEQUENTIAL);
> assert(rc == 0);
>
> my driver does:
>
> const unsigned long first = (boe & PAGE_MASK) >> PAGE_SHIFT;
> const unsigned long last = ((boe + count - 1) & PAGE_MASK) >> PAGE_SHIFT;
> const int nr_pages = last - first + 1;
> ...
> down_read(¤t->mm->mmap_sem);
> rc = get_user_pages(current, current->mm, start & PAGE_MASK,
> nr_pages, 0 /* do not write*/, 1 /* do force */, pages, NULL);
> up_read(¤t->mm->mmap_sem);
>
> BUG_ON(rc < nr_pages);
When that BUG triggers, is rc a positive number of pages,
or a negative error code - which? (or even 0, but it shouldn't be).
I assume from your Subject that you've already seen a positive number
of pages.
Code doesn't look wrong to me (except you shouldn't BUG), though I am
having to assume that buffer and boe and start are all the same address,
and count fits within buffersize; or at least that the range to which you
apply get_user_pages really does fit within the area you have mmap'ed.
(I'd advise against using 1 /* do force */, I don't think you need
that: the force is mysterious, and should only be called upon in direst
need. But it shouldn't actually be causing you any problem here.)
Is the file you have mmap'ed big enough? If it's not as long as the
last page you're trying to get_user_pages on, or gets truncated, then
indeed that will give -EFAULT or a short count - just as trying to
access the end of the mapping in userspace would give you SIGBUS.
Hugh
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: get_user_pages() on an mmap()ed file allowed? What to do if 0 < get_user_pages(..., nr_pages, ...) < nr_pages?
2009-08-03 16:30 ` Hugh Dickins
@ 2009-08-04 8:57 ` Leon Woestenberg
2009-08-04 9:50 ` KAMEZAWA Hiroyuki
2009-08-04 9:18 ` Brice Goglin
1 sibling, 1 reply; 11+ messages in thread
From: Leon Woestenberg @ 2009-08-04 8:57 UTC (permalink / raw)
To: Hugh Dickins; +Cc: linux-kernel
Hello ,
On Mon, Aug 3, 2009 at 6:30 PM, Hugh Dickins<hugh.dickins@tiscali.co.uk> wrote:
> On Mon, 3 Aug 2009, Leon Woestenberg wrote:
>>
>> - is it allowed to have a PCI device DMA-read from memory pages, that
>> belong to a file mmap()'d by userspace?
>
> Yes.
>
>> - what are valid reasons for get_user_pages() to fail?
>
> I'd hesitate to give a complete answer to that: but main reasons
> would be SIGKILL, or running out of memory (-ENOMEM), or running
> off the end of a mapping or mapped object, or no permission to it
> (-EFAULT): with a short page count returned instead of error if
> some pages were successfully gotten before hitting the error.
>
Next step will be to see where it bails out.
>> - what should a driver do when get_user_pages() returns less pages
>> than requested?
>
> Probably put_page the pages gotten then report the surprise;
> perhaps, before putting the pages gotten, try get_user_pages
> on the next alone, to see what error code is returned for that.
>
> Unless it's happy to work with fewer pages than requested,
> in which case work with them and ignore the surprise.
>
I expect a certain amount of data to be DMA'd from the PCI device to
the file mmap, so I'ld rather map the complete file before I start
DMA.
>> BUG_ON(rc < nr_pages);
>
> When that BUG triggers, is rc a positive number of pages,
> or a negative error code - which? (or even 0, but it shouldn't be).
> I assume from your Subject that you've already seen a positive number
> of pages.
>
Correct, undeterministicly it sometimes return the requested amount,
sometimes some part of it, or sometimes errors out.
> Code doesn't look wrong to me (except you shouldn't BUG), though I am
>
Correct. I took a snippet
> having to assume that buffer and boe and start are all the same address,
Correct.
> and count fits within buffersize; or at least that the range to which you
> apply get_user_pages really does fit within the area you have mmap'ed.
>
Yes, the file is large enough, a multiple of PAGE_SIZE, as is the mmap length.
> (I'd advise against using 1 /* do force */, I don't think you need
> that: the force is mysterious, and should only be called upon in direst
> need. But it shouldn't actually be causing you any problem here.)
>
Thanks. I found that force did not mean "force the mapping" but
rather enforce read plus write permissions.
> Is the file you have mmap'ed big enough? If it's not as long as the
> last page you're trying to get_user_pages on, or gets truncated, then
> indeed that will give -EFAULT or a short count - just as trying to
> access the end of the mapping in userspace would give you SIGBUS.
>
Thanks for all the pointers, I think I have all the conditions right.
I'll see what get_user_pages() internally fails on. It's one of the
API's that does not either fail or complete, it can say: "hey I did a
part of it, now buzz off!"
I still wonder if the function can at all be wrapped with a
all-or-nothing function?
Regards,
--
Leon
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: get_user_pages() on an mmap()ed file allowed? What to do if 0 < get_user_pages(..., nr_pages, ...) < nr_pages?
2009-08-03 16:30 ` Hugh Dickins
2009-08-04 8:57 ` Leon Woestenberg
@ 2009-08-04 9:18 ` Brice Goglin
2009-08-04 9:59 ` Leon Woestenberg
1 sibling, 1 reply; 11+ messages in thread
From: Brice Goglin @ 2009-08-04 9:18 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Leon Woestenberg, linux-kernel
Hugh Dickins wrote:
>> - what should a driver do when get_user_pages() returns less pages
>> than requested?
>
> Probably put_page the pages gotten then report the surprise;
> perhaps, before putting the pages gotten, try get_user_pages
> on the next alone, to see what error code is returned for that.
I wonder if we should change get_user_pages to store ERR_PTR(ret)
in page[i] when it fails to get page #i.
Brice
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: get_user_pages() on an mmap()ed file allowed? What to do if 0 < get_user_pages(..., nr_pages, ...) < nr_pages?
2009-08-04 8:57 ` Leon Woestenberg
@ 2009-08-04 9:50 ` KAMEZAWA Hiroyuki
2009-08-04 10:07 ` Leon Woestenberg
0 siblings, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-04 9:50 UTC (permalink / raw)
To: Leon Woestenberg; +Cc: Hugh Dickins, linux-kernel
On Tue, 4 Aug 2009 10:57:33 +0200
Leon Woestenberg <leon.woestenberg@gmail.com> wrote:
> >> - what should a driver do when get_user_pages() returns less pages
> >> than requested?
> >
> > Probably put_page the pages gotten then report the surprise;
> > perhaps, before putting the pages gotten, try get_user_pages
> > on the next alone, to see what error code is returned for that.
> >
> > Unless it's happy to work with fewer pages than requested,
> > in which case work with them and ignore the surprise.
> >
> I expect a certain amount of data to be DMA'd from the PCI device to
> the file mmap, so I'ld rather map the complete file before I start
> DMA.
>
I wonder.... If your device does DMA from-PCI-to-user, then,
rc = get_user_pages(current, current->mm, start & PAGE_MASK,
nr_pages, 0 /* do not write*/, 1 /* do force */, pages, NULL);
This is *write* access. isn't it ? (contents of pages got by this call
will be overwritten by DMA ?)
About (rc != nr_pages) case, I doubt there are difference between
mmap region (or size of file) and [start, start+count) passed to you device.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: get_user_pages() on an mmap()ed file allowed? What to do if 0 < get_user_pages(..., nr_pages, ...) < nr_pages?
2009-08-04 9:18 ` Brice Goglin
@ 2009-08-04 9:59 ` Leon Woestenberg
2009-08-04 10:39 ` [PATCH] mm: get_user_pages() stores ERR_PTR() in pages[i] on failure Brice Goglin
0 siblings, 1 reply; 11+ messages in thread
From: Leon Woestenberg @ 2009-08-04 9:59 UTC (permalink / raw)
To: Brice Goglin; +Cc: Hugh Dickins, linux-kernel
Hello Brice,
On Tue, Aug 4, 2009 at 11:18 AM, Brice Goglin<Brice.Goglin@inria.fr> wrote:
> Hugh Dickins wrote:
>>> - what should a driver do when get_user_pages() returns less pages
>>> than requested?
>>
>> Probably put_page the pages gotten then report the surprise;
>> perhaps, before putting the pages gotten, try get_user_pages
>> on the next alone, to see what error code is returned for that.
>
> I wonder if we should change get_user_pages to store ERR_PTR(ret)
> in page[i] when it fails to get page #i.
>
Yes, I would see that as an improvement in finding out why rc <
nr_pages, in case rc > 0.
Also I think it does not break existing users.
Regards,
--
Leon
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: get_user_pages() on an mmap()ed file allowed? What to do if 0 < get_user_pages(..., nr_pages, ...) < nr_pages?
2009-08-04 9:50 ` KAMEZAWA Hiroyuki
@ 2009-08-04 10:07 ` Leon Woestenberg
0 siblings, 0 replies; 11+ messages in thread
From: Leon Woestenberg @ 2009-08-04 10:07 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Hugh Dickins, linux-kernel
Hello,
On Tue, Aug 4, 2009 at 11:50 AM, KAMEZAWA
Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 4 Aug 2009 10:57:33 +0200
> Leon Woestenberg <leon.woestenberg@gmail.com> wrote:
>
>
>> >> - what should a driver do when get_user_pages() returns less pages
>> >> than requested?
>> >
>> > Probably put_page the pages gotten then report the surprise;
>> > perhaps, before putting the pages gotten, try get_user_pages
>> > on the next alone, to see what error code is returned for that.
>> >
>> > Unless it's happy to work with fewer pages than requested,
>> > in which case work with them and ignore the surprise.
>> >
>> I expect a certain amount of data to be DMA'd from the PCI device to
>> the file mmap, so I'ld rather map the complete file before I start
>> DMA.
>>
> I wonder.... If your device does DMA from-PCI-to-user, then,
>
> rc = get_user_pages(current, current->mm, start & PAGE_MASK,
> nr_pages, 0 /* do not write*/, 1 /* do force */, pages, NULL);
>
> This is *write* access. isn't it ? (contents of pages got by this call
> will be overwritten by DMA ?)
>
>From the header:
int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start,
int len, int write, int force, struct page **pages, struct
vm_area_struct **vmas);
So with write being 0, I think it is read access to user space pages.
The actual code looks like this:
/* is the PCI device (DMA) writing to user space? */
to_user = !dir_to_dev;
...
/* to_user != 0 means read from device, write into user space buffer memory */
rc = get_user_pages(current, current->mm, (unsigned long)start &
PAGE_MASK, nr_pages, to_user,
> About (rc != nr_pages) case, I doubt there are difference between
> mmap region (or size of file) and [start, start+count) passed to you device.
>
I will tripple check this.
My current test is rather static (fixed size files I read from, fixed
mmap length etc) and rc varies wildly. As I said, I will check what
get_user_pages() fails on internally.
I will probably cook up a patch with Brice's idea (storing the error
code in the page[i], where i = rc).
Regards,
--
Leon
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH] mm: get_user_pages() stores ERR_PTR() in pages[i] on failure
2009-08-04 9:59 ` Leon Woestenberg
@ 2009-08-04 10:39 ` Brice Goglin
2009-08-04 11:20 ` Leon Woestenberg
2009-08-04 12:00 ` Hugh Dickins
0 siblings, 2 replies; 11+ messages in thread
From: Brice Goglin @ 2009-08-04 10:39 UTC (permalink / raw)
To: Leon Woestenberg; +Cc: Brice Goglin, Hugh Dickins, linux-kernel
>> I wonder if we should change get_user_pages to store ERR_PTR(ret)
>> in page[i] when it fails to get page #i.
>>
> Yes, I would see that as an improvement in finding out why rc <
> nr_pages, in case rc > 0.
>
> Also I think it does not break existing users.
>
Only compile-tested (and not in the nommu case).
When get_user_pages() fails to get a non-first page, it returns
the number of successfully gotten pages. The caller has to call
get_user_pages() again on the failed page to figure out the
error code.
Store the error code with ERR_PTR() in pages[i] when we fail
to get page #i.
The arch specific get_user_pages_fast() do not need to be changed
since they revert to the main get_user_pages() on failure.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
diff --git a/mm/memory.c b/mm/memory.c
index aede2ce..cd4efa3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1217,6 +1217,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
int force = !!(flags & GUP_FLAGS_FORCE);
int ignore = !!(flags & GUP_FLAGS_IGNORE_VMA_PERMISSIONS);
int ignore_sigkill = !!(flags & GUP_FLAGS_IGNORE_SIGKILL);
+ int err = -EFAULT;
if (nr_pages <= 0)
return 0;
@@ -1243,7 +1244,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
/* user gate pages are read-only */
if (!ignore && write)
- return i ? : -EFAULT;
+ goto abort;
if (pg > TASK_SIZE)
pgd = pgd_offset_k(pg);
else
@@ -1253,11 +1254,11 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
BUG_ON(pud_none(*pud));
pmd = pmd_offset(pud, pg);
if (pmd_none(*pmd))
- return i ? : -EFAULT;
+ goto abort;
pte = pte_offset_map(pmd, pg);
if (pte_none(*pte)) {
pte_unmap(pte);
- return i ? : -EFAULT;
+ goto abort;
}
if (pages) {
struct page *page = vm_normal_page(gate_vma, start, *pte);
@@ -1277,7 +1278,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
if (!vma ||
(vma->vm_flags & (VM_IO | VM_PFNMAP)) ||
(!ignore && !(vm_flags & vma->vm_flags)))
- return i ? : -EFAULT;
+ goto abort;
if (is_vm_hugetlb_page(vma)) {
i = follow_hugetlb_page(mm, vma, pages, vmas,
@@ -1302,8 +1303,10 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
* we're only unlocking already resident/mapped pages.
*/
if (unlikely(!ignore_sigkill &&
- fatal_signal_pending(current)))
- return i ? i : -ERESTARTSYS;
+ fatal_signal_pending(current))) {
+ err = -ERESTARTSYS;
+ goto abort;
+ }
if (write)
foll_flags |= FOLL_WRITE;
@@ -1317,10 +1320,11 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
FAULT_FLAG_WRITE : 0);
if (ret & VM_FAULT_ERROR) {
- if (ret & VM_FAULT_OOM)
- return i ? i : -ENOMEM;
- else if (ret & VM_FAULT_SIGBUS)
- return i ? i : -EFAULT;
+ if (ret & VM_FAULT_OOM) {
+ err = -ENOMEM;
+ goto abort;
+ } else if (ret & VM_FAULT_SIGBUS)
+ goto abort;
BUG();
}
if (ret & VM_FAULT_MAJOR)
@@ -1346,8 +1350,10 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
cond_resched();
}
- if (IS_ERR(page))
- return i ? i : PTR_ERR(page);
+ if (IS_ERR(page)) {
+ err = PTR_ERR(page);
+ goto abort;
+ }
if (pages) {
pages[i] = page;
@@ -1362,6 +1368,11 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
} while (nr_pages && start < vma->vm_end);
} while (nr_pages);
return i;
+
+ abort:
+ if (pages)
+ pages[i] = ERR_PTR(err);
+ return i ? : err;
}
/**
diff --git a/mm/nommu.c b/mm/nommu.c
index 53cab10..5fe8083 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -212,6 +212,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
return i;
finish_or_fault:
+ if (pages)
+ pages[i] = ERR_PTR(-EFAULT);
return i ? : -EFAULT;
}
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: get_user_pages() stores ERR_PTR() in pages[i] on failure
2009-08-04 10:39 ` [PATCH] mm: get_user_pages() stores ERR_PTR() in pages[i] on failure Brice Goglin
@ 2009-08-04 11:20 ` Leon Woestenberg
2009-08-04 12:00 ` Hugh Dickins
1 sibling, 0 replies; 11+ messages in thread
From: Leon Woestenberg @ 2009-08-04 11:20 UTC (permalink / raw)
To: Brice Goglin; +Cc: Hugh Dickins, linux-kernel
Hello Brice,
On Tue, Aug 4, 2009 at 12:39 PM, Brice Goglin<Brice.Goglin@inria.fr> wrote:>> Only compile-tested (and not in the nommu case).>> @@ -1317,10 +1320,11 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,> FAULT_FLAG_WRITE : 0);>> if (ret & VM_FAULT_ERROR) {> - if (ret & VM_FAULT_OOM)> - return i ? i : -ENOMEM;> - else if (ret & VM_FAULT_SIGBUS)> - return i ? i : -EFAULT;> + if (ret & VM_FAULT_OOM) {> + err = -ENOMEM;> + goto abort;> + } else if (ret & VM_FAULT_SIGBUS)> + goto abort;
Minor style issue; when the if-block has brackets, the else(if)-blockshould also have brackets (even when it's a one-liner on the elseblock).
Looks good, I will go ahead and test it (back-port/patched against 2.6.30).
Regards,-- Leonÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: get_user_pages() stores ERR_PTR() in pages[i] on failure
2009-08-04 10:39 ` [PATCH] mm: get_user_pages() stores ERR_PTR() in pages[i] on failure Brice Goglin
2009-08-04 11:20 ` Leon Woestenberg
@ 2009-08-04 12:00 ` Hugh Dickins
2009-08-04 16:25 ` Leon Woestenberg
1 sibling, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2009-08-04 12:00 UTC (permalink / raw)
To: Brice Goglin; +Cc: Leon Woestenberg, linux-kernel
On Tue, 4 Aug 2009, Brice Goglin wrote:
>
> >> I wonder if we should change get_user_pages to store ERR_PTR(ret)
> >> in page[i] when it fails to get page #i.
> >>
> > Yes, I would see that as an improvement in finding out why rc <
> > nr_pages, in case rc > 0.
> >
> > Also I think it does not break existing users.
> >
>
> Only compile-tested (and not in the nommu case).
>
>
>
>
> When get_user_pages() fails to get a non-first page, it returns
> the number of successfully gotten pages. The caller has to call
> get_user_pages() again on the failed page to figure out the
> error code.
>
> Store the error code with ERR_PTR() in pages[i] when we fail
> to get page #i.
>
> The arch specific get_user_pages_fast() do not need to be changed
> since they revert to the main get_user_pages() on failure.
>
> Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
This is a nice idea, and it looks like a good patch to implement
it, and I wish get_user_pages() had done that from the start, but...
... some callers of get_user_pages() might expect it not to touch
entries in the page array beyond the fail point: for example, they
might start off with a zeroed array, then when they come to put_page
on the array afterwards, would skip any NULL entries found - without
having to have noted the returned number of pages. I don't think
that way of working is supported by any guarantee in documentation,
but it's not unreasonable to expect it to behave in that way.
If there were nothing intree which worked that way, then I'd
say let's go ahead with your nice patch. But I soon reached
drivers/gpu/drm/ttm/ttm_tt.c then drivers/gpu/drm/via/via_dmablit.c
which both appear to work in that way. I've not looked beyond them,
but I suspect those two cases amount to a NAK on your patch, sadly.
Of course we _could_ fix individual callers to work with it, but
for me they sound a warning that it's too late to change now.
Hugh
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: get_user_pages() stores ERR_PTR() in pages[i] on failure
2009-08-04 12:00 ` Hugh Dickins
@ 2009-08-04 16:25 ` Leon Woestenberg
0 siblings, 0 replies; 11+ messages in thread
From: Leon Woestenberg @ 2009-08-04 16:25 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Brice Goglin, linux-kernel
Hello,
thanks for looking along on this one.
On Tue, Aug 4, 2009 at 2:00 PM, Hugh Dickins<hugh.dickins@tiscali.co.uk> wrote:
> On Tue, 4 Aug 2009, Brice Goglin wrote:
>>
>> >> I wonder if we should change get_user_pages to store ERR_PTR(ret)
>> >> in page[i] when it fails to get page #i.
>> >>
>> > Yes, I would see that as an improvement in finding out why rc <
>> > nr_pages, in case rc > 0.
>> >
>> > Also I think it does not break existing users.
>> >
> ... some callers of get_user_pages() might expect it not to touch
> entries in the page array beyond the fail point: for example, they
> might start off with a zeroed array, then when they come to put_page
> on the array afterwards, would skip any NULL entries found - without
> having to have noted the returned number of pages. I don't think
> that way of working is supported by any guarantee in documentation,
> but it's not unreasonable to expect it to behave in that way.
>
Thanks for checking this -- obviously me thinking it would not break
existing users is false.
> Of course we _could_ fix individual callers to work with it, but
> for me they sound a warning that it's too late to change now.
>
The alternative seems to be to describe a best practise of calling
get_user_pages() on the failing page to find out why it failed.
(and hoping it does not return rc > 0 on the second call.)
A better function API would be to return the number of succesfully
gotten pages in a pointer
rc = get_user_pages2(, nr_pages ,..., &nr_pages_gotten);
where rc ==0 iff (nr_pages == nr_pages_gotten).
BTW, in my case I'm hitting the (ret & VM_FAULT_SIGBUS) case. I
suspect I have a multithreading concurrency race, as I have DMA in
both directions to a ring buffer set of files. Seems one over runs the
other, thus the file gets mapped in with different protection, as well
as conflicting DMA directions.
-EBADUSER.
--
Leon
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2009-08-04 16:25 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-03 15:23 get_user_pages() on an mmap()ed file allowed? What to do if 0 < get_user_pages(..., nr_pages, ...) < nr_pages? Leon Woestenberg
2009-08-03 16:30 ` Hugh Dickins
2009-08-04 8:57 ` Leon Woestenberg
2009-08-04 9:50 ` KAMEZAWA Hiroyuki
2009-08-04 10:07 ` Leon Woestenberg
2009-08-04 9:18 ` Brice Goglin
2009-08-04 9:59 ` Leon Woestenberg
2009-08-04 10:39 ` [PATCH] mm: get_user_pages() stores ERR_PTR() in pages[i] on failure Brice Goglin
2009-08-04 11:20 ` Leon Woestenberg
2009-08-04 12:00 ` Hugh Dickins
2009-08-04 16:25 ` Leon Woestenberg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox