From: ynorov@caviumnetworks.com (Yury Norov)
To: linux-arm-kernel@lists.infradead.org
Subject: [Question] New mmap64 syscall?
Date: Wed, 7 Dec 2016 00:24:40 +0530
Message-ID: <20161206185440.GA4654@yury-N73SV>
Hi all,
(Sorry if there was a similar discussion and I missed it. I didn't
find anything on LKML from the last half year.)
In the aarch64/ilp32 discussion Catalin wondered why we don't pass the
offset in mmap() as a 64-bit value (in 2 registers if needed). Looking
at the kernel code I found that there's no generic interface for it.
But almost all architectures provide their own implementations, like
this:
SYSCALL_DEFINE6(mips_mmap, unsigned long, addr, unsigned long, len,
	unsigned long, prot, unsigned long, flags, unsigned long,
	fd, off_t, offset)
{
	unsigned long result;

	result = -EINVAL;
	if (offset & ~PAGE_MASK)
		goto out;

	result = sys_mmap_pgoff(addr, len, prot, flags, fd, offset >> PAGE_SHIFT);

out:
	return result;
}
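For readers who want to try the pattern outside the kernel, here is a
minimal user-space sketch of the same check-then-shift step. It is not
kernel code: the 4K page size and the names SKETCH_PAGE_SHIFT and
sketch_byte_off_to_pgoff() are assumptions made for illustration only.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed 4K pages for illustration; the real value is per-architecture. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/*
 * Hypothetical helper: convert a byte offset into a page offset.
 * Returns 0 and stores the page offset in *pgoff on success, or
 * -EINVAL if the byte offset is not page-aligned.
 */
static int sketch_byte_off_to_pgoff(uint64_t offset, uint64_t *pgoff)
{
	/* Any bit set below the page size means the offset is unaligned. */
	if (offset & (SKETCH_PAGE_SIZE - 1))
		return -EINVAL;

	*pgoff = offset >> SKETCH_PAGE_SHIFT;
	return 0;
}
```

With these assumptions, an offset of 8192 yields page offset 2, while
4097 is rejected with -EINVAL.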
On the glibc side things are even worse. There's no mmap()
implementation that allows passing a 64-bit offset on a 32-bit
architecture. mmap64(), which is supposed to do this, is simply broken:
void *
__mmap64 (void *addr, size_t len, int prot, int flags, int fd,
	  off64_t offset)
{
	[...]
	void *result;
	result = (void *) INLINE_SYSCALL (mmap2, 6, addr,
					  len, prot, flags, fd,
					  (off_t) (offset >> page_shift));
	return result;
}
It explicitly declares the offset as a 64-bit value, but casts it to
32-bit before passing it to the kernel, which looks wrong to me. Even
if the arch has a 64-bit off_t, like aarch64/ilp32, the value is still
truncated, because the offset is passed in a single register, which is
32 bits wide.
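The effect of that truncation can be sketched in user space as below.
This is an illustration, not glibc code: it assumes a 4K page shift and
uses uint32_t in place of a 32-bit off_t to keep the arithmetic well
defined; both function names are hypothetical.

```c
#include <stdint.h>

/* Assumed 4K units for the page offset, as with mmap2 on many ports. */
#define SKETCH_PAGE_SHIFT 12

/* Mimics shifting a 64-bit offset and then narrowing it to 32 bits. */
static uint32_t sketch_mmap2_pgoff(uint64_t offset)
{
	return (uint32_t)(offset >> SKETCH_PAGE_SHIFT);
}

/* Byte offset that can be reconstructed from the narrowed page offset. */
static uint64_t sketch_effective_offset(uint64_t offset)
{
	return (uint64_t)sketch_mmap2_pgoff(offset) << SKETCH_PAGE_SHIFT;
}
```

Under these assumptions an offset of 2^43 survives the round trip, but
2^44 comes back as 0: the page offset 2^32 no longer fits in 32 bits,
so the high bits are silently dropped rather than reported as an error.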
I see 3 solutions for my problem:
1. Reuse the aarch64/lp64 mmap code for ilp32 in glibc, but wrap the
offset with the SYSCALL_LL64() macro, which splits the offset into a
register pair on 32-bit ports. This is a simple but local solution.
And most probably it's enough.
2. Add a new flag to mmap, like MAP_OFFSET_IN_PAIR. This will also work.
The problem here is that there are too many arches that implement
their own custom sys_mmap2(). And, of course, this kind of flag
looks ugly.
3. Introduce a new mmap64() syscall like this:
	sys_mmap64(void *addr, size_t len, int prot, int flags, int fd, struct off_pair *off);
(A pointer is used here because otherwise we would have 7 args if we
simply passed off_hi and off_lo in registers.)
With a new 64-bit interface we can deprecate mmap2() and generalize all
the implementations in the kernel.
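To make the pair idea in options 1 and 3 concrete, here is a small
sketch of splitting a 64-bit offset into two 32-bit halves and joining
them again. The message names struct off_pair but does not define it,
so the layout below (hi first, then lo) and both helper names are
assumptions for illustration, not a proposed ABI.

```c
#include <stdint.h>

/* Hypothetical layout; field names and order are assumed, not specified. */
struct off_pair {
	uint32_t hi;	/* upper 32 bits of the byte offset */
	uint32_t lo;	/* lower 32 bits of the byte offset */
};

/* What a 32-bit caller would do before making the call. */
static struct off_pair sketch_split_off(uint64_t offset)
{
	struct off_pair p = {
		.hi = (uint32_t)(offset >> 32),
		.lo = (uint32_t)offset,
	};
	return p;
}

/* What the receiving side would do to recover the full offset. */
static uint64_t sketch_join_off(const struct off_pair *p)
{
	return ((uint64_t)p->hi << 32) | p->lo;
}
```

Because both halves are carried separately, the full 64-bit byte offset
round-trips without loss, unlike a single 32-bit page offset.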
I think this is worth discussing because 64-bit is the default size
for off_t in all new 32-bit architectures, so a generic solution may
make sense.
The last question here is: how important is it in practice to support
offsets bigger than 2^44 on 32-bit machines? It may matter for ARM64
servers, which look like the main aarch64/ilp32 users. If it doesn't,
we can leave things as they are and do nothing.
Yury
On Mon, Dec 05, 2016 at 05:12:43PM +0000, Catalin Marinas wrote:
> On Fri, Oct 21, 2016 at 11:33:10PM +0300, Yury Norov wrote:
> > off_t is passed in register pair just like in aarch32.
> > In this patch corresponding aarch32 handlers are shared to
> > ilp32 code.
> [...]
> > +/*
> > + * Note: off_4k (w5) is always in units of 4K. If we can't do the
> > + * requested offset because it is not page-aligned, we return -EINVAL.
> > + */
> > +ENTRY(compat_sys_mmap2_wrapper)
> > +#if PAGE_SHIFT > 12
> > + tst w5, #~PAGE_MASK >> 12
> > + b.ne 1f
> > + lsr w5, w5, #PAGE_SHIFT - 12
> > +#endif
> > + b sys_mmap_pgoff
> > +1: mov x0, #-EINVAL
> > + ret
> > +ENDPROC(compat_sys_mmap2_wrapper)
>
> For compat sys_mmap2, the pgoff argument is in multiples of 4K. This was
> traditionally used for architectures where off_t is 32-bit to allow
> mapping files to 2^44.
>
> Since off_t is 64-bit with AArch64/ILP32, should we just pass the off_t
> as a 64-bit value in two different registers (w5 and w6)?