From: Yury Norov <ynorov@caviumnetworks.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: libc-alpha@sourceware.org, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	szabolcs.nagy@arm.com, heiko.carstens@de.ibm.com,
	cmetcalf@ezchip.com, philipp.tomsich@theobroma-systems.com,
	joseph@codesourcery.com, zhouchengming1@huawei.com,
	Prasun.Kapoor@caviumnetworks.com, agraf@suse.de,
	geert@linux-m68k.org, kilobyte@angband.pl,
	manuel.montezelo@gmail.com, pinskia@gmail.com,
	linyongting@huawei.com, klimov.linux@gmail.com,
	broonie@kernel.org, bamvor.zhangjian@huawei.com,
	linux-arm-kernel@lists.infradead.org, maxim.kuvyrkov@linaro.org,
	Nathan_Lynch@mentor.com, schwidefsky@de.ibm.com,
	davem@davemloft.net, christoph.muellner@theobroma-systems.com
Subject: Re: [Question] New mmap64 syscall?
Date: Wed, 7 Dec 2016 16:04:51 +0530
Message-ID: <20161207103451.GA869@yury-N73SV>
In-Reply-To: <3014428.VXGdOARdm1@wuerfel>

On Tue, Dec 06, 2016 at 10:20:20PM +0100, Arnd Bergmann wrote:
> On Wednesday, December 7, 2016 12:24:40 AM CET Yury Norov wrote:
> > 3. Introduce new mmap64() syscall like this:
> > sys_mmap64(void *addr, size_t len, int prot, int flags, int fd, struct off_pair *off);
> > (The pointer here because otherwise we have 7 args, if simply pass off_hi and
> > off_lo in registers.)
> 
> This wouldn't have to be a pair, just a pointer to a 64-bit number.
> 
> > With new 64-bit interface we can deprecate mmap2(), and generalize all
> > implementations in kernel.
> > 
> > I think we can discuss it because 64-bit is the default size for off_t 
> > in all new 32-bit architectures. So generic solution may take place.
> > 
> > The last question here is how important to support offsets bigger than
> > 2^44 on 32-bit machines in practice? It may be a case for ARM64 servers,
> > which are looking like main aarch64/ilp32 users. If no, we can leave
> > things as is, and just do nothing.
> 
> If there is a use case for larger than 16TB offsets, we should add
> the call on all architectures, probably using your approach 3. I don't
> think that we should treat it as anything special for arm64 though.

An offset beyond 16TB only requires a file larger than 16TB, and storage
of that size is already a reality. Another reason to add the call is that
we already support 64-bit offsets in syscalls such as sys_llseek(), so
mmap64() would simply extend that support.

I can prepare the patch. There are some implementation details I'd like
to clarify first.

Syscall declaration:
SYSCALL_DEFINE6(mmap64, unsigned long, addr, unsigned long, len,
                unsigned long, prot, unsigned long, flags,
                unsigned long, fd, unsigned long long *, offset);
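
As a rough illustration of what passing the offset by pointer implies, here
is a user-space model of the argument handling. It is only a sketch, not
kernel code: model_copy_offset() stands in for copy_from_user(), and the
4096-byte page size and all model_* names are assumptions made for the
example.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size for this model only. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1ULL << MODEL_PAGE_SHIFT)

/* Stand-in for copy_from_user(): the pointer comes from userspace and
 * must be validated before it is read. */
static int model_copy_offset(uint64_t *dst, const void *user_ptr)
{
    if (user_ptr == NULL)
        return -EFAULT;
    memcpy(dst, user_ptr, sizeof(*dst));
    return 0;
}

/* Read the 64-bit byte offset through the pointer and convert it to a
 * page offset. Returns 0 on success or a negative errno value. */
static int model_mmap64_pgoff(const void *user_off, uint64_t *pgoff)
{
    uint64_t off;
    int err = model_copy_offset(&off, user_off);

    if (err)
        return err;
    if (off & (MODEL_PAGE_SIZE - 1))
        return -EINVAL;          /* offset must be page-aligned */
    *pgoff = off >> MODEL_PAGE_SHIFT;
    return 0;
}
```

The extra memory read is the cost of keeping the call within six
arguments on 32-bit ABIs.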

sys_mmap64() deprecates sys_mmap2(), and __ARCH_WANT_MMAP2 is
introduced to keep sys_mmap2() enabled on all existing architectures.
New arches (aarch64/ilp32 being the first candidate) will have
mmap64() only. The precedent is the patches that dropped set/getrlimit()
and renameat() (b0da6d44).
                                
On the glibc side, __OFF_T_MATCHES_OFF64_T will wire mmap() from
linux/generic/wordsize32/mmap.c to mmap64() from linux/mmap64.c.

mmap64() will first try __NR_mmap64; if it is not defined, or if ENOSYS
is returned, it will fall back to __NR_mmap2. This lets userspace built
with both mmap2() and mmap64() get full 64-bit offset support wherever
the kernel provides it, rather than only 44-bit.
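
The fallback order described above can be sketched as follows. This is a
minimal model, not glibc code: the two syscalls are represented by
hypothetical callbacks, a NULL callback models "__NR_mmap64 not defined at
build time", and the stubs return arbitrary tag values so the chosen path
is visible.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical back-end: returns a result or a negative errno value. */
typedef long (*mmap_call_fn)(uint64_t offset);

/* Try the mmap64 back-end first; fall back to mmap2 only when mmap64 is
 * unavailable (not built in, or the kernel answers ENOSYS). Any other
 * error from mmap64 is returned to the caller unchanged. */
static long mmap64_with_fallback(mmap_call_fn try_mmap64,
                                 mmap_call_fn try_mmap2,
                                 uint64_t offset)
{
    if (try_mmap64 != NULL) {
        long ret = try_mmap64(offset);

        if (ret != -ENOSYS)
            return ret;
    }
    return try_mmap2(offset);
}

/* Stubs used only to exercise the sketch. */
static long stub_mmap64_ok(uint64_t offset)     { (void)offset; return 64; }
static long stub_mmap64_enosys(uint64_t offset) { (void)offset; return -ENOSYS; }
static long stub_mmap2_ok(uint64_t offset)      { (void)offset; return 2; }
```

With these stubs, a kernel that implements mmap64 never reaches the mmap2
path, while an older kernel pays for one extra failed syscall per call
unless the result is cached.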

For the __NR_mmap2 case, I'd also add a check for offsets that do not
fit in 44 bits, and set errno to EOVERFLOW in that case.
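
A possible shape for that check, as a sketch only. It assumes mmap2()
takes the offset in 4096-byte units in a 32-bit register (true on most
architectures, though some use the system page size), which is where the
2^44 limit comes from. The function and macro names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed mmap2() offset unit: 4096 bytes. */
#define MMAP2_SHIFT      12
#define MMAP2_ALIGN_MASK ((1ULL << MMAP2_SHIFT) - 1)

/* Convert a 64-bit byte offset into the 32-bit unit offset mmap2()
 * expects. Returns 0 on success, -EINVAL for a misaligned offset, and
 * -EOVERFLOW when the offset is 2^44 or larger and cannot be encoded. */
static int mmap2_encode_offset(uint64_t offset, uint32_t *pgoff)
{
    uint64_t units = offset >> MMAP2_SHIFT;

    if (offset & MMAP2_ALIGN_MASK)
        return -EINVAL;
    if (units > UINT32_MAX)
        return -EOVERFLOW;
    *pgoff = (uint32_t)units;
    return 0;
}
```

Note that an offset of exactly 2^44 already overflows, since it would
need a unit offset of 2^32.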

Any thoughts?

Yury.
