From: Dan Rosenberg <drosenberg@vsecurity.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>,
netdev@vger.kernel.org, jon.maloy@ericsson.com,
allan.stephens@windriver.com
Subject: Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
Date: Fri, 29 Oct 2010 10:00:20 -0400 [thread overview]
Message-ID: <1288360820.2092.34.camel@dan> (raw)
In-Reply-To: <AANLkTinAzvfrRHwW-w-Ppdu7LWd7nGTa4EWP=e6pKV2F@mail.gmail.com>
While you guys are at it, you might consider preventing sendto(), etc.
calls from requesting >= 2GB data in one go. Several families have no
restrictions on total size (or even worse, assign the size to a signed
int type and then do a signed comparison as a check). This can result
in all kinds of ugliness when allocating sk_buffs based on that size,
some of which result in kernel panics (due to bad sk_buff tail position)
or heap corruption.
If you'd rather I dig up specific examples, I can do that as well, but I
think making changes to core code to protect individual modules from
their own inevitable stupid decisions is the best choice.
-Dan
On Thu, 2010-10-28 at 23:40 -0700, Linus Torvalds wrote:
> Oh, btw, noticed another small detail..
>
> I don't know if this matters, but the regular read/write routines
> don't actually use INT_MAX as the limit, but instead a "maximally
> page-aligned value that fits in an int":
>
> #define MAX_RW_COUNT (INT_MAX & PAGE_CACHE_MASK)
>
> because the code does _not_ want to turn a nice set of huge
> page-aligned big writes into a write of an odd number (2GB-1).
>
> This may not make much of a difference to networking - you guys are
> already used to working with odd sizes like 1500 bytes of data payload
> per packet etc. Most regular filesystems are much more sensitive to
> things like block (and particularly page-cache sized) boundaries
> because of the vagaries of disk and cache granularities. But MAX_INT
> is a _really_ odd size, and things like csum_and_copy still tends to
> want to get things at least word-aligned, no? And if nothing else, the
> memory copies tend to be better with cacheline boundaries.
>
> It would be sad if a 4GB aligned write turns into
> - one 2GB-1 aligned write
> - one pessimally unaligned 2G-1 write where every read from user
> space is unaligned
> - finally a single 2-byte write.
>
> I suspect it would be better off using the same kind of (MAX_INT &
> PAGE_CACHE_MASK) logic - that 4GB write would still get split into
> three partial writes (and _lots_ of packets ;), but at least they'd
> all be word-aligned.
>
> Does it matter? I dunno. Once you do 2GB+ writes, these kinds of small
> details may be totally hidden in all the noise.
>
> Linus
>
> On Thu, Oct 28, 2010 at 11:22 AM, David Miller <davem@davemloft.net> wrote:
> >
> > This helps protect us from overflow issues down in the
> > individual protocol sendmsg/recvmsg handlers. Once
> > we hit INT_MAX we truncate out the rest of the iovec
> > by setting the iov_len members to zero.
> >
> > This works because:
> >
> > 1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
> > writes are allowed and the application will just continue
> > with another write to send the rest of the data.
> >
> > 2) For datagram oriented sockets, where there must be a
> > one-to-one correspondance between write() calls and
> > packets on the wire, INT_MAX is going to be far larger
> > than the packet size limit the protocol is going to
> > check for and signal with -EMSGSIZE.
> >
> > Based upon a patch by Linus Torvalds.
> >
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> > ---
> >
> > Ok, this is the patch I am testing right now. It ought to
> > plug the TIPC holes wrt. handling iovecs given by the
> > user.
> >
> > I'll look at the recently discovered RDS crap next :-/
> >
> > include/linux/socket.h | 2 +-
> > net/compat.c | 12 +++++++-----
> > net/core/iovec.c | 19 +++++++++----------
> > 3 files changed, 17 insertions(+), 16 deletions(-)
> >
> > diff --git a/include/linux/socket.h b/include/linux/socket.h
> > index 5146b50..86b652f 100644
> > --- a/include/linux/socket.h
> > +++ b/include/linux/socket.h
> > @@ -322,7 +322,7 @@ extern int csum_partial_copy_fromiovecend(unsigned char *kdata,
> > int offset,
> > unsigned int len, __wsum *csump);
> >
> > -extern long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
> > +extern int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
> > extern int memcpy_toiovec(struct iovec *v, unsigned char *kdata, int len);
> > extern int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata,
> > int offset, int len);
> > diff --git a/net/compat.c b/net/compat.c
> > index 63d260e..71bfd8e 100644
> > --- a/net/compat.c
> > +++ b/net/compat.c
> > @@ -34,17 +34,19 @@ static inline int iov_from_user_compat_to_kern(struct iovec *kiov,
> > struct compat_iovec __user *uiov32,
> > int niov)
> > {
> > - int tot_len = 0;
> > + size_t tot_len = 0;
> >
> > while (niov > 0) {
> > compat_uptr_t buf;
> > compat_size_t len;
> >
> > if (get_user(len, &uiov32->iov_len) ||
> > - get_user(buf, &uiov32->iov_base)) {
> > - tot_len = -EFAULT;
> > - break;
> > - }
> > + get_user(buf, &uiov32->iov_base))
> > + return -EFAULT;
> > +
> > + if (len > INT_MAX - tot_len)
> > + len = INT_MAX - tot_len;
> > +
> > tot_len += len;
> > kiov->iov_base = compat_ptr(buf);
> > kiov->iov_len = (__kernel_size_t) len;
> > diff --git a/net/core/iovec.c b/net/core/iovec.c
> > index 72aceb1..e7f5b29 100644
> > --- a/net/core/iovec.c
> > +++ b/net/core/iovec.c
> > @@ -35,10 +35,10 @@
> > * in any case.
> > */
> >
> > -long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
> > +int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
> > {
> > int size, ct;
> > - long err;
> > + size_t err;
> >
> > if (m->msg_namelen) {
> > if (mode == VERIFY_READ) {
> > @@ -62,14 +62,13 @@ long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address,
> > err = 0;
> >
> > for (ct = 0; ct < m->msg_iovlen; ct++) {
> > - err += iov[ct].iov_len;
> > - /*
> > - * Goal is not to verify user data, but to prevent returning
> > - * negative value, which is interpreted as errno.
> > - * Overflow is still possible, but it is harmless.
> > - */
> > - if (err < 0)
> > - return -EMSGSIZE;
> > + size_t len = iov[ct].iov_len;
> > +
> > + if (len > INT_MAX - err) {
> > + len = INT_MAX - err;
> > + iov[ct].iov_len = len;
> > + }
> > + err += len;
> > }
> >
> > return err;
> > --
> > 1.7.3.2
> >
> >
next prev parent reply other threads:[~2010-10-29 14:00 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-28 18:22 [PATCH] net: Limit socket I/O iovec total length to INT_MAX David Miller
2010-10-28 18:33 ` Linus Torvalds
2010-10-28 18:37 ` David Miller
2010-10-29 6:40 ` Linus Torvalds
2010-10-29 14:00 ` Dan Rosenberg [this message]
2010-10-29 15:28 ` Linus Torvalds
2010-10-29 16:21 ` Linus Torvalds
2010-10-29 16:45 ` Al Viro
2010-10-29 17:01 ` Linus Torvalds
2010-10-29 17:32 ` Al Viro
2010-10-29 19:32 ` David Miller
2010-10-29 19:37 ` Linus Torvalds
2010-10-29 19:55 ` David Miller
2010-10-29 20:22 ` Dan Rosenberg
2010-10-29 18:51 ` Rick Jones
2010-10-29 18:59 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1288360820.2092.34.camel@dan \
--to=drosenberg@vsecurity.com \
--cc=allan.stephens@windriver.com \
--cc=davem@davemloft.net \
--cc=jon.maloy@ericsson.com \
--cc=netdev@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).