From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Miller Subject: Re: [PATCH net] netlink, mmap: fix edge-case leakages in nf queue zero-copy Date: Wed, 09 Sep 2015 21:43:49 -0700 (PDT) Message-ID: <20150909.214349.1837841017864345220.davem@davemloft.net> References: Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: chamaken@gmail.com, tgraf@suug.ch, fw@strlen.de, kaber@trash.net, netdev@vger.kernel.org To: daniel@iogearbox.net Return-path: Received: from shards.monkeyblade.net ([149.20.54.216]:37256 "EHLO shards.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751446AbbIJEnu (ORCPT ); Thu, 10 Sep 2015 00:43:50 -0400 In-Reply-To: Sender: netdev-owner@vger.kernel.org List-ID: From: Daniel Borkmann Date: Thu, 10 Sep 2015 02:10:57 +0200 > When netlink mmap on receive side is the consumer of nf queue data, > it can happen that in some edge cases, we write skb shared info into > the user space mmap buffer: > > Assume a possible rx ring frame size of only 4096, and the network skb, > which is being zero-copied into the netlink skb, contains page frags > with an overall skb->len larger than the linear part of the netlink > skb. > > skb_zerocopy(), which is generic and thus not aware of the fact that > shared info cannot be accessed for such skbs then tries to write and > fill frags, thus leaking kernel data/pointers and in some corner cases > possibly writing out of bounds of the mmap area (when filling the > last slot in the ring buffer this way). > > I.e. the ring buffer slot is then of status NL_MMAP_STATUS_VALID, has > an advertised length larger than 4096, where the linear part is visible > at the slot beginning, and the leaked sizeof(struct skb_shared_info) > has been written to the beginning of the next slot (also corrupting > the struct nl_mmap_hdr slot header incl. status etc), since skb->end > points to skb->data + ring->frame_size - NL_MMAP_HDRLEN. > > The fix adds and lets __netlink_alloc_skb() take the actual needed > linear room for the network skb + meta data into account. It's completely > irrelevant for non-mmaped netlink sockets, but in case mmap sockets > are used, it can be decided whether the available skb_tailroom() is > really large enough for the buffer, or whether it needs to internally > fallback to a normal alloc_skb(). > > From nf queue side, the information whether the destination port is > an mmap RX ring is not really available without extra port-to-socket > lookup, thus it can only be determined in lower layers i.e. when > __netlink_alloc_skb() is called that checks internally for this. I > chose to add the extra ldiff parameter as mmap will then still work: > We have data_len and hlen in nfqnl_build_packet_message(), data_len > is the full length (capped at queue->copy_range) for skb_zerocopy() > and hlen some possible part of data_len that needs to be copied; the > rem_len variable indicates the needed remaining linear mmap space. > > The only other workaround in nf queue internally would be after > allocation time by f.e. cap'ing the data_len to the skb_tailroom() > iff we deal with an mmap skb, but that would 1) expose the fact that > we use a mmap skb to upper layers, and 2) trim the skb where we > otherwise could just have moved the full skb into the normal receive > queue. > > After the patch, in my test case the ring slot doesn't fit and therefore > shows NL_MMAP_STATUS_COPY, where a full skb carries all the data and > thus needs to be picked up via recv(). > > Fixes: 3ab1f683bf8b ("nfnetlink: add support for memory mapped netlink") > Signed-off-by: Daniel Borkmann Applied.