* [PATCH 3.14 00/39] 3.14.17-stable review
@ 2014-08-08 21:34 Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 01/39] xfrm: Fix installation of AH IPsec SAs Greg Kroah-Hartman
` (40 more replies)
0 siblings, 41 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, torvalds, akpm, linux, satoru.takeuchi,
shuah.kh, stable
This is the start of the stable review cycle for the 3.14.17 release.
There are 39 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun Aug 10 21:33:45 UTC 2014.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.14.17-rc1.gz
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linux 3.14.17-rc1
Dave Chinner <dchinner@redhat.com>
xfs: log vector rounding leaks log space
Andrey Utkin <andrey.krieger.utkin@gmail.com>
arch/sparc/math-emu/math_32.c: drop stray break operator
Sowmini Varadhan <sowmini.varadhan@oracle.com>
sparc64: ldc_connect() should not return EINVAL when handshake is in progress.
Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
sunsab: Fix detection of BREAK on sunsab serial console
Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000
David S. Miller <davem@davemloft.net>
sparc64: Guard against flushing openfirmware mappings.
David S. Miller <davem@davemloft.net>
sparc64: Do not insert non-valid PTEs into the TSB hash table.
David S. Miller <davem@davemloft.net>
sparc64: Add membar to Niagara2 memcpy code.
David S. Miller <davem@davemloft.net>
sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.
David S. Miller <davem@davemloft.net>
sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses.
David S. Miller <davem@davemloft.net>
sparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill pte_ERROR().
David S. Miller <davem@davemloft.net>
sparc64: Add basic validations to {pud,pmd}_bad().
David S. Miller <davem@davemloft.net>
sparc64: Use 'ILOG2_4MB' instead of constant '22'.
David S. Miller <davem@davemloft.net>
sparc64: Fix range check in kern_addr_valid().
David S. Miller <davem@davemloft.net>
sparc64: Fix top-level fault handling bugs.
David S. Miller <davem@davemloft.net>
sparc64: Handle 32-bit tasks properly in compute_effective_address().
David S. Miller <davem@davemloft.net>
sparc64: Don't use _PAGE_PRESENT in pte_modify() mask.
David S. Miller <davem@davemloft.net>
sparc64: Fix hex values in comment above pte_modify().
David S. Miller <davem@davemloft.net>
sparc64: Fix bugs in get_user_pages_fast() wrt. THP.
David S. Miller <davem@davemloft.net>
sparc64: Fix huge PMD invalidation.
David S. Miller <davem@davemloft.net>
sparc64: Fix executable bit testing in set_pmd_at() paths.
Kirill Tkhai <tkhai@yandex.ru>
sparc64: Make itc_sync_lock raw
David S. Miller <davem@davemloft.net>
sparc64: Fix argument sign extension for compat_sys_futex().
Eric Dumazet <edumazet@google.com>
sctp: fix possible seqlock seadlock in sctp_packet_transmit()
Sven Eckelmann <sven@narfation.org>
batman-adv: Fix out-of-order fragmentation support
Sasha Levin <sasha.levin@oracle.com>
iovec: make sure the caller actually wants anything in memcpy_fromiovecend
Vlad Yasevich <vyasevic@redhat.com>
net: Correctly set segment mac_len in skb_segment().
Vlad Yasevich <vyasevic@redhat.com>
macvlan: Initialize vlan_features to turn on offload support.
Daniel Borkmann <dborkman@redhat.com>
net: sctp: inherit auth_capable on INIT collisions
Ivan Vecera <ivecera@redhat.com>
bna: fix performance regression
Christoph Paasch <christoph.paasch@uclouvain.be>
tcp: Fix integer-overflow in TCP vegas
Christoph Paasch <christoph.paasch@uclouvain.be>
tcp: Fix integer-overflows in TCP veno
Dmitry Popov <ixaphire@qrator.net>
ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip"
Florian Fainelli <f.fainelli@gmail.com>
net: phy: re-apply PHY fixups during phy_register_device
Andrey Ryabinin <ryabinin.a.a@gmail.com>
net: sendmsg: fix NULL pointer dereference
Eric Dumazet <edumazet@google.com>
ip: make IP identifiers less predictable
Eric Dumazet <edumazet@google.com>
inetpeer: get rid of ip_id_count
Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
bnx2x: fix crash during TSO tunneling
Tobias Brunner <tobias@strongswan.org>
xfrm: Fix installation of AH IPsec SAs
-------------
Diffstat:
Makefile | 4 +-
arch/sparc/include/asm/pgtable_64.h | 89 ++++++++++++----------
arch/sparc/include/asm/tlbflush_64.h | 12 +--
arch/sparc/include/asm/tsb.h | 3 +-
arch/sparc/kernel/head_64.S | 4 +-
arch/sparc/kernel/ktlb.S | 2 +-
arch/sparc/kernel/ldc.c | 2 +-
arch/sparc/kernel/smp_64.c | 6 +-
arch/sparc/kernel/sys32.S | 2 +-
arch/sparc/kernel/unaligned_64.c | 12 ++-
arch/sparc/lib/NG2memcpy.S | 1 +
arch/sparc/math-emu/math_32.c | 2 +-
arch/sparc/mm/fault_64.c | 98 +++++++++++++------------
arch/sparc/mm/gup.c | 2 +-
arch/sparc/mm/init_64.c | 43 +++++++++--
arch/sparc/mm/tlb.c | 26 +++++--
arch/sparc/mm/tsb.c | 14 +++-
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 1 +
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 9 +++
drivers/net/ethernet/brocade/bna/bnad.c | 2 +-
drivers/net/macvlan.c | 1 +
drivers/net/phy/phy_device.c | 2 +-
drivers/net/ppp/pptp.c | 2 +-
drivers/sbus/char/bbc_envctrl.c | 6 ++
drivers/sbus/char/bbc_i2c.c | 11 ++-
drivers/tty/serial/sunsab.c | 9 +++
fs/xfs/xfs_log.h | 19 +++--
fs/xfs/xfs_log_cil.c | 7 +-
include/net/inetpeer.h | 16 +---
include/net/ip.h | 31 ++++----
include/net/ip_tunnels.h | 1 +
include/net/ipv6.h | 2 -
include/net/secure_seq.h | 2 -
net/batman-adv/fragmentation.c | 10 ++-
net/compat.c | 9 ++-
net/core/iovec.c | 10 ++-
net/core/secure_seq.c | 25 -------
net/core/skbuff.c | 2 +-
net/ipv4/igmp.c | 4 +-
net/ipv4/inetpeer.c | 18 -----
net/ipv4/ip_output.c | 7 +-
net/ipv4/ip_tunnel.c | 29 +++++---
net/ipv4/ip_tunnel_core.c | 2 +-
net/ipv4/ipmr.c | 2 +-
net/ipv4/raw.c | 2 +-
net/ipv4/route.c | 63 +++++++++-------
net/ipv4/tcp_vegas.c | 3 +-
net/ipv4/tcp_veno.c | 2 +-
net/ipv4/xfrm4_mode_tunnel.c | 2 +-
net/ipv6/ip6_output.c | 14 ++++
net/ipv6/output_core.c | 23 ------
net/netfilter/ipvs/ip_vs_xmit.c | 2 +-
net/sctp/associola.c | 1 +
net/sctp/output.c | 2 +-
net/xfrm/xfrm_user.c | 7 +-
55 files changed, 379 insertions(+), 303 deletions(-)
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 01/39] xfrm: Fix installation of AH IPsec SAs
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 02/39] bnx2x: fix crash during TSO tunneling Greg Kroah-Hartman
` (39 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Fan Du, Tobias Brunner,
Steffen Klassert
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tobias Brunner <tobias@strongswan.org>
[ Upstream commit a0e5ef53aac8e5049f9344857d8ec5237d31e58b ]
The SPI check introduced in ea9884b3acf3311c8a11db67bfab21773f6f82ba
was intended for IPComp SAs but actually prevented AH SAs from getting
installed (depending on the SPI).
Fixes: ea9884b3acf3 ("xfrm: check user specified spi for IPComp")
Cc: Fan Du <fan.du@windriver.com>
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/xfrm/xfrm_user.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -176,9 +176,7 @@ static int verify_newsa_info(struct xfrm
attrs[XFRMA_ALG_AEAD] ||
attrs[XFRMA_ALG_CRYPT] ||
attrs[XFRMA_ALG_COMP] ||
- attrs[XFRMA_TFCPAD] ||
- (ntohl(p->id.spi) >= 0x10000))
-
+ attrs[XFRMA_TFCPAD])
goto out;
break;
@@ -206,7 +204,8 @@ static int verify_newsa_info(struct xfrm
attrs[XFRMA_ALG_AUTH] ||
attrs[XFRMA_ALG_AUTH_TRUNC] ||
attrs[XFRMA_ALG_CRYPT] ||
- attrs[XFRMA_TFCPAD])
+ attrs[XFRMA_TFCPAD] ||
+ (ntohl(p->id.spi) >= 0x10000))
goto out;
break;
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 02/39] bnx2x: fix crash during TSO tunneling
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 01/39] xfrm: Fix installation of AH IPsec SAs Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 03/39] inetpeer: get rid of ip_id_count Greg Kroah-Hartman
` (38 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Yulong Pei, Michal Schmidt,
Dmitry Kravkov, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
[ Upstream commit fe26566d8a05151ba1dce75081f6270f73ec4ae1 ]
When TSO packet is transmitted additional BD w/o mapping is used
to describe the packed. The BD needs special handling in tx
completion.
kernel: Call Trace:
kernel: <IRQ> [<ffffffff815e19ba>] dump_stack+0x19/0x1b
kernel: [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
kernel: [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80
kernel: [<ffffffff814a8c0d>] ? find_iova+0x4d/0x90
kernel: [<ffffffff814ab0e2>] intel_unmap_page.part.36+0x142/0x160
kernel: [<ffffffff814ad0e6>] intel_unmap_page+0x26/0x30
kernel: [<ffffffffa01f55d7>] bnx2x_free_tx_pkt+0x157/0x2b0 [bnx2x]
kernel: [<ffffffffa01f8dac>] bnx2x_tx_int+0xac/0x220 [bnx2x]
kernel: [<ffffffff8101a0d9>] ? read_tsc+0x9/0x20
kernel: [<ffffffffa01f8fdb>] bnx2x_poll+0xbb/0x3c0 [bnx2x]
kernel: [<ffffffff814d041a>] net_rx_action+0x15a/0x250
kernel: [<ffffffff81067047>] __do_softirq+0xf7/0x290
kernel: [<ffffffff815f3a5c>] call_softirq+0x1c/0x30
kernel: [<ffffffff81014d25>] do_softirq+0x55/0x90
kernel: [<ffffffff810673e5>] irq_exit+0x115/0x120
kernel: [<ffffffff815f4358>] do_IRQ+0x58/0xf0
kernel: [<ffffffff815e94ad>] common_interrupt+0x6d/0x6d
kernel: <EOI> [<ffffffff810bbff7>] ? clockevents_notify+0x127/0x140
kernel: [<ffffffff814834df>] ? cpuidle_enter_state+0x4f/0xc0
kernel: [<ffffffff81483615>] cpuidle_idle_call+0xc5/0x200
kernel: [<ffffffff8101bc7e>] arch_cpu_idle+0xe/0x30
kernel: [<ffffffff810b4725>] cpu_startup_entry+0xf5/0x290
kernel: [<ffffffff815cfee1>] start_secondary+0x265/0x27b
kernel: ---[ end trace 11aa7726f18d7e80 ]---
Fixes: a848ade408b ("bnx2x: add CSUM and TSO support for encapsulation protocols")
Reported-by: Yulong Pei <ypei@redhat.com>
Cc: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 1 +
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 9 +++++++++
2 files changed, 10 insertions(+)
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -337,6 +337,7 @@ struct sw_tx_bd {
u8 flags;
/* Set on the first BD descriptor when there is a split BD */
#define BNX2X_TSO_SPLIT_BD (1<<0)
+#define BNX2X_HAS_SECOND_PBD (1<<1)
};
struct sw_rx_page {
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -223,6 +223,12 @@ static u16 bnx2x_free_tx_pkt(struct bnx2
--nbd;
bd_idx = TX_BD(NEXT_TX_IDX(bd_idx));
+ if (tx_buf->flags & BNX2X_HAS_SECOND_PBD) {
+ /* Skip second parse bd... */
+ --nbd;
+ bd_idx = TX_BD(NEXT_TX_IDX(bd_idx));
+ }
+
/* TSO headers+data bds share a common mapping. See bnx2x_tx_split() */
if (tx_buf->flags & BNX2X_TSO_SPLIT_BD) {
tx_data_bd = &txdata->tx_desc_ring[bd_idx].reg_bd;
@@ -3868,6 +3874,9 @@ netdev_tx_t bnx2x_start_xmit(struct sk_b
/* set encapsulation flag in start BD */
SET_FLAG(tx_start_bd->general_data,
ETH_TX_START_BD_TUNNEL_EXIST, 1);
+
+ tx_buf->flags |= BNX2X_HAS_SECOND_PBD;
+
nbd++;
} else if (xmit_type & XMIT_CSUM) {
/* Set PBD in checksum offload case w/o encapsulation */
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 03/39] inetpeer: get rid of ip_id_count
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 01/39] xfrm: Fix installation of AH IPsec SAs Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 02/39] bnx2x: fix crash during TSO tunneling Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 04/39] ip: make IP identifiers less predictable Greg Kroah-Hartman
` (37 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit 73f156a6e8c1074ac6327e0abd1169e95eb66463 ]
Ideally, we would need to generate IP ID using a per destination IP
generator.
linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.
1) each inet_peer struct consumes 192 bytes
2) inetpeer cache uses a binary tree of inet_peer structs,
with a nominal size of ~66000 elements under load.
3) lookups in this tree are hitting a lot of cache lines, as tree depth
is about 20.
4) If server deals with many tcp flows, we have a high probability of
not finding the inet_peer, allocating a fresh one, inserting it in
the tree with same initial ip_id_count, (cf secure_ip_id())
5) We garbage collect inet_peer aggressively.
IP ID generation do not have to be 'perfect'
Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.
We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.
ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)
secure_ip_id() and secure_ipv6_id() no longer are needed.
Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/net/ppp/pptp.c | 2 -
include/net/inetpeer.h | 16 ++------------
include/net/ip.h | 40 ++++++++++++++++++++---------------
include/net/ipv6.h | 2 -
include/net/secure_seq.h | 2 -
net/core/secure_seq.c | 25 ----------------------
net/ipv4/igmp.c | 4 +--
net/ipv4/inetpeer.c | 18 ----------------
net/ipv4/ip_output.c | 7 ++----
net/ipv4/ip_tunnel_core.c | 2 -
net/ipv4/ipmr.c | 2 -
net/ipv4/raw.c | 2 -
net/ipv4/route.c | 45 ++++++++++++++--------------------------
net/ipv4/xfrm4_mode_tunnel.c | 2 -
net/ipv6/ip6_output.c | 12 ++++++++++
net/ipv6/output_core.c | 23 --------------------
net/netfilter/ipvs/ip_vs_xmit.c | 2 -
17 files changed, 65 insertions(+), 141 deletions(-)
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -281,7 +281,7 @@ static int pptp_xmit(struct ppp_channel
nf_reset(skb);
skb->ip_summed = CHECKSUM_NONE;
- ip_select_ident(skb, &rt->dst, NULL);
+ ip_select_ident(skb, NULL);
ip_send_check(iph);
ip_local_out(skb);
--- a/include/net/inetpeer.h
+++ b/include/net/inetpeer.h
@@ -41,14 +41,13 @@ struct inet_peer {
struct rcu_head gc_rcu;
};
/*
- * Once inet_peer is queued for deletion (refcnt == -1), following fields
- * are not available: rid, ip_id_count
+ * Once inet_peer is queued for deletion (refcnt == -1), following field
+ * is not available: rid
* We can share memory with rcu_head to help keep inet_peer small.
*/
union {
struct {
atomic_t rid; /* Frag reception counter */
- atomic_t ip_id_count; /* IP ID for the next packet */
};
struct rcu_head rcu;
struct inet_peer *gc_next;
@@ -165,7 +164,7 @@ bool inet_peer_xrlim_allow(struct inet_p
void inetpeer_invalidate_tree(struct inet_peer_base *);
/*
- * temporary check to make sure we dont access rid, ip_id_count, tcp_ts,
+ * temporary check to make sure we dont access rid, tcp_ts,
* tcp_ts_stamp if no refcount is taken on inet_peer
*/
static inline void inet_peer_refcheck(const struct inet_peer *p)
@@ -173,13 +172,4 @@ static inline void inet_peer_refcheck(co
WARN_ON_ONCE(atomic_read(&p->refcnt) <= 0);
}
-
-/* can be called with or without local BH being disabled */
-static inline int inet_getid(struct inet_peer *p, int more)
-{
- more++;
- inet_peer_refcheck(p);
- return atomic_add_return(more, &p->ip_id_count) - more;
-}
-
#endif /* _NET_INETPEER_H */
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -297,9 +297,19 @@ static inline unsigned int ip_skb_dst_mt
}
}
-void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst, int more);
+#define IP_IDENTS_SZ 2048u
+extern atomic_t *ip_idents;
-static inline void ip_select_ident(struct sk_buff *skb, struct dst_entry *dst, struct sock *sk)
+static inline u32 ip_idents_reserve(u32 hash, int segs)
+{
+ atomic_t *id_ptr = ip_idents + hash % IP_IDENTS_SZ;
+
+ return atomic_add_return(segs, id_ptr) - segs;
+}
+
+void __ip_select_ident(struct iphdr *iph, int segs);
+
+static inline void ip_select_ident_segs(struct sk_buff *skb, struct sock *sk, int segs)
{
struct iphdr *iph = ip_hdr(skb);
@@ -309,24 +319,20 @@ static inline void ip_select_ident(struc
* does not change, they drop every other packet in
* a TCP stream using header compression.
*/
- iph->id = (sk && inet_sk(sk)->inet_daddr) ?
- htons(inet_sk(sk)->inet_id++) : 0;
- } else
- __ip_select_ident(iph, dst, 0);
-}
-
-static inline void ip_select_ident_more(struct sk_buff *skb, struct dst_entry *dst, struct sock *sk, int more)
-{
- struct iphdr *iph = ip_hdr(skb);
-
- if ((iph->frag_off & htons(IP_DF)) && !skb->local_df) {
if (sk && inet_sk(sk)->inet_daddr) {
iph->id = htons(inet_sk(sk)->inet_id);
- inet_sk(sk)->inet_id += 1 + more;
- } else
+ inet_sk(sk)->inet_id += segs;
+ } else {
iph->id = 0;
- } else
- __ip_select_ident(iph, dst, more);
+ }
+ } else {
+ __ip_select_ident(iph, segs);
+ }
+}
+
+static inline void ip_select_ident(struct sk_buff *skb, struct sock *sk)
+{
+ ip_select_ident_segs(skb, sk, 1);
}
/*
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -660,8 +660,6 @@ static inline int ipv6_addr_diff(const s
return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
}
-void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt);
-
int ip6_dst_hoplimit(struct dst_entry *dst);
/*
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -3,8 +3,6 @@
#include <linux/types.h>
-__u32 secure_ip_id(__be32 daddr);
-__u32 secure_ipv6_id(const __be32 daddr[4]);
u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
__be16 dport);
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -85,31 +85,6 @@ EXPORT_SYMBOL(secure_ipv6_port_ephemeral
#endif
#ifdef CONFIG_INET
-__u32 secure_ip_id(__be32 daddr)
-{
- u32 hash[MD5_DIGEST_WORDS];
-
- net_secret_init();
- hash[0] = (__force __u32) daddr;
- hash[1] = net_secret[13];
- hash[2] = net_secret[14];
- hash[3] = net_secret[15];
-
- md5_transform(hash, net_secret);
-
- return hash[0];
-}
-
-__u32 secure_ipv6_id(const __be32 daddr[4])
-{
- __u32 hash[4];
-
- net_secret_init();
- memcpy(hash, daddr, 16);
- md5_transform(hash, net_secret);
-
- return hash[0];
-}
__u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
__be16 sport, __be16 dport)
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -369,7 +369,7 @@ static struct sk_buff *igmpv3_newpack(st
pip->saddr = fl4.saddr;
pip->protocol = IPPROTO_IGMP;
pip->tot_len = 0; /* filled in later */
- ip_select_ident(skb, &rt->dst, NULL);
+ ip_select_ident(skb, NULL);
((u8 *)&pip[1])[0] = IPOPT_RA;
((u8 *)&pip[1])[1] = 4;
((u8 *)&pip[1])[2] = 0;
@@ -714,7 +714,7 @@ static int igmp_send_report(struct in_de
iph->daddr = dst;
iph->saddr = fl4.saddr;
iph->protocol = IPPROTO_IGMP;
- ip_select_ident(skb, &rt->dst, NULL);
+ ip_select_ident(skb, NULL);
((u8 *)&iph[1])[0] = IPOPT_RA;
((u8 *)&iph[1])[1] = 4;
((u8 *)&iph[1])[2] = 0;
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -26,20 +26,7 @@
* Theory of operations.
* We keep one entry for each peer IP address. The nodes contains long-living
* information about the peer which doesn't depend on routes.
- * At this moment this information consists only of ID field for the next
- * outgoing IP packet. This field is incremented with each packet as encoded
- * in inet_getid() function (include/net/inetpeer.h).
- * At the moment of writing this notes identifier of IP packets is generated
- * to be unpredictable using this code only for packets subjected
- * (actually or potentially) to defragmentation. I.e. DF packets less than
- * PMTU in size when local fragmentation is disabled use a constant ID and do
- * not use this code (see ip_select_ident() in include/net/ip.h).
*
- * Route cache entries hold references to our nodes.
- * New cache entries get references via lookup by destination IP address in
- * the avl tree. The reference is grabbed only when it's needed i.e. only
- * when we try to output IP packet which needs an unpredictable ID (see
- * __ip_select_ident() in net/ipv4/route.c).
* Nodes are removed only when reference counter goes to 0.
* When it's happened the node may be removed when a sufficient amount of
* time has been passed since its last use. The less-recently-used entry can
@@ -62,7 +49,6 @@
* refcnt: atomically against modifications on other CPU;
* usually under some other lock to prevent node disappearing
* daddr: unchangeable
- * ip_id_count: atomic value (no lock needed)
*/
static struct kmem_cache *peer_cachep __read_mostly;
@@ -497,10 +483,6 @@ relookup:
p->daddr = *daddr;
atomic_set(&p->refcnt, 1);
atomic_set(&p->rid, 0);
- atomic_set(&p->ip_id_count,
- (daddr->family == AF_INET) ?
- secure_ip_id(daddr->addr.a4) :
- secure_ipv6_id(daddr->addr.a6));
p->metrics[RTAX_LOCK-1] = INETPEER_METRICS_NEW;
p->rate_tokens = 0;
/* 60*HZ is arbitrary, but chosen enough high so that the first
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -148,7 +148,7 @@ int ip_build_and_send_pkt(struct sk_buff
iph->daddr = (opt && opt->opt.srr ? opt->opt.faddr : daddr);
iph->saddr = saddr;
iph->protocol = sk->sk_protocol;
- ip_select_ident(skb, &rt->dst, sk);
+ ip_select_ident(skb, sk);
if (opt && opt->opt.optlen) {
iph->ihl += opt->opt.optlen>>2;
@@ -386,8 +386,7 @@ packet_routed:
ip_options_build(skb, &inet_opt->opt, inet->inet_daddr, rt, 0);
}
- ip_select_ident_more(skb, &rt->dst, sk,
- (skb_shinfo(skb)->gso_segs ?: 1) - 1);
+ ip_select_ident_segs(skb, sk, skb_shinfo(skb)->gso_segs ?: 1);
skb->priority = sk->sk_priority;
skb->mark = sk->sk_mark;
@@ -1338,7 +1337,7 @@ struct sk_buff *__ip_make_skb(struct soc
iph->ttl = ttl;
iph->protocol = sk->sk_protocol;
ip_copy_addrs(iph, fl4);
- ip_select_ident(skb, &rt->dst, sk);
+ ip_select_ident(skb, sk);
if (opt) {
iph->ihl += opt->optlen>>2;
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -74,7 +74,7 @@ int iptunnel_xmit(struct rtable *rt, str
iph->daddr = dst;
iph->saddr = src;
iph->ttl = ttl;
- __ip_select_ident(iph, &rt->dst, (skb_shinfo(skb)->gso_segs ?: 1) - 1);
+ __ip_select_ident(iph, skb_shinfo(skb)->gso_segs ?: 1);
err = ip_local_out(skb);
if (unlikely(net_xmit_eval(err)))
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1663,7 +1663,7 @@ static void ip_encap(struct sk_buff *skb
iph->protocol = IPPROTO_IPIP;
iph->ihl = 5;
iph->tot_len = htons(skb->len);
- ip_select_ident(skb, skb_dst(skb), NULL);
+ ip_select_ident(skb, NULL);
ip_send_check(iph);
memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -389,7 +389,7 @@ static int raw_send_hdrinc(struct sock *
iph->check = 0;
iph->tot_len = htons(length);
if (!iph->id)
- ip_select_ident(skb, &rt->dst, NULL);
+ ip_select_ident(skb, NULL);
iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
}
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -89,6 +89,7 @@
#include <linux/rcupdate.h>
#include <linux/times.h>
#include <linux/slab.h>
+#include <linux/jhash.h>
#include <net/dst.h>
#include <net/net_namespace.h>
#include <net/protocol.h>
@@ -462,39 +463,19 @@ static struct neighbour *ipv4_neigh_look
return neigh_create(&arp_tbl, pkey, dev);
}
-/*
- * Peer allocation may fail only in serious out-of-memory conditions. However
- * we still can generate some output.
- * Random ID selection looks a bit dangerous because we have no chances to
- * select ID being unique in a reasonable period of time.
- * But broken packet identifier may be better than no packet at all.
- */
-static void ip_select_fb_ident(struct iphdr *iph)
-{
- static DEFINE_SPINLOCK(ip_fb_id_lock);
- static u32 ip_fallback_id;
- u32 salt;
+atomic_t *ip_idents __read_mostly;
+EXPORT_SYMBOL(ip_idents);
- spin_lock_bh(&ip_fb_id_lock);
- salt = secure_ip_id((__force __be32)ip_fallback_id ^ iph->daddr);
- iph->id = htons(salt & 0xFFFF);
- ip_fallback_id = salt;
- spin_unlock_bh(&ip_fb_id_lock);
-}
-
-void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst, int more)
+void __ip_select_ident(struct iphdr *iph, int segs)
{
- struct net *net = dev_net(dst->dev);
- struct inet_peer *peer;
+ static u32 ip_idents_hashrnd __read_mostly;
+ u32 hash, id;
- peer = inet_getpeer_v4(net->ipv4.peers, iph->daddr, 1);
- if (peer) {
- iph->id = htons(inet_getid(peer, more));
- inet_putpeer(peer);
- return;
- }
+ net_get_random_once(&ip_idents_hashrnd, sizeof(ip_idents_hashrnd));
- ip_select_fb_ident(iph);
+ hash = jhash_1word((__force u32)iph->daddr, ip_idents_hashrnd);
+ id = ip_idents_reserve(hash, segs);
+ iph->id = htons(id);
}
EXPORT_SYMBOL(__ip_select_ident);
@@ -2718,6 +2699,12 @@ int __init ip_rt_init(void)
{
int rc = 0;
+ ip_idents = kmalloc(IP_IDENTS_SZ * sizeof(*ip_idents), GFP_KERNEL);
+ if (!ip_idents)
+ panic("IP: failed to allocate ip_idents\n");
+
+ prandom_bytes(ip_idents, IP_IDENTS_SZ * sizeof(*ip_idents));
+
#ifdef CONFIG_IP_ROUTE_CLASSID
ip_rt_acct = __alloc_percpu(256 * sizeof(struct ip_rt_acct), __alignof__(struct ip_rt_acct));
if (!ip_rt_acct)
--- a/net/ipv4/xfrm4_mode_tunnel.c
+++ b/net/ipv4/xfrm4_mode_tunnel.c
@@ -117,12 +117,12 @@ static int xfrm4_mode_tunnel_output(stru
top_iph->frag_off = (flags & XFRM_STATE_NOPMTUDISC) ?
0 : (XFRM_MODE_SKB_CB(skb)->frag_off & htons(IP_DF));
- ip_select_ident(skb, dst->child, NULL);
top_iph->ttl = ip4_dst_hoplimit(dst->child);
top_iph->saddr = x->props.saddr.a4;
top_iph->daddr = x->id.daddr.a4;
+ ip_select_ident(skb, NULL);
return 0;
}
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -537,6 +537,18 @@ static void ip6_copy_metadata(struct sk_
skb_copy_secmark(to, from);
}
+static void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt)
+{
+ static u32 ip6_idents_hashrnd __read_mostly;
+ u32 hash, id;
+
+ net_get_random_once(&ip6_idents_hashrnd, sizeof(ip6_idents_hashrnd));
+
+ hash = __ipv6_addr_jhash(&rt->rt6i_dst.addr, ip6_idents_hashrnd);
+ id = ip_idents_reserve(hash, 1);
+ fhdr->identification = htonl(id);
+}
+
int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
{
struct sk_buff *frag;
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -7,29 +7,6 @@
#include <net/ip6_fib.h>
#include <net/addrconf.h>
-void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt)
-{
- static atomic_t ipv6_fragmentation_id;
- int ident;
-
-#if IS_ENABLED(CONFIG_IPV6)
- if (rt && !(rt->dst.flags & DST_NOPEER)) {
- struct inet_peer *peer;
- struct net *net;
-
- net = dev_net(rt->dst.dev);
- peer = inet_getpeer_v6(net->ipv6.peers, &rt->rt6i_dst.addr, 1);
- if (peer) {
- fhdr->identification = htonl(inet_getid(peer, 0));
- inet_putpeer(peer);
- return;
- }
- }
-#endif
- ident = atomic_inc_return(&ipv6_fragmentation_id);
- fhdr->identification = htonl(ident);
-}
-EXPORT_SYMBOL(ipv6_select_ident);
int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
{
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -883,7 +883,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, s
iph->daddr = cp->daddr.ip;
iph->saddr = saddr;
iph->ttl = old_iph->ttl;
- ip_select_ident(skb, &rt->dst, NULL);
+ ip_select_ident(skb, NULL);
/* Another hack: avoid icmp_send in ip_fragment */
skb->local_df = 1;
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 04/39] ip: make IP identifiers less predictable
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (2 preceding siblings ...)
2014-08-08 21:34 ` [PATCH 3.14 03/39] inetpeer: get rid of ip_id_count Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 05/39] net: sendmsg: fix NULL pointer dereference Greg Kroah-Hartman
` (36 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Jeffrey Knockel,
Jedidiah R. Crandall, Willy Tarreau, Hannes Frederic Sowa,
David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit 04ca6973f7c1a0d8537f2d9906a0cf8e69886d75 ]
In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways exploiting linux IP identifier generation to
infer whether two machines are exchanging packets.
With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.
This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.
Note that prandom_u32_max(1) returns 0, so if generator is used at most
once per jiffy, this patch inserts no hole in the ID suite and do not
increase collision probability.
This is jiffies based, so in the worst case (HZ=1000), the id can
rollover after ~65 seconds of idle time, which should be fine.
We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.
For IPv6, adds saddr into the hash as well, but not nexthdr.
If I ping the patched target, we can see ID are now hard to predict.
21:57:11.008086 IP (...)
A > target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
target > A: ICMP echo reply, seq 1, length 64
21:57:12.013133 IP (...)
A > target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
target > A: ICMP echo reply, seq 2, length 64
21:57:13.016580 IP (...)
A > target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
target > A: ICMP echo reply, seq 3, length 64
[1] TCP sessions uses a per flow ID generator not changed by this patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu>
Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/net/ip.h | 11 +----------
net/ipv4/route.c | 32 +++++++++++++++++++++++++++++---
net/ipv6/ip6_output.c | 2 ++
3 files changed, 32 insertions(+), 13 deletions(-)
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -297,16 +297,7 @@ static inline unsigned int ip_skb_dst_mt
}
}
-#define IP_IDENTS_SZ 2048u
-extern atomic_t *ip_idents;
-
-static inline u32 ip_idents_reserve(u32 hash, int segs)
-{
- atomic_t *id_ptr = ip_idents + hash % IP_IDENTS_SZ;
-
- return atomic_add_return(segs, id_ptr) - segs;
-}
-
+u32 ip_idents_reserve(u32 hash, int segs);
void __ip_select_ident(struct iphdr *iph, int segs);
static inline void ip_select_ident_segs(struct sk_buff *skb, struct sock *sk, int segs)
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -463,8 +463,31 @@ static struct neighbour *ipv4_neigh_look
return neigh_create(&arp_tbl, pkey, dev);
}
-atomic_t *ip_idents __read_mostly;
-EXPORT_SYMBOL(ip_idents);
+#define IP_IDENTS_SZ 2048u
+struct ip_ident_bucket {
+ atomic_t id;
+ u32 stamp32;
+};
+
+static struct ip_ident_bucket *ip_idents __read_mostly;
+
+/* In order to protect privacy, we add a perturbation to identifiers
+ * if one generator is seldom used. This makes hard for an attacker
+ * to infer how many packets were sent between two points in time.
+ */
+u32 ip_idents_reserve(u32 hash, int segs)
+{
+ struct ip_ident_bucket *bucket = ip_idents + hash % IP_IDENTS_SZ;
+ u32 old = ACCESS_ONCE(bucket->stamp32);
+ u32 now = (u32)jiffies;
+ u32 delta = 0;
+
+ if (old != now && cmpxchg(&bucket->stamp32, old, now) == old)
+ delta = prandom_u32_max(now - old);
+
+ return atomic_add_return(segs + delta, &bucket->id) - segs;
+}
+EXPORT_SYMBOL(ip_idents_reserve);
void __ip_select_ident(struct iphdr *iph, int segs)
{
@@ -473,7 +496,10 @@ void __ip_select_ident(struct iphdr *iph
net_get_random_once(&ip_idents_hashrnd, sizeof(ip_idents_hashrnd));
- hash = jhash_1word((__force u32)iph->daddr, ip_idents_hashrnd);
+ hash = jhash_3words((__force u32)iph->daddr,
+ (__force u32)iph->saddr,
+ iph->protocol,
+ ip_idents_hashrnd);
id = ip_idents_reserve(hash, segs);
iph->id = htons(id);
}
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -545,6 +545,8 @@ static void ipv6_select_ident(struct fra
net_get_random_once(&ip6_idents_hashrnd, sizeof(ip6_idents_hashrnd));
hash = __ipv6_addr_jhash(&rt->rt6i_dst.addr, ip6_idents_hashrnd);
+ hash = __ipv6_addr_jhash(&rt->rt6i_src.addr, hash);
+
id = ip_idents_reserve(hash, 1);
fhdr->identification = htonl(id);
}
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 05/39] net: sendmsg: fix NULL pointer dereference
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (3 preceding siblings ...)
2014-08-08 21:34 ` [PATCH 3.14 04/39] ip: make IP identifiers less predictable Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 06/39] net: phy: re-apply PHY fixups during phy_register_device Greg Kroah-Hartman
` (35 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Hannes Frederic Sowa, Eric Dumazet,
Sasha Levin, Andrey Ryabinin, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrey Ryabinin <ryabinin.a.a@gmail.com>
[ Upstream commit 40eea803c6b2cfaab092f053248cbeab3f368412 ]
Sasha's report:
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel with the KASAN patchset, I've stumbled on the following spew:
>
> [ 4448.949424] ==================================================================
> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
> [ 4448.952988] Read of size 2 by thread T19638:
> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
> [ 4448.956823] ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
> [ 4448.958233] ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
> [ 4448.959552] 0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
> [ 4448.961266] Call Trace:
> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
> [ 4448.970103] sock_sendmsg (net/socket.c:654)
> [ 4448.971584] ? might_fault (mm/memory.c:3741)
> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
> [ 4448.988929] ==================================================================
This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.
After this report there was no usual "Unable to handle kernel NULL pointer dereference"
and this gave me a clue that address 0 is mapped and contains valid socket address structure in it.
This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
(net: rework recvmsg handler msg_name and msg_namelen logic).
Commit message states that:
"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address."
But in fact this affects sendto when address 0 is mapped and contains
socket address structure in it. In such case copy-in address will succeed,
verify_iovec() function will successfully exit with msg->msg_namelen > 0
and msg->msg_name == NULL.
This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/compat.c | 9 +++++----
net/core/iovec.c | 6 +++---
2 files changed, 8 insertions(+), 7 deletions(-)
--- a/net/compat.c
+++ b/net/compat.c
@@ -85,7 +85,7 @@ int verify_compat_iovec(struct msghdr *k
{
int tot_len;
- if (kern_msg->msg_namelen) {
+ if (kern_msg->msg_name && kern_msg->msg_namelen) {
if (mode == VERIFY_READ) {
int err = move_addr_to_kernel(kern_msg->msg_name,
kern_msg->msg_namelen,
@@ -93,10 +93,11 @@ int verify_compat_iovec(struct msghdr *k
if (err < 0)
return err;
}
- if (kern_msg->msg_name)
- kern_msg->msg_name = kern_address;
- } else
+ kern_msg->msg_name = kern_address;
+ } else {
kern_msg->msg_name = NULL;
+ kern_msg->msg_namelen = 0;
+ }
tot_len = iov_from_user_compat_to_kern(kern_iov,
(struct compat_iovec __user *)kern_msg->msg_iov,
--- a/net/core/iovec.c
+++ b/net/core/iovec.c
@@ -39,7 +39,7 @@ int verify_iovec(struct msghdr *m, struc
{
int size, ct, err;
- if (m->msg_namelen) {
+ if (m->msg_name && m->msg_namelen) {
if (mode == VERIFY_READ) {
void __user *namep;
namep = (void __user __force *) m->msg_name;
@@ -48,10 +48,10 @@ int verify_iovec(struct msghdr *m, struc
if (err < 0)
return err;
}
- if (m->msg_name)
- m->msg_name = address;
+ m->msg_name = address;
} else {
m->msg_name = NULL;
+ m->msg_namelen = 0;
}
size = m->msg_iovlen * sizeof(struct iovec);
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 06/39] net: phy: re-apply PHY fixups during phy_register_device
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (4 preceding siblings ...)
2014-08-08 21:34 ` [PATCH 3.14 05/39] net: sendmsg: fix NULL pointer dereference Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 07/39] ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip" Greg Kroah-Hartman
` (34 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Hauke Merthens, Jonas Gorski,
Florian Fainelli, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Fainelli <f.fainelli@gmail.com>
[ Upstream commit d92f5dec6325079c550889883af51db1b77d5623 ]
Commit 87aa9f9c61ad ("net: phy: consolidate PHY reset in phy_init_hw()")
moved the call to phy_scan_fixups() in phy_init_hw() after a software
reset is performed.
By the time phy_init_hw() is called in phy_device_register(), no driver
has been bound to this PHY yet, so all the checks in phy_init_hw()
against the PHY driver and the PHY driver's config_init function will
return 0. We will therefore never call phy_scan_fixups() as we should.
Fix this by calling phy_scan_fixups() and check for its return value to
restore the intended functionality.
This broke PHY drivers which do register an early PHY fixup callback to
intercept the PHY probing and do things like changing the 32-bits unique
PHY identifier when a pseudo-PHY address has been used, as well as
board-specific PHY fixups that need to be applied during driver probe
time.
Reported-by: Hauke Merthens <hauke-m@hauke-m.de>
Reported-by: Jonas Gorski <jogo@openwrt.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/net/phy/phy_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -353,7 +353,7 @@ int phy_device_register(struct phy_devic
phydev->bus->phy_map[phydev->addr] = phydev;
/* Run all of the fixups for this PHY */
- err = phy_init_hw(phydev);
+ err = phy_scan_fixups(phydev);
if (err) {
pr_err("PHY %d failed to initialize\n", phydev->addr);
goto out;
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 07/39] ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip"
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (5 preceding siblings ...)
2014-08-08 21:34 ` [PATCH 3.14 06/39] net: phy: re-apply PHY fixups during phy_register_device Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 08/39] tcp: Fix integer-overflows in TCP veno Greg Kroah-Hartman
` (33 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Sergey Popov, Dmitry Popov,
David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Popov <ixaphire@qrator.net>
[ Upstream commit 95cb5745983c222867cc9ac593aebb2ad67d72c0 ]
Ipv4 tunnels created with "local any remote $ip" didn't work properly since
7d442fab0 (ipv4: Cache dst in tunnels). 99% of packets sent via those tunnels
had src addr = 0.0.0.0. That was because only dst_entry was cached, although
fl4.saddr has to be cached too. Every time ip_tunnel_xmit used cached dst_entry
(tunnel_rtable_get returned non-NULL), fl4.saddr was initialized with
tnl_params->saddr (= 0 in our case), and wasn't changed until iptunnel_xmit().
This patch adds saddr to ip_tunnel->dst_cache, fixing this issue.
Reported-by: Sergey Popov <pinkbyte@gentoo.org>
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/net/ip_tunnels.h | 1 +
net/ipv4/ip_tunnel.c | 29 ++++++++++++++++++-----------
2 files changed, 19 insertions(+), 11 deletions(-)
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -40,6 +40,7 @@ struct ip_tunnel_prl_entry {
struct ip_tunnel_dst {
struct dst_entry __rcu *dst;
+ __be32 saddr;
};
struct ip_tunnel {
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -69,23 +69,25 @@ static unsigned int ip_tunnel_hash(__be3
}
static void __tunnel_dst_set(struct ip_tunnel_dst *idst,
- struct dst_entry *dst)
+ struct dst_entry *dst, __be32 saddr)
{
struct dst_entry *old_dst;
dst_clone(dst);
old_dst = xchg((__force struct dst_entry **)&idst->dst, dst);
dst_release(old_dst);
+ idst->saddr = saddr;
}
-static void tunnel_dst_set(struct ip_tunnel *t, struct dst_entry *dst)
+static void tunnel_dst_set(struct ip_tunnel *t,
+ struct dst_entry *dst, __be32 saddr)
{
- __tunnel_dst_set(this_cpu_ptr(t->dst_cache), dst);
+ __tunnel_dst_set(this_cpu_ptr(t->dst_cache), dst, saddr);
}
static void tunnel_dst_reset(struct ip_tunnel *t)
{
- tunnel_dst_set(t, NULL);
+ tunnel_dst_set(t, NULL, 0);
}
void ip_tunnel_dst_reset_all(struct ip_tunnel *t)
@@ -93,20 +95,25 @@ void ip_tunnel_dst_reset_all(struct ip_t
int i;
for_each_possible_cpu(i)
- __tunnel_dst_set(per_cpu_ptr(t->dst_cache, i), NULL);
+ __tunnel_dst_set(per_cpu_ptr(t->dst_cache, i), NULL, 0);
}
EXPORT_SYMBOL(ip_tunnel_dst_reset_all);
-static struct rtable *tunnel_rtable_get(struct ip_tunnel *t, u32 cookie)
+static struct rtable *tunnel_rtable_get(struct ip_tunnel *t,
+ u32 cookie, __be32 *saddr)
{
+ struct ip_tunnel_dst *idst;
struct dst_entry *dst;
rcu_read_lock();
- dst = rcu_dereference(this_cpu_ptr(t->dst_cache)->dst);
+ idst = this_cpu_ptr(t->dst_cache);
+ dst = rcu_dereference(idst->dst);
if (dst && !atomic_inc_not_zero(&dst->__refcnt))
dst = NULL;
if (dst) {
- if (dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
+ if (!dst->obsolete || dst->ops->check(dst, cookie)) {
+ *saddr = idst->saddr;
+ } else {
tunnel_dst_reset(t);
dst_release(dst);
dst = NULL;
@@ -362,7 +369,7 @@ static int ip_tunnel_bind_dev(struct net
if (!IS_ERR(rt)) {
tdev = rt->dst.dev;
- tunnel_dst_set(tunnel, &rt->dst);
+ tunnel_dst_set(tunnel, &rt->dst, fl4.saddr);
ip_rt_put(rt);
}
if (dev->type != ARPHRD_ETHER)
@@ -606,7 +613,7 @@ void ip_tunnel_xmit(struct sk_buff *skb,
init_tunnel_flow(&fl4, protocol, dst, tnl_params->saddr,
tunnel->parms.o_key, RT_TOS(tos), tunnel->parms.link);
- rt = connected ? tunnel_rtable_get(tunnel, 0) : NULL;
+ rt = connected ? tunnel_rtable_get(tunnel, 0, &fl4.saddr) : NULL;
if (!rt) {
rt = ip_route_output_key(tunnel->net, &fl4);
@@ -616,7 +623,7 @@ void ip_tunnel_xmit(struct sk_buff *skb,
goto tx_error;
}
if (connected)
- tunnel_dst_set(tunnel, &rt->dst);
+ tunnel_dst_set(tunnel, &rt->dst, fl4.saddr);
}
if (rt->dst.dev == dev) {
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 08/39] tcp: Fix integer-overflows in TCP veno
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (6 preceding siblings ...)
2014-08-08 21:34 ` [PATCH 3.14 07/39] ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip" Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 09/39] tcp: Fix integer-overflow in TCP vegas Greg Kroah-Hartman
` (32 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Christoph Paasch, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Paasch <christoph.paasch@uclouvain.be>
[ Upstream commit 45a07695bc64b3ab5d6d2215f9677e5b8c05a7d0 ]
In veno we do a multiplication of the cwnd and the rtt. This
may overflow and thus their result is stored in a u64. However, we first
need to cast the cwnd so that actually 64-bit arithmetic is done.
A first attempt at fixing 76f1017757aa0 ([TCP]: TCP Veno congestion
control) was made by 159131149c2 (tcp: Overflow bug in Vegas), but it
failed to add the required cast in tcp_veno_cong_avoid().
Fixes: 76f1017757aa0 ([TCP]: TCP Veno congestion control)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_veno.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/tcp_veno.c
+++ b/net/ipv4/tcp_veno.c
@@ -145,7 +145,7 @@ static void tcp_veno_cong_avoid(struct s
rtt = veno->minrtt;
- target_cwnd = (tp->snd_cwnd * veno->basertt);
+ target_cwnd = (u64)tp->snd_cwnd * veno->basertt;
target_cwnd <<= V_PARAM_SHIFT;
do_div(target_cwnd, rtt);
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 09/39] tcp: Fix integer-overflow in TCP vegas
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (7 preceding siblings ...)
2014-08-08 21:34 ` [PATCH 3.14 08/39] tcp: Fix integer-overflows in TCP veno Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 10/39] bna: fix performance regression Greg Kroah-Hartman
` (31 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Stephen Hemminger, Neal Cardwell,
Eric Dumazet, David Laight, Doug Leith, Christoph Paasch,
David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Paasch <christoph.paasch@uclouvain.be>
[ Upstream commit 1f74e613ded11517db90b2bd57e9464d9e0fb161 ]
In vegas we do a multiplication of the cwnd and the rtt. This
may overflow and thus their result is stored in a u64. However, we first
need to cast the cwnd so that actually 64-bit arithmetic is done.
Then, we need to do do_div to allow this to be used on 32-bit arches.
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Doug Leith <doug.leith@nuim.ie>
Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_vegas.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv4/tcp_vegas.c
+++ b/net/ipv4/tcp_vegas.c
@@ -219,7 +219,8 @@ static void tcp_vegas_cong_avoid(struct
* This is:
* (actual rate in segments) * baseRTT
*/
- target_cwnd = tp->snd_cwnd * vegas->baseRTT / rtt;
+ target_cwnd = (u64)tp->snd_cwnd * vegas->baseRTT;
+ do_div(target_cwnd, rtt);
/* Calculate the difference between the window we had,
* and the window we would like to have. This quantity
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 10/39] bna: fix performance regression
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (8 preceding siblings ...)
2014-08-08 21:34 ` [PATCH 3.14 09/39] tcp: Fix integer-overflow in TCP vegas Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 11/39] net: sctp: inherit auth_capable on INIT collisions Greg Kroah-Hartman
` (30 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ivan Vecera, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera <ivecera@redhat.com>
[ Upstream commit c36c9d50cc6af5c5bfcc195f21b73f55520c15f9 ]
The recent commit "e29aa33 bna: Enable Multi Buffer RX" is causing
a performance regression. It does not properly update 'cmpl' pointer
at the end of the loop in NAPI handler bnad_cq_process(). The result is
only one packet / per NAPI-schedule is processed.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/net/ethernet/brocade/bna/bnad.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -600,9 +600,9 @@ bnad_cq_process(struct bnad *bnad, struc
prefetch(bnad->netdev);
cq = ccb->sw_q;
- cmpl = &cq[ccb->producer_index];
while (packets < budget) {
+ cmpl = &cq[ccb->producer_index];
if (!cmpl->valid)
break;
/* The 'valid' field is set by the adapter, only after writing
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 11/39] net: sctp: inherit auth_capable on INIT collisions
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (9 preceding siblings ...)
2014-08-08 21:34 ` [PATCH 3.14 10/39] bna: fix performance regression Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 12/39] macvlan: Initialize vlan_features to turn on offload support Greg Kroah-Hartman
` (29 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Jason Gunthorpe, Daniel Borkmann,
Vlad Yasevich, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann <dborkman@redhat.com>
[ Upstream commit 1be9a950c646c9092fb3618197f7b6bfb50e82aa ]
Jason reported an oops caused by SCTP on his ARM machine with
SCTP authentication enabled:
Internal error: Oops: 17 [#1] ARM
CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1
task: c6eefa40 ti: c6f52000 task.ti: c6f52000
PC is at sctp_auth_calculate_hmac+0xc4/0x10c
LR is at sg_init_table+0x20/0x38
pc : [<c024bb80>] lr : [<c00f32dc>] psr: 40000013
sp : c6f538e8 ip : 00000000 fp : c6f53924
r10: c6f50d80 r9 : 00000000 r8 : 00010000
r7 : 00000000 r6 : c7be4000 r5 : 00000000 r4 : c6f56254
r3 : c00c8170 r2 : 00000001 r1 : 00000008 r0 : c6f1e660
Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 0005397f Table: 06f28000 DAC: 00000015
Process sctp-test (pid: 104, stack limit = 0xc6f521c0)
Stack: (0xc6f538e8 to 0xc6f54000)
[...]
Backtrace:
[<c024babc>] (sctp_auth_calculate_hmac+0x0/0x10c) from [<c0249af8>] (sctp_packet_transmit+0x33c/0x5c8)
[<c02497bc>] (sctp_packet_transmit+0x0/0x5c8) from [<c023e96c>] (sctp_outq_flush+0x7fc/0x844)
[<c023e170>] (sctp_outq_flush+0x0/0x844) from [<c023ef78>] (sctp_outq_uncork+0x24/0x28)
[<c023ef54>] (sctp_outq_uncork+0x0/0x28) from [<c0234364>] (sctp_side_effects+0x1134/0x1220)
[<c0233230>] (sctp_side_effects+0x0/0x1220) from [<c02330b0>] (sctp_do_sm+0xac/0xd4)
[<c0233004>] (sctp_do_sm+0x0/0xd4) from [<c023675c>] (sctp_assoc_bh_rcv+0x118/0x160)
[<c0236644>] (sctp_assoc_bh_rcv+0x0/0x160) from [<c023d5bc>] (sctp_inq_push+0x6c/0x74)
[<c023d550>] (sctp_inq_push+0x0/0x74) from [<c024a6b0>] (sctp_rcv+0x7d8/0x888)
While we already had various kind of bugs in that area
ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if
we/peer is AUTH capable") and b14878ccb7fa ("net: sctp: cache
auth_enable per endpoint"), this one is a bit of a different
kind.
Giving a bit more background on why SCTP authentication is
needed can be found in RFC4895:
SCTP uses 32-bit verification tags to protect itself against
blind attackers. These values are not changed during the
lifetime of an SCTP association.
Looking at new SCTP extensions, there is the need to have a
method of proving that an SCTP chunk(s) was really sent by
the original peer that started the association and not by a
malicious attacker.
To cause this bug, we're triggering an INIT collision between
peers; normal SCTP handshake where both sides intent to
authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO
parameters that are being negotiated among peers:
---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
<------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
RFC4895 says that each endpoint therefore knows its own random
number and the peer's random number *after* the association
has been established. The local and peer's random number along
with the shared key are then part of the secret used for
calculating the HMAC in the AUTH chunk.
Now, in our scenario, we have 2 threads with 1 non-blocking
SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY
and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling
sctp_bindx(3), listen(2) and connect(2) against each other,
thus the handshake looks similar to this, e.g.:
---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
<------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
<--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] -----------
-------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] -------->
...
Since such collisions can also happen with verification tags,
the RFC4895 for AUTH rather vaguely says under section 6.1:
In case of INIT collision, the rules governing the handling
of this Random Number follow the same pattern as those for
the Verification Tag, as explained in Section 5.2.4 of
RFC 2960 [5]. Therefore, each endpoint knows its own Random
Number and the peer's Random Number after the association
has been established.
In RFC2960, section 5.2.4, we're eventually hitting Action B:
B) In this case, both sides may be attempting to start an
association at about the same time but the peer endpoint
started its INIT after responding to the local endpoint's
INIT. Thus it may have picked a new Verification Tag not
being aware of the previous Tag it had sent this endpoint.
The endpoint should stay in or enter the ESTABLISHED
state but it MUST update its peer's Verification Tag from
the State Cookie, stop any init or cookie timers that may
running and send a COOKIE ACK.
In other words, the handling of the Random parameter is the
same as behavior for the Verification Tag as described in
Action B of section 5.2.4.
Looking at the code, we exactly hit the sctp_sf_do_dupcook_b()
case which triggers an SCTP_CMD_UPDATE_ASSOC command to the
side effect interpreter, and in fact it properly copies over
peer_{random, hmacs, chunks} parameters from the newly created
association to update the existing one.
Also, the old asoc_shared_key is being released and based on
the new params, sctp_auth_asoc_init_active_key() updated.
However, the issue observed in this case is that the previous
asoc->peer.auth_capable was 0, and has *not* been updated, so
that instead of creating a new secret, we're doing an early
return from the function sctp_auth_asoc_init_active_key()
leaving asoc->asoc_shared_key as NULL. However, we now have to
authenticate chunks from the updated chunk list (e.g. COOKIE-ACK).
That in fact causes the server side when responding with ...
<------------------ AUTH; COOKIE-ACK -----------------
... to trigger a NULL pointer dereference, since in
sctp_packet_transmit(), it discovers that an AUTH chunk is
being queued for xmit, and thus it calls sctp_auth_calculate_hmac().
Since the asoc->active_key_id is still inherited from the
endpoint, and the same as encoded into the chunk, it uses
asoc->asoc_shared_key, which is still NULL, as an asoc_key
and dereferences it in ...
crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len)
... causing an oops. All this happens because sctp_make_cookie_ack()
called with the *new* association has the peer.auth_capable=1
and therefore marks the chunk with auth=1 after checking
sctp_auth_send_cid(), but it is *actually* sent later on over
the then *updated* association's transport that didn't initialize
its shared key due to peer.auth_capable=0. Since control chunks
in that case are not sent by the temporary association which
are scheduled for deletion, they are issued for xmit via
SCTP_CMD_REPLY in the interpreter with the context of the
*updated* association. peer.auth_capable was 0 in the updated
association (which went from COOKIE_WAIT into ESTABLISHED state),
since all previous processing that performed sctp_process_init()
was being done on temporary associations, that we eventually
throw away each time.
The correct fix is to update to the new peer.auth_capable
value as well in the collision case via sctp_assoc_update(),
so that in case the collision migrated from 0 -> 1,
sctp_auth_asoc_init_active_key() can properly recalculate
the secret. This therefore fixes the observed server panic.
Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/sctp/associola.c | 1 +
1 file changed, 1 insertion(+)
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1151,6 +1151,7 @@ void sctp_assoc_update(struct sctp_assoc
asoc->c = new->c;
asoc->peer.rwnd = new->peer.rwnd;
asoc->peer.sack_needed = new->peer.sack_needed;
+ asoc->peer.auth_capable = new->peer.auth_capable;
asoc->peer.i = new->peer.i;
sctp_tsnmap_init(&asoc->peer.tsn_map, SCTP_TSN_MAP_INITIAL,
asoc->peer.i.initial_tsn, GFP_ATOMIC);
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 12/39] macvlan: Initialize vlan_features to turn on offload support.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (10 preceding siblings ...)
2014-08-08 21:34 ` [PATCH 3.14 11/39] net: sctp: inherit auth_capable on INIT collisions Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 13/39] net: Correctly set segment mac_len in skb_segment() Greg Kroah-Hartman
` (28 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Vlad Yasevich, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vlad Yasevich <vyasevic@redhat.com>
[ Upstream commit 081e83a78db9b0ae1f5eabc2dedecc865f509b98 ]
Macvlan devices do not initialize vlan_features. As a result,
any vlan devices configured on top of macvlans perform very poorly.
Initialize vlan_features based on the vlan features of the lower-level
device.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/net/macvlan.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -548,6 +548,7 @@ static int macvlan_init(struct net_devic
(lowerdev->state & MACVLAN_STATE_MASK);
dev->features = lowerdev->features & MACVLAN_FEATURES;
dev->features |= ALWAYS_ON_FEATURES;
+ dev->vlan_features = lowerdev->vlan_features & MACVLAN_FEATURES;
dev->gso_max_size = lowerdev->gso_max_size;
dev->iflink = lowerdev->ifindex;
dev->hard_header_len = lowerdev->hard_header_len;
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 13/39] net: Correctly set segment mac_len in skb_segment().
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (11 preceding siblings ...)
2014-08-08 21:34 ` [PATCH 3.14 12/39] macvlan: Initialize vlan_features to turn on offload support Greg Kroah-Hartman
@ 2014-08-08 21:34 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 14/39] iovec: make sure the caller actually wants anything in memcpy_fromiovecend Greg Kroah-Hartman
` (27 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:34 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Vlad Yasevich,
David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vlad Yasevich <vyasevic@redhat.com>
[ Upstream commit fcdfe3a7fa4cb74391d42b6a26dc07c20dab1d82 ]
When performing segmentation, the mac_len value is copied right
out of the original skb. However, this value is not always set correctly
(like when the packet is VLAN-tagged) and we'll end up copying a bad
value.
One way to demonstrate this is to configure a VM which tags
packets internally and turn off VLAN acceleration on the forwarding
bridge port. The packets show up corrupt like this:
16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q
(0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0,
0x0000: 8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...|...d@.....d.
0x0010: 0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0..
0x0020: 29e3 01c9 f871 0000 0101 080a 000a e833)....q.........3
0x0030: 000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp
0x0040: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
0x0050: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
0x0060: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
...
This also leads to awful throughput as GSO packets are dropped and
cause retransmissions.
The solution is to set the mac_len using the values already available
in then new skb. We've already adjusted all of the header offset, so we
might as well correctly figure out the mac_len using skb_reset_mac_len().
After this change, packets are segmented correctly and performance
is restored.
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/core/skbuff.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2968,9 +2968,9 @@ struct sk_buff *skb_segment(struct sk_bu
tail = nskb;
__copy_skb_header(nskb, head_skb);
- nskb->mac_len = head_skb->mac_len;
skb_headers_offset_update(nskb, skb_headroom(nskb) - headroom);
+ skb_reset_mac_len(nskb);
skb_copy_from_linear_data_offset(head_skb, -tnl_hlen,
nskb->data - tnl_hlen,
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 14/39] iovec: make sure the caller actually wants anything in memcpy_fromiovecend
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (12 preceding siblings ...)
2014-08-08 21:34 ` [PATCH 3.14 13/39] net: Correctly set segment mac_len in skb_segment() Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 15/39] batman-adv: Fix out-of-order fragmentation support Greg Kroah-Hartman
` (26 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Sasha Levin, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 06ebb06d49486676272a3c030bfeef4bd969a8e6 ]
Check for cases when the caller requests 0 bytes instead of running off
and dereferencing potentially invalid iovecs.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/core/iovec.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/net/core/iovec.c
+++ b/net/core/iovec.c
@@ -107,6 +107,10 @@ EXPORT_SYMBOL(memcpy_toiovecend);
int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
int offset, int len)
{
+ /* No data? Done! */
+ if (len == 0)
+ return 0;
+
/* Skip over the finished iovecs */
while (offset >= iov->iov_len) {
offset -= iov->iov_len;
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 15/39] batman-adv: Fix out-of-order fragmentation support
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (13 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 14/39] iovec: make sure the caller actually wants anything in memcpy_fromiovecend Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 16/39] sctp: fix possible seqlock seadlock in sctp_packet_transmit() Greg Kroah-Hartman
` (25 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Sven Eckelmann, Marek Lindner,
Antonio Quartulli
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sven Eckelmann <sven@narfation.org>
[ Upstream commit d9124268d84a836f14a6ead54ff9d8eee4c43be5 ]
batadv_frag_insert_packet was unable to handle out-of-order packets because it
dropped them directly. This is caused by the way the fragmentation lists is
checked for the correct place to insert a fragmentation entry.
The fragmentation code keeps the fragments in lists. The fragmentation entries
are kept in descending order of sequence number. The list is traversed and each
entry is compared with the new fragment. If the current entry has a smaller
sequence number than the new fragment then the new one has to be inserted
before the current entry. This ensures that the list is still in descending
order.
An out-of-order packet with a smaller sequence number than all entries in the
list still has to be added to the end of the list. The used hlist has no
information about the last entry in the list inside hlist_head and thus the
last entry has to be calculated differently. Currently the code assumes that
the iterator variable of hlist_for_each_entry can be used for this purpose
after the hlist_for_each_entry finished. This is obviously wrong because the
iterator variable is always NULL when the list was completely traversed.
Instead the information about the last entry has to be stored in a different
variable.
This problem was introduced in 610bfc6bc99bc83680d190ebc69359a05fc7f605
("batman-adv: Receive fragmented packets and merge").
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/batman-adv/fragmentation.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
--- a/net/batman-adv/fragmentation.c
+++ b/net/batman-adv/fragmentation.c
@@ -128,6 +128,7 @@ static bool batadv_frag_insert_packet(st
{
struct batadv_frag_table_entry *chain;
struct batadv_frag_list_entry *frag_entry_new = NULL, *frag_entry_curr;
+ struct batadv_frag_list_entry *frag_entry_last = NULL;
struct batadv_frag_packet *frag_packet;
uint8_t bucket;
uint16_t seqno, hdr_size = sizeof(struct batadv_frag_packet);
@@ -180,11 +181,14 @@ static bool batadv_frag_insert_packet(st
ret = true;
goto out;
}
+
+ /* store current entry because it could be the last in list */
+ frag_entry_last = frag_entry_curr;
}
- /* Reached the end of the list, so insert after 'frag_entry_curr'. */
- if (likely(frag_entry_curr)) {
- hlist_add_after(&frag_entry_curr->list, &frag_entry_new->list);
+ /* Reached the end of the list, so insert after 'frag_entry_last'. */
+ if (likely(frag_entry_last)) {
+ hlist_add_after(&frag_entry_last->list, &frag_entry_new->list);
chain->size += skb->len - hdr_size;
chain->timestamp = jiffies;
ret = true;
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 16/39] sctp: fix possible seqlock seadlock in sctp_packet_transmit()
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (14 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 15/39] batman-adv: Fix out-of-order fragmentation support Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 17/39] sparc64: Fix argument sign extension for compat_sys_futex() Greg Kroah-Hartman
` (24 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Hannes Frederic Sowa,
Dave Jones, Neil Horman, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit 757efd32d5ce31f67193cc0e6a56e4dffcc42fb1 ]
Dave reported following splat, caused by improper use of
IP_INC_STATS_BH() in process context.
BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551
caller is __this_cpu_preempt_check+0x13/0x20
CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33
ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207
0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40
ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3
Call Trace:
[<ffffffff9e7ee207>] dump_stack+0x4e/0x7a
[<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100
[<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20
[<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp]
[<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp]
[<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0
[<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
[<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp]
[<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp]
[<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp]
[<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100
[<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp]
[<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp]
[<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp]
[<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160
[<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30
[<ffffffff9e73b624>] inet_sendmsg+0x104/0x220
[<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220
[<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0
[<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0
[<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0
[<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0
[<ffffffff9e0136b0>] ? syscall_trace_enter+0x250/0x330
[<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10
[<ffffffff9e7f9be4>] tracesys+0xdd/0xe2
This is a followup of commits f1d8cba61c3c4b ("inet: fix possible
seqlock deadlocks") and 7f88c6b23afbd315 ("ipv6: fix possible seqlock
deadlock in ip6_finish_output2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/sctp/output.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -599,7 +599,7 @@ out:
return err;
no_route:
kfree_skb(nskb);
- IP_INC_STATS_BH(sock_net(asoc->base.sk), IPSTATS_MIB_OUTNOROUTES);
+ IP_INC_STATS(sock_net(asoc->base.sk), IPSTATS_MIB_OUTNOROUTES);
/* FIXME: Returning the 'err' will effect all the associations
* associated with a socket, although only one of the paths of the
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 17/39] sparc64: Fix argument sign extension for compat_sys_futex().
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (15 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 16/39] sctp: fix possible seqlock seadlock in sctp_packet_transmit() Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 18/39] sparc64: Make itc_sync_lock raw Greg Kroah-Hartman
` (23 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit aa3449ee9c87d9b7660dd1493248abcc57769e31 ]
Only the second argument, 'op', is signed.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/kernel/sys32.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/sparc/kernel/sys32.S
+++ b/arch/sparc/kernel/sys32.S
@@ -44,7 +44,7 @@ SIGN1(sys32_timer_settime, compat_sys_ti
SIGN1(sys32_io_submit, compat_sys_io_submit, %o1)
SIGN1(sys32_mq_open, compat_sys_mq_open, %o1)
SIGN1(sys32_select, compat_sys_select, %o0)
-SIGN3(sys32_futex, compat_sys_futex, %o1, %o2, %o5)
+SIGN1(sys32_futex, compat_sys_futex, %o1)
SIGN1(sys32_recvfrom, compat_sys_recvfrom, %o0)
SIGN1(sys32_recvmsg, compat_sys_recvmsg, %o0)
SIGN1(sys32_sendmsg, compat_sys_sendmsg, %o0)
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 18/39] sparc64: Make itc_sync_lock raw
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (16 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 17/39] sparc64: Fix argument sign extension for compat_sys_futex() Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 19/39] sparc64: Fix executable bit testing in set_pmd_at() paths Greg Kroah-Hartman
` (22 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Kirill Tkhai, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kirill Tkhai <tkhai@yandex.ru>
[ Upstream commit 49b6c01f4c1de3b5e5427ac5aba80f9f6d27837a ]
One more place where we must not be able
to be preempted or to be interrupted in RT.
Always actually disable interrupts during
synchronization cycle.
Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/kernel/smp_64.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -151,7 +151,7 @@ void cpu_panic(void)
#define NUM_ROUNDS 64 /* magic value */
#define NUM_ITERS 5 /* likewise */
-static DEFINE_SPINLOCK(itc_sync_lock);
+static DEFINE_RAW_SPINLOCK(itc_sync_lock);
static unsigned long go[SLAVE + 1];
#define DEBUG_TICK_SYNC 0
@@ -259,7 +259,7 @@ static void smp_synchronize_one_tick(int
go[MASTER] = 0;
membar_safe("#StoreLoad");
- spin_lock_irqsave(&itc_sync_lock, flags);
+ raw_spin_lock_irqsave(&itc_sync_lock, flags);
{
for (i = 0; i < NUM_ROUNDS*NUM_ITERS; i++) {
while (!go[MASTER])
@@ -270,7 +270,7 @@ static void smp_synchronize_one_tick(int
membar_safe("#StoreLoad");
}
}
- spin_unlock_irqrestore(&itc_sync_lock, flags);
+ raw_spin_unlock_irqrestore(&itc_sync_lock, flags);
}
#if defined(CONFIG_SUN_LDOMS) && defined(CONFIG_HOTPLUG_CPU)
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 19/39] sparc64: Fix executable bit testing in set_pmd_at() paths.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (17 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 18/39] sparc64: Make itc_sync_lock raw Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 20/39] sparc64: Fix huge PMD invalidation Greg Kroah-Hartman
` (21 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit 5b1e94fa439a3227beefad58c28c17f68287a8e9 ]
This code was mistakenly using the exec bit from the PMD in all
cases, even when the PMD isn't a huge PMD.
If it's not a huge PMD, test the exec bit in the individual ptes down
in tlb_batch_pmd_scan().
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/mm/tlb.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -134,7 +134,7 @@ no_cache_flush:
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static void tlb_batch_pmd_scan(struct mm_struct *mm, unsigned long vaddr,
- pmd_t pmd, bool exec)
+ pmd_t pmd)
{
unsigned long end;
pte_t *pte;
@@ -142,8 +142,11 @@ static void tlb_batch_pmd_scan(struct mm
pte = pte_offset_map(&pmd, vaddr);
end = vaddr + HPAGE_SIZE;
while (vaddr < end) {
- if (pte_val(*pte) & _PAGE_VALID)
+ if (pte_val(*pte) & _PAGE_VALID) {
+ bool exec = pte_exec(*pte);
+
tlb_batch_add_one(mm, vaddr, exec);
+ }
pte++;
vaddr += PAGE_SIZE;
}
@@ -177,15 +180,15 @@ void set_pmd_at(struct mm_struct *mm, un
}
if (!pmd_none(orig)) {
- pte_t orig_pte = __pte(pmd_val(orig));
- bool exec = pte_exec(orig_pte);
-
addr &= HPAGE_MASK;
if (pmd_trans_huge(orig)) {
+ pte_t orig_pte = __pte(pmd_val(orig));
+ bool exec = pte_exec(orig_pte);
+
tlb_batch_add_one(mm, addr, exec);
tlb_batch_add_one(mm, addr + REAL_HPAGE_SIZE, exec);
} else {
- tlb_batch_pmd_scan(mm, addr, orig, exec);
+ tlb_batch_pmd_scan(mm, addr, orig);
}
}
}
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 20/39] sparc64: Fix huge PMD invalidation.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (18 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 19/39] sparc64: Fix executable bit testing in set_pmd_at() paths Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 21/39] sparc64: Fix bugs in get_user_pages_fast() wrt. THP Greg Kroah-Hartman
` (20 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit 51e5ef1bb7ab0e5fa7de4e802da5ab22fe35f0bf ]
On sparc64 "present" and "valid" are seperate PTE bits, this allows us to
naturally distinguish between the user explicitly asking for PROT_NONE
with mprotect() and other situations.
However we weren't handling this properly in the huge PMD paths.
First of all, the page table walker in the TSB miss path only checks
for _PAGE_PMD_HUGE. So the generic pmdp_invalidate() would clear
_PAGE_PRESENT but the TLB miss paths would still load it into the TLB
as a valid huge PMD.
Fix this by clearing the valid bit in pmdp_invalidate(), and also
checking the valid bit in USER_PGTABLE_CHECK_PMD_HUGE using "brgez"
since _PAGE_VALID is bit 63 in both the sun4u and sun4v pte layouts.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/include/asm/pgtable_64.h | 18 ++++--------------
arch/sparc/include/asm/tsb.h | 3 ++-
arch/sparc/mm/tlb.c | 11 +++++++++++
3 files changed, 17 insertions(+), 15 deletions(-)
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -719,20 +719,6 @@ static inline pmd_t pmd_mkwrite(pmd_t pm
return __pmd(pte_val(pte));
}
-static inline pmd_t pmd_mknotpresent(pmd_t pmd)
-{
- unsigned long mask;
-
- if (tlb_type == hypervisor)
- mask = _PAGE_PRESENT_4V;
- else
- mask = _PAGE_PRESENT_4U;
-
- pmd_val(pmd) &= ~mask;
-
- return pmd;
-}
-
static inline pmd_t pmd_mksplitting(pmd_t pmd)
{
pte_t pte = __pte(pmd_val(pmd));
@@ -893,6 +879,10 @@ extern void update_mmu_cache(struct vm_a
extern void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd);
+#define __HAVE_ARCH_PMDP_INVALIDATE
+extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp);
+
#define __HAVE_ARCH_PGTABLE_DEPOSIT
extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pgtable);
--- a/arch/sparc/include/asm/tsb.h
+++ b/arch/sparc/include/asm/tsb.h
@@ -171,7 +171,8 @@ extern struct tsb_phys_patch_entry __tsb
andcc REG1, REG2, %g0; \
be,pt %xcc, 700f; \
sethi %hi(4 * 1024 * 1024), REG2; \
- andn REG1, REG2, REG1; \
+ brgez,pn REG1, FAIL_LABEL; \
+ andn REG1, REG2, REG1; \
and VADDR, REG2, REG2; \
brlz,pt REG1, PTE_LABEL; \
or REG1, REG2, REG1; \
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -193,6 +193,17 @@ void set_pmd_at(struct mm_struct *mm, un
}
}
+void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp)
+{
+ pmd_t entry = *pmdp;
+
+ pmd_val(entry) &= ~_PAGE_VALID;
+
+ set_pmd_at(vma->vm_mm, address, pmdp, entry);
+ flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+}
+
void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pgtable)
{
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 21/39] sparc64: Fix bugs in get_user_pages_fast() wrt. THP.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (19 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 20/39] sparc64: Fix huge PMD invalidation Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 22/39] sparc64: Fix hex values in comment above pte_modify() Greg Kroah-Hartman
` (19 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit 04df419de34104d8818b8c5cffaa062fa36d20ea ]
The large PMD path needs to check _PAGE_VALID not _PAGE_PRESENT, to
decide if it needs to bail and return 0.
pmd_large() should therefore just check _PAGE_PMD_HUGE.
Calls to gup_huge_pmd() are guarded with a check of pmd_large(), so we
just need to add a valid bit check.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/include/asm/pgtable_64.h | 2 +-
arch/sparc/mm/gup.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -633,7 +633,7 @@ static inline unsigned long pmd_large(pm
{
pte_t pte = __pte(pmd_val(pmd));
- return (pte_val(pte) & _PAGE_PMD_HUGE) && pte_present(pte);
+ return pte_val(pte) & _PAGE_PMD_HUGE;
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -73,7 +73,7 @@ static int gup_huge_pmd(pmd_t *pmdp, pmd
struct page *head, *page, *tail;
int refs;
- if (!pmd_large(pmd))
+ if (!(pmd_val(pmd) & _PAGE_VALID))
return 0;
if (write && !pmd_write(pmd))
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 22/39] sparc64: Fix hex values in comment above pte_modify().
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (20 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 21/39] sparc64: Fix bugs in get_user_pages_fast() wrt. THP Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 23/39] sparc64: Dont use _PAGE_PRESENT in pte_modify() mask Greg Kroah-Hartman
` (18 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit c2e4e676adb40ea764af79d3e08be954e14a0f4c ]
When _PAGE_SPECIAL and _PAGE_PMD_HUGE were added to the mask, the
comment was not updated.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/include/asm/pgtable_64.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -258,8 +258,8 @@ static inline pte_t pte_modify(pte_t pte
{
unsigned long mask, tmp;
- /* SUN4U: 0x600307ffffffecb8 (negated == 0x9ffcf80000001347)
- * SUN4V: 0x30ffffffffffee17 (negated == 0xcf000000000011e8)
+ /* SUN4U: 0x630107ffffffecb8 (negated == 0x9cfef80000001347)
+ * SUN4V: 0x33ffffffffffee17 (negated == 0xcc000000000011e8)
*
* Even if we use negation tricks the result is still a 6
* instruction sequence, so don't try to play fancy and just
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 23/39] sparc64: Dont use _PAGE_PRESENT in pte_modify() mask.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (21 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 22/39] sparc64: Fix hex values in comment above pte_modify() Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 24/39] sparc64: Handle 32-bit tasks properly in compute_effective_address() Greg Kroah-Hartman
` (17 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit eaf85da82669b057f20c4e438dc2566b51a83af6 ]
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/include/asm/pgtable_64.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -258,8 +258,8 @@ static inline pte_t pte_modify(pte_t pte
{
unsigned long mask, tmp;
- /* SUN4U: 0x630107ffffffecb8 (negated == 0x9cfef80000001347)
- * SUN4V: 0x33ffffffffffee17 (negated == 0xcc000000000011e8)
+ /* SUN4U: 0x630107ffffffec38 (negated == 0x9cfef800000013c7)
+ * SUN4V: 0x33ffffffffffee07 (negated == 0xcc000000000011f8)
*
* Even if we use negation tricks the result is still a 6
* instruction sequence, so don't try to play fancy and just
@@ -289,10 +289,10 @@ static inline pte_t pte_modify(pte_t pte
" .previous\n"
: "=r" (mask), "=r" (tmp)
: "i" (_PAGE_PADDR_4U | _PAGE_MODIFIED_4U | _PAGE_ACCESSED_4U |
- _PAGE_CP_4U | _PAGE_CV_4U | _PAGE_E_4U | _PAGE_PRESENT_4U |
+ _PAGE_CP_4U | _PAGE_CV_4U | _PAGE_E_4U |
_PAGE_SPECIAL | _PAGE_PMD_HUGE | _PAGE_SZALL_4U),
"i" (_PAGE_PADDR_4V | _PAGE_MODIFIED_4V | _PAGE_ACCESSED_4V |
- _PAGE_CP_4V | _PAGE_CV_4V | _PAGE_E_4V | _PAGE_PRESENT_4V |
+ _PAGE_CP_4V | _PAGE_CV_4V | _PAGE_E_4V |
_PAGE_SPECIAL | _PAGE_PMD_HUGE | _PAGE_SZALL_4V));
return __pte((pte_val(pte) & mask) | (pgprot_val(prot) & ~mask));
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 24/39] sparc64: Handle 32-bit tasks properly in compute_effective_address().
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (22 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 23/39] sparc64: Dont use _PAGE_PRESENT in pte_modify() mask Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 25/39] sparc64: Fix top-level fault handling bugs Greg Kroah-Hartman
` (16 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit d037d16372bbe4d580342bebbb8826821ad9edf0 ]
If we have a 32-bit task we must chop off the top 32-bits of the
64-bit value just as the cpu would.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/kernel/unaligned_64.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
--- a/arch/sparc/kernel/unaligned_64.c
+++ b/arch/sparc/kernel/unaligned_64.c
@@ -166,17 +166,23 @@ static unsigned long *fetch_reg_addr(uns
unsigned long compute_effective_address(struct pt_regs *regs,
unsigned int insn, unsigned int rd)
{
+ int from_kernel = (regs->tstate & TSTATE_PRIV) != 0;
unsigned int rs1 = (insn >> 14) & 0x1f;
unsigned int rs2 = insn & 0x1f;
- int from_kernel = (regs->tstate & TSTATE_PRIV) != 0;
+ unsigned long addr;
if (insn & 0x2000) {
maybe_flush_windows(rs1, 0, rd, from_kernel);
- return (fetch_reg(rs1, regs) + sign_extend_imm13(insn));
+ addr = (fetch_reg(rs1, regs) + sign_extend_imm13(insn));
} else {
maybe_flush_windows(rs1, rs2, rd, from_kernel);
- return (fetch_reg(rs1, regs) + fetch_reg(rs2, regs));
+ addr = (fetch_reg(rs1, regs) + fetch_reg(rs2, regs));
}
+
+ if (!from_kernel && test_thread_flag(TIF_32BIT))
+ addr &= 0xffffffff;
+
+ return addr;
}
/* This is just to make gcc think die_if_kernel does return... */
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 25/39] sparc64: Fix top-level fault handling bugs.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (23 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 24/39] sparc64: Handle 32-bit tasks properly in compute_effective_address() Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 26/39] sparc64: Fix range check in kern_addr_valid() Greg Kroah-Hartman
` (15 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit 70ffc6ebaead783ac8dafb1e87df0039bb043596 ]
Make get_user_insn() able to cope with huge PMDs.
Next, make do_fault_siginfo() more robust when get_user_insn() can't
actually fetch the instruction. In particular, use the MMU announced
fault address when that happens, instead of calling
compute_effective_address() and computing garbage.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/mm/fault_64.c | 84 +++++++++++++++++++++++++++++------------------
1 file changed, 53 insertions(+), 31 deletions(-)
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -96,38 +96,51 @@ static unsigned int get_user_insn(unsign
pte_t *ptep, pte;
unsigned long pa;
u32 insn = 0;
- unsigned long pstate;
- if (pgd_none(*pgdp))
- goto outret;
+ if (pgd_none(*pgdp) || unlikely(pgd_bad(*pgdp)))
+ goto out;
pudp = pud_offset(pgdp, tpc);
- if (pud_none(*pudp))
- goto outret;
- pmdp = pmd_offset(pudp, tpc);
- if (pmd_none(*pmdp))
- goto outret;
-
- /* This disables preemption for us as well. */
- __asm__ __volatile__("rdpr %%pstate, %0" : "=r" (pstate));
- __asm__ __volatile__("wrpr %0, %1, %%pstate"
- : : "r" (pstate), "i" (PSTATE_IE));
- ptep = pte_offset_map(pmdp, tpc);
- pte = *ptep;
- if (!pte_present(pte))
+ if (pud_none(*pudp) || unlikely(pud_bad(*pudp)))
goto out;
- pa = (pte_pfn(pte) << PAGE_SHIFT);
- pa += (tpc & ~PAGE_MASK);
+ /* This disables preemption for us as well. */
+ local_irq_disable();
- /* Use phys bypass so we don't pollute dtlb/dcache. */
- __asm__ __volatile__("lduwa [%1] %2, %0"
- : "=r" (insn)
- : "r" (pa), "i" (ASI_PHYS_USE_EC));
+ pmdp = pmd_offset(pudp, tpc);
+ if (pmd_none(*pmdp) || unlikely(pmd_bad(*pmdp)))
+ goto out_irq_enable;
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ if (pmd_trans_huge(*pmdp)) {
+ if (pmd_trans_splitting(*pmdp))
+ goto out_irq_enable;
+
+ pa = pmd_pfn(*pmdp) << PAGE_SHIFT;
+ pa += tpc & ~HPAGE_MASK;
+
+ /* Use phys bypass so we don't pollute dtlb/dcache. */
+ __asm__ __volatile__("lduwa [%1] %2, %0"
+ : "=r" (insn)
+ : "r" (pa), "i" (ASI_PHYS_USE_EC));
+ } else
+#endif
+ {
+ ptep = pte_offset_map(pmdp, tpc);
+ pte = *ptep;
+ if (pte_present(pte)) {
+ pa = (pte_pfn(pte) << PAGE_SHIFT);
+ pa += (tpc & ~PAGE_MASK);
+
+ /* Use phys bypass so we don't pollute dtlb/dcache. */
+ __asm__ __volatile__("lduwa [%1] %2, %0"
+ : "=r" (insn)
+ : "r" (pa), "i" (ASI_PHYS_USE_EC));
+ }
+ pte_unmap(ptep);
+ }
+out_irq_enable:
+ local_irq_enable();
out:
- pte_unmap(ptep);
- __asm__ __volatile__("wrpr %0, 0x0, %%pstate" : : "r" (pstate));
-outret:
return insn;
}
@@ -153,7 +166,8 @@ show_signal_msg(struct pt_regs *regs, in
}
static void do_fault_siginfo(int code, int sig, struct pt_regs *regs,
- unsigned int insn, int fault_code)
+ unsigned long fault_addr, unsigned int insn,
+ int fault_code)
{
unsigned long addr;
siginfo_t info;
@@ -161,10 +175,18 @@ static void do_fault_siginfo(int code, i
info.si_code = code;
info.si_signo = sig;
info.si_errno = 0;
- if (fault_code & FAULT_CODE_ITLB)
+ if (fault_code & FAULT_CODE_ITLB) {
addr = regs->tpc;
- else
- addr = compute_effective_address(regs, insn, 0);
+ } else {
+ /* If we were able to probe the faulting instruction, use it
+ * to compute a precise fault address. Otherwise use the fault
+ * time provided address which may only have page granularity.
+ */
+ if (insn)
+ addr = compute_effective_address(regs, insn, 0);
+ else
+ addr = fault_addr;
+ }
info.si_addr = (void __user *) addr;
info.si_trapno = 0;
@@ -239,7 +261,7 @@ static void __kprobes do_kernel_fault(st
/* The si_code was set to make clear whether
* this was a SEGV_MAPERR or SEGV_ACCERR fault.
*/
- do_fault_siginfo(si_code, SIGSEGV, regs, insn, fault_code);
+ do_fault_siginfo(si_code, SIGSEGV, regs, address, insn, fault_code);
return;
}
@@ -525,7 +547,7 @@ do_sigbus:
* Send a sigbus, regardless of whether we were in kernel
* or user mode.
*/
- do_fault_siginfo(BUS_ADRERR, SIGBUS, regs, insn, fault_code);
+ do_fault_siginfo(BUS_ADRERR, SIGBUS, regs, address, insn, fault_code);
/* Kernel mode? Handle exceptions or die */
if (regs->tstate & TSTATE_PRIV)
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 26/39] sparc64: Fix range check in kern_addr_valid().
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (24 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 25/39] sparc64: Fix top-level fault handling bugs Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 27/39] sparc64: Use ILOG2_4MB instead of constant 22 Greg Kroah-Hartman
` (14 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit ee73887e92a69ae0a5cda21c68ea75a27804c944 ]
In commit b2d438348024b75a1ee8b66b85d77f569a5dfed8 ("sparc64: Make
PAGE_OFFSET variable."), the MAX_PHYS_ADDRESS_BITS value was increased
(to 47).
This constant reference to '41UL' was missed.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/include/asm/pgtable_64.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -916,7 +916,7 @@ static inline bool kern_addr_valid(unsig
{
unsigned long paddr = __pa(addr);
- if ((paddr >> 41UL) != 0UL)
+ if ((paddr >> MAX_PHYS_ADDRESS_BITS) != 0UL)
return false;
return test_bit(paddr >> 22, sparc64_valid_addr_bitmap);
}
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 27/39] sparc64: Use ILOG2_4MB instead of constant 22.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (25 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 26/39] sparc64: Fix range check in kern_addr_valid() Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 28/39] sparc64: Add basic validations to {pud,pmd}_bad() Greg Kroah-Hartman
` (13 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit 0eef331a3d0ee970dcbebd1bd5fcb57ca33ece01 ]
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/include/asm/pgtable_64.h | 2 +-
arch/sparc/kernel/head_64.S | 4 ++--
arch/sparc/kernel/ktlb.S | 2 +-
arch/sparc/mm/init_64.c | 12 ++++++------
4 files changed, 10 insertions(+), 10 deletions(-)
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -918,7 +918,7 @@ static inline bool kern_addr_valid(unsig
if ((paddr >> MAX_PHYS_ADDRESS_BITS) != 0UL)
return false;
- return test_bit(paddr >> 22, sparc64_valid_addr_bitmap);
+ return test_bit(paddr >> ILOG2_4MB, sparc64_valid_addr_bitmap);
}
extern int page_in_phys_avail(unsigned long paddr);
--- a/arch/sparc/kernel/head_64.S
+++ b/arch/sparc/kernel/head_64.S
@@ -282,8 +282,8 @@ sun4v_chip_type:
stx %l2, [%l4 + 0x0]
ldx [%sp + 2047 + 128 + 0x50], %l3 ! physaddr low
/* 4MB align */
- srlx %l3, 22, %l3
- sllx %l3, 22, %l3
+ srlx %l3, ILOG2_4MB, %l3
+ sllx %l3, ILOG2_4MB, %l3
stx %l3, [%l4 + 0x8]
/* Leave service as-is, "call-method" */
--- a/arch/sparc/kernel/ktlb.S
+++ b/arch/sparc/kernel/ktlb.S
@@ -277,7 +277,7 @@ kvmap_dtlb_load:
#ifdef CONFIG_SPARSEMEM_VMEMMAP
kvmap_vmemmap:
sub %g4, %g5, %g5
- srlx %g5, 22, %g5
+ srlx %g5, ILOG2_4MB, %g5
sethi %hi(vmemmap_table), %g1
sllx %g5, 3, %g5
or %g1, %lo(vmemmap_table), %g1
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -588,7 +588,7 @@ static void __init remap_kernel(void)
int i, tlb_ent = sparc64_highest_locked_tlbent();
tte_vaddr = (unsigned long) KERNBASE;
- phys_page = (prom_boot_mapping_phys_low >> 22UL) << 22UL;
+ phys_page = (prom_boot_mapping_phys_low >> ILOG2_4MB) << ILOG2_4MB;
tte_data = kern_large_tte(phys_page);
kern_locked_tte_data = tte_data;
@@ -1881,7 +1881,7 @@ void __init paging_init(void)
BUILD_BUG_ON(NR_CPUS > 4096);
- kern_base = (prom_boot_mapping_phys_low >> 22UL) << 22UL;
+ kern_base = (prom_boot_mapping_phys_low >> ILOG2_4MB) << ILOG2_4MB;
kern_size = (unsigned long)&_end - (unsigned long)KERNBASE;
/* Invalidate both kernel TSBs. */
@@ -1937,7 +1937,7 @@ void __init paging_init(void)
shift = kern_base + PAGE_OFFSET - ((unsigned long)KERNBASE);
real_end = (unsigned long)_end;
- num_kernel_image_mappings = DIV_ROUND_UP(real_end - KERNBASE, 1 << 22);
+ num_kernel_image_mappings = DIV_ROUND_UP(real_end - KERNBASE, 1 << ILOG2_4MB);
printk("Kernel: Using %d locked TLB entries for main kernel image.\n",
num_kernel_image_mappings);
@@ -2094,7 +2094,7 @@ static void __init setup_valid_addr_bitm
if (new_start <= old_start &&
new_end >= (old_start + PAGE_SIZE)) {
- set_bit(old_start >> 22, bitmap);
+ set_bit(old_start >> ILOG2_4MB, bitmap);
goto do_next_page;
}
}
@@ -2143,7 +2143,7 @@ void __init mem_init(void)
addr = PAGE_OFFSET + kern_base;
last = PAGE_ALIGN(kern_size) + addr;
while (addr < last) {
- set_bit(__pa(addr) >> 22, sparc64_valid_addr_bitmap);
+ set_bit(__pa(addr) >> ILOG2_4MB, sparc64_valid_addr_bitmap);
addr += PAGE_SIZE;
}
@@ -2267,7 +2267,7 @@ int __meminit vmemmap_populate(unsigned
void *block;
if (!(*vmem_pp & _PAGE_VALID)) {
- block = vmemmap_alloc_block(1UL << 22, node);
+ block = vmemmap_alloc_block(1UL << ILOG2_4MB, node);
if (!block)
return -ENOMEM;
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 28/39] sparc64: Add basic validations to {pud,pmd}_bad().
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (26 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 27/39] sparc64: Use ILOG2_4MB instead of constant 22 Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 29/39] sparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill pte_ERROR() Greg Kroah-Hartman
` (12 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit 26cf432551d749e7d581db33529507a711c6eaab ]
Instead of returning false we should at least check the most basic
things, otherwise page table corruptions will be very difficult to
debug.
PMD and PTE tables are of size PAGE_SIZE, so none of the sub-PAGE_SIZE
bits should be set.
We also complement this with a check that the physical address the
pud/pmd points to is valid memory.
PowerPC was used as a guide while implementating this.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/include/asm/pgtable_64.h | 46 ++++++++++++++++++++++++------------
1 file changed, 31 insertions(+), 15 deletions(-)
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -71,6 +71,23 @@
#include <linux/sched.h>
+extern unsigned long sparc64_valid_addr_bitmap[];
+
+/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
+static inline bool __kern_addr_valid(unsigned long paddr)
+{
+ if ((paddr >> MAX_PHYS_ADDRESS_BITS) != 0UL)
+ return false;
+ return test_bit(paddr >> ILOG2_4MB, sparc64_valid_addr_bitmap);
+}
+
+static inline bool kern_addr_valid(unsigned long addr)
+{
+ unsigned long paddr = __pa(addr);
+
+ return __kern_addr_valid(paddr);
+}
+
/* Entries per page directory level. */
#define PTRS_PER_PTE (1UL << (PAGE_SHIFT-3))
#define PTRS_PER_PMD (1UL << PMD_BITS)
@@ -743,6 +760,20 @@ static inline int pmd_present(pmd_t pmd)
#define pmd_none(pmd) (!pmd_val(pmd))
+/* pmd_bad() is only called on non-trans-huge PMDs. Our encoding is
+ * very simple, it's just the physical address. PTE tables are of
+ * size PAGE_SIZE so make sure the sub-PAGE_SIZE bits are clear and
+ * the top bits outside of the range of any physical address size we
+ * support are clear as well. We also validate the physical itself.
+ */
+#define pmd_bad(pmd) ((pmd_val(pmd) & ~PAGE_MASK) || \
+ !__kern_addr_valid(pmd_val(pmd)))
+
+#define pud_none(pud) (!pud_val(pud))
+
+#define pud_bad(pud) ((pud_val(pud) & ~PAGE_MASK) || \
+ !__kern_addr_valid(pud_val(pud)))
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd);
@@ -776,10 +807,7 @@ static inline unsigned long __pmd_page(p
#define pud_page_vaddr(pud) \
((unsigned long) __va(pud_val(pud)))
#define pud_page(pud) virt_to_page((void *)pud_page_vaddr(pud))
-#define pmd_bad(pmd) (0)
#define pmd_clear(pmdp) (pmd_val(*(pmdp)) = 0UL)
-#define pud_none(pud) (!pud_val(pud))
-#define pud_bad(pud) (0)
#define pud_present(pud) (pud_val(pud) != 0U)
#define pud_clear(pudp) (pud_val(*(pudp)) = 0UL)
@@ -909,18 +937,6 @@ extern unsigned long pte_file(pte_t);
extern pte_t pgoff_to_pte(unsigned long);
#define PTE_FILE_MAX_BITS (64UL - PAGE_SHIFT - 1UL)
-extern unsigned long sparc64_valid_addr_bitmap[];
-
-/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
-static inline bool kern_addr_valid(unsigned long addr)
-{
- unsigned long paddr = __pa(addr);
-
- if ((paddr >> MAX_PHYS_ADDRESS_BITS) != 0UL)
- return false;
- return test_bit(paddr >> ILOG2_4MB, sparc64_valid_addr_bitmap);
-}
-
extern int page_in_phys_avail(unsigned long paddr);
/*
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 29/39] sparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill pte_ERROR().
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (27 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 28/39] sparc64: Add basic validations to {pud,pmd}_bad() Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 30/39] sparc64: Dont bark so loudly about 32-bit tasks generating 64-bit fault addresses Greg Kroah-Hartman
` (11 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit fe866433f843b080246ce729b5e6b27b5f5d9a58 ]
pte_ERROR() is not used anywhere, delete it.
For pgd_ERROR() and pmd_ERROR(), output something similar to x86, giving the address
of the pgd/pmd as well as it's value.
Also provide the caller, since these macros are invoked from pgd_clear_bad() and
pmd_clear_bad() which provides little context as to what high level operation was
occuring when the BAD state was detected.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/include/asm/pgtable_64.h | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -96,9 +96,12 @@ static inline bool kern_addr_valid(unsig
/* Kernel has a separate 44bit address space. */
#define FIRST_USER_ADDRESS 0
-#define pte_ERROR(e) __builtin_trap()
-#define pmd_ERROR(e) __builtin_trap()
-#define pgd_ERROR(e) __builtin_trap()
+#define pmd_ERROR(e) \
+ pr_err("%s:%d: bad pmd %p(%016lx) seen at (%pS)\n", \
+ __FILE__, __LINE__, &(e), pmd_val(e), __builtin_return_address(0))
+#define pgd_ERROR(e) \
+ pr_err("%s:%d: bad pgd %p(%016lx) seen at (%pS)\n", \
+ __FILE__, __LINE__, &(e), pgd_val(e), __builtin_return_address(0))
#endif /* !(__ASSEMBLY__) */
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 30/39] sparc64: Dont bark so loudly about 32-bit tasks generating 64-bit fault addresses.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (28 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 29/39] sparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill pte_ERROR() Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 31/39] sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus Greg Kroah-Hartman
` (10 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit e5c460f46ae7ee94831cb55cb980f942aa9e5a85 ]
This was found using Dave Jone's trinity tool.
When a user process which is 32-bit performs a load or a store, the
cpu chops off the top 32-bits of the effective address before
translating it.
This is because we run 32-bit tasks with the PSTATE_AM (address
masking) bit set.
We can't run the kernel with that bit set, so when the kernel accesses
userspace no address masking occurs.
Since a 32-bit process will have no mappings in that region we will
properly fault, so we don't try to handle this using access_ok(),
which can safely just be a NOP on sparc64.
Real faults from 32-bit processes should never generate such addresses
so a bug check was added long ago, and it barks in the logs if this
happens.
But it also barks when a kernel user access causes this condition, and
that _can_ happen. For example, if a pointer passed into a system call
is "0xfffffffc" and the kernel access 4 bytes offset from that pointer.
Just handle such faults normally via the exception entries.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/mm/fault_64.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -281,18 +281,6 @@ static void noinline __kprobes bogus_32b
show_regs(regs);
}
-static void noinline __kprobes bogus_32bit_fault_address(struct pt_regs *regs,
- unsigned long addr)
-{
- static int times;
-
- if (times++ < 10)
- printk(KERN_ERR "FAULT[%s:%d]: 32-bit process "
- "reports 64-bit fault address [%lx]\n",
- current->comm, current->pid, addr);
- show_regs(regs);
-}
-
asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
{
enum ctx_state prev_state = exception_enter();
@@ -322,10 +310,8 @@ asmlinkage void __kprobes do_sparc64_fau
goto intr_or_no_mm;
}
}
- if (unlikely((address >> 32) != 0)) {
- bogus_32bit_fault_address(regs, address);
+ if (unlikely((address >> 32) != 0))
goto intr_or_no_mm;
- }
}
if (regs->tstate & TSTATE_PRIV) {
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 31/39] sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (29 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 30/39] sparc64: Dont bark so loudly about 32-bit tasks generating 64-bit fault addresses Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 32/39] sparc64: Add membar to Niagara2 memcpy code Greg Kroah-Hartman
` (9 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit b18eb2d779240631a098626cb6841ee2dd34fda0 ]
Access to the TSB hash tables during TLB misses requires that there be
an atomic 128-bit quad load available so that we fetch a matching TAG
and DATA field at the same time.
On cpus prior to UltraSPARC-III only virtual address based quad loads
are available. UltraSPARC-III and later provide physical address
based variants which are easier to use.
When we only have virtual address based quad loads available this
means that we have to lock the TSB into the TLB at a fixed virtual
address on each cpu when it runs that process. We can't just access
the PAGE_OFFSET based aliased mapping of these TSBs because we cannot
take a recursive TLB miss inside of the TLB miss handler without
risking running out of hardware trap levels (some trap combinations
can be deep, such as those generated by register window spill and fill
traps).
Without huge pages it's working perfectly fine, but when the huge TSB
got added another chunk of fixed virtual address space was not
allocated for this second TSB mapping.
So we were mapping both the 8K and 4MB TSBs to the same exact virtual
address, causing multiple TLB matches which gives undefined behavior.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/include/asm/pgtable_64.h | 6 ++++--
arch/sparc/mm/tsb.c | 14 +++++++++++++-
2 files changed, 17 insertions(+), 3 deletions(-)
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -24,7 +24,8 @@
/* The kernel image occupies 0x4000000 to 0x6000000 (4MB --> 96MB).
* The page copy blockops can use 0x6000000 to 0x8000000.
- * The TSB is mapped in the 0x8000000 to 0xa000000 range.
+ * The 8K TSB is mapped in the 0x8000000 to 0x8400000 range.
+ * The 4M TSB is mapped in the 0x8400000 to 0x8800000 range.
* The PROM resides in an area spanning 0xf0000000 to 0x100000000.
* The vmalloc area spans 0x100000000 to 0x200000000.
* Since modules need to be in the lowest 32-bits of the address space,
@@ -33,7 +34,8 @@
* 0x400000000.
*/
#define TLBTEMP_BASE _AC(0x0000000006000000,UL)
-#define TSBMAP_BASE _AC(0x0000000008000000,UL)
+#define TSBMAP_8K_BASE _AC(0x0000000008000000,UL)
+#define TSBMAP_4M_BASE _AC(0x0000000008400000,UL)
#define MODULES_VADDR _AC(0x0000000010000000,UL)
#define MODULES_LEN _AC(0x00000000e0000000,UL)
#define MODULES_END _AC(0x00000000f0000000,UL)
--- a/arch/sparc/mm/tsb.c
+++ b/arch/sparc/mm/tsb.c
@@ -133,7 +133,19 @@ static void setup_tsb_params(struct mm_s
mm->context.tsb_block[tsb_idx].tsb_nentries =
tsb_bytes / sizeof(struct tsb);
- base = TSBMAP_BASE;
+ switch (tsb_idx) {
+ case MM_TSB_BASE:
+ base = TSBMAP_8K_BASE;
+ break;
+#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
+ case MM_TSB_HUGE:
+ base = TSBMAP_4M_BASE;
+ break;
+#endif
+ default:
+ BUG();
+ }
+
tte = pgprot_val(PAGE_KERNEL_LOCKED);
tsb_paddr = __pa(mm->context.tsb_block[tsb_idx].tsb);
BUG_ON(tsb_paddr & (tsb_bytes - 1UL));
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 32/39] sparc64: Add membar to Niagara2 memcpy code.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (30 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 31/39] sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 33/39] sparc64: Do not insert non-valid PTEs into the TSB hash table Greg Kroah-Hartman
` (8 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit 5aa4ecfd0ddb1e6dcd1c886e6c49677550f581aa ]
This is the prevent previous stores from overlapping the block stores
done by the memcpy loop.
Based upon a glibc patch by Jose E. Marchesi
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/lib/NG2memcpy.S | 1 +
1 file changed, 1 insertion(+)
--- a/arch/sparc/lib/NG2memcpy.S
+++ b/arch/sparc/lib/NG2memcpy.S
@@ -236,6 +236,7 @@ FUNC_NAME: /* %o0=dst, %o1=src, %o2=len
*/
VISEntryHalf
+ membar #Sync
alignaddr %o1, %g0, %g0
add %o1, (64 - 1), %o4
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 33/39] sparc64: Do not insert non-valid PTEs into the TSB hash table.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (31 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 32/39] sparc64: Add membar to Niagara2 memcpy code Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 34/39] sparc64: Guard against flushing openfirmware mappings Greg Kroah-Hartman
` (7 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit 18f38132528c3e603c66ea464727b29e9bbcb91b ]
The assumption was that update_mmu_cache() (and the equivalent for PMDs) would
only be called when the PTE being installed will be accessible by the user.
This is not true for code paths originating from remove_migration_pte().
There are dire consequences for placing a non-valid PTE into the TSB. The TLB
miss frramework assumes thatwhen a TSB entry matches we can just load it into
the TLB and return from the TLB miss trap.
So if a non-valid PTE is in there, we will deadlock taking the TLB miss over
and over, never satisfying the miss.
Just exit early from update_mmu_cache() and friends in this situation.
Based upon a report and patch from Christopher Alexander Tobias Schulze.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/mm/init_64.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -350,6 +350,10 @@ void update_mmu_cache(struct vm_area_str
mm = vma->vm_mm;
+ /* Don't insert a non-valid PTE into the TSB, we'll deadlock. */
+ if (!pte_accessible(mm, pte))
+ return;
+
spin_lock_irqsave(&mm->context.lock, flags);
#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
@@ -2614,6 +2618,10 @@ void update_mmu_cache_pmd(struct vm_area
pte = pmd_val(entry);
+ /* Don't insert a non-valid PMD into the TSB, we'll deadlock. */
+ if (!(pte & _PAGE_VALID))
+ return;
+
/* We are fabricating 8MB pages using 4MB real hw pages. */
pte |= (addr & (1UL << REAL_HPAGE_SHIFT));
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 34/39] sparc64: Guard against flushing openfirmware mappings.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (32 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 33/39] sparc64: Do not insert non-valid PTEs into the TSB hash table Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 35/39] bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000 Greg Kroah-Hartman
` (6 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "David S. Miller" <davem@davemloft.net>
[ Upstream commit 4ca9a23765da3260058db3431faf5b4efd8cf926 ]
Based almost entirely upon a patch by Christopher Alexander Tobias
Schulze.
In commit db64fe02258f1507e13fe5212a989922323685ce ("mm: rewrite vmap
layer") lazy VMAP tlb flushing was added to the vmalloc layer. This
causes problems on sparc64.
Sparc64 has two VMAP mapped regions and they are not contiguous with
eachother. First we have the malloc mapping area, then another
unrelated region, then the vmalloc region.
This "another unrelated region" is where the firmware is mapped.
If the lazy TLB flushing logic in the vmalloc code triggers after
we've had both a module unload and a vfree or similar, it will pass an
address range that goes from somewhere inside the malloc region to
somewhere inside the vmalloc region, and thus covering the
openfirmware area entirely.
The sparc64 kernel learns about openfirmware's dynamic mappings in
this region early in the boot, and then services TLB misses in this
area. But openfirmware has some locked TLB entries which are not
mentioned in those dynamic mappings and we should thus not disturb
them.
These huge lazy TLB flush ranges causes those openfirmware locked TLB
entries to be removed, resulting in all kinds of problems including
hard hangs and crashes during reboot/reset.
Besides causing problems like this, such huge TLB flush ranges are
also incredibly inefficient. A plea has been made with the author of
the VMAP lazy TLB flushing code, but for now we'll put a safety guard
into our flush_tlb_kernel_range() implementation.
Since the implementation has become non-trivial, stop defining it as a
macro and instead make it a function in a C source file.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/include/asm/tlbflush_64.h | 12 ++----------
arch/sparc/mm/init_64.c | 23 +++++++++++++++++++++++
2 files changed, 25 insertions(+), 10 deletions(-)
--- a/arch/sparc/include/asm/tlbflush_64.h
+++ b/arch/sparc/include/asm/tlbflush_64.h
@@ -34,6 +34,8 @@ static inline void flush_tlb_range(struc
{
}
+void flush_tlb_kernel_range(unsigned long start, unsigned long end);
+
#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
extern void flush_tlb_pending(void);
@@ -48,11 +50,6 @@ extern void __flush_tlb_kernel_range(uns
#ifndef CONFIG_SMP
-#define flush_tlb_kernel_range(start,end) \
-do { flush_tsb_kernel_range(start,end); \
- __flush_tlb_kernel_range(start,end); \
-} while (0)
-
static inline void global_flush_tlb_page(struct mm_struct *mm, unsigned long vaddr)
{
__flush_tlb_page(CTX_HWBITS(mm->context), vaddr);
@@ -63,11 +60,6 @@ static inline void global_flush_tlb_page
extern void smp_flush_tlb_kernel_range(unsigned long start, unsigned long end);
extern void smp_flush_tlb_page(struct mm_struct *mm, unsigned long vaddr);
-#define flush_tlb_kernel_range(start, end) \
-do { flush_tsb_kernel_range(start,end); \
- smp_flush_tlb_kernel_range(start, end); \
-} while (0)
-
#define global_flush_tlb_page(mm, vaddr) \
smp_flush_tlb_page(mm, vaddr)
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2702,3 +2702,26 @@ void hugetlb_setup(struct pt_regs *regs)
}
}
#endif
+
+#ifdef CONFIG_SMP
+#define do_flush_tlb_kernel_range smp_flush_tlb_kernel_range
+#else
+#define do_flush_tlb_kernel_range __flush_tlb_kernel_range
+#endif
+
+void flush_tlb_kernel_range(unsigned long start, unsigned long end)
+{
+ if (start < HI_OBP_ADDRESS && end > LOW_OBP_ADDRESS) {
+ if (start < LOW_OBP_ADDRESS) {
+ flush_tsb_kernel_range(start, LOW_OBP_ADDRESS);
+ do_flush_tlb_kernel_range(start, LOW_OBP_ADDRESS);
+ }
+ if (end > HI_OBP_ADDRESS) {
+ flush_tsb_kernel_range(end, HI_OBP_ADDRESS);
+ do_flush_tlb_kernel_range(end, HI_OBP_ADDRESS);
+ }
+ } else {
+ flush_tsb_kernel_range(start, end);
+ do_flush_tlb_kernel_range(start, end);
+ }
+}
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 35/39] bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (33 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 34/39] sparc64: Guard against flushing openfirmware mappings Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 36/39] sunsab: Fix detection of BREAK on sunsab serial console Greg Kroah-Hartman
` (5 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Christopher Alexander Tobias Schulze,
David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
[ Upstream commit 5cdceab3d5e02eb69ea0f5d8fa9181800baf6f77 ]
Fix regression in bbc i2c temperature and fan control on some Sun systems
that causes the driver to refuse to load due to the bbc_i2c_bussel resource not
being present on the (second) i2c bus where the temperature sensors and fan
control are located. (The check for the number of resources was removed when
the driver was ported to a pure OF driver in mid 2008.)
Signed-off-by: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/sbus/char/bbc_envctrl.c | 6 ++++++
drivers/sbus/char/bbc_i2c.c | 11 ++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)
--- a/drivers/sbus/char/bbc_envctrl.c
+++ b/drivers/sbus/char/bbc_envctrl.c
@@ -452,6 +452,9 @@ static void attach_one_temp(struct bbc_i
if (!tp)
return;
+ INIT_LIST_HEAD(&tp->bp_list);
+ INIT_LIST_HEAD(&tp->glob_list);
+
tp->client = bbc_i2c_attach(bp, op);
if (!tp->client) {
kfree(tp);
@@ -497,6 +500,9 @@ static void attach_one_fan(struct bbc_i2
if (!fp)
return;
+ INIT_LIST_HEAD(&fp->bp_list);
+ INIT_LIST_HEAD(&fp->glob_list);
+
fp->client = bbc_i2c_attach(bp, op);
if (!fp->client) {
kfree(fp);
--- a/drivers/sbus/char/bbc_i2c.c
+++ b/drivers/sbus/char/bbc_i2c.c
@@ -300,13 +300,18 @@ static struct bbc_i2c_bus * attach_one_i
if (!bp)
return NULL;
+ INIT_LIST_HEAD(&bp->temps);
+ INIT_LIST_HEAD(&bp->fans);
+
bp->i2c_control_regs = of_ioremap(&op->resource[0], 0, 0x2, "bbc_i2c_regs");
if (!bp->i2c_control_regs)
goto fail;
- bp->i2c_bussel_reg = of_ioremap(&op->resource[1], 0, 0x1, "bbc_i2c_bussel");
- if (!bp->i2c_bussel_reg)
- goto fail;
+ if (op->num_resources == 2) {
+ bp->i2c_bussel_reg = of_ioremap(&op->resource[1], 0, 0x1, "bbc_i2c_bussel");
+ if (!bp->i2c_bussel_reg)
+ goto fail;
+ }
bp->waiting = 0;
init_waitqueue_head(&bp->wq);
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 36/39] sunsab: Fix detection of BREAK on sunsab serial console
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (34 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 35/39] bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000 Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 37/39] sparc64: ldc_connect() should not return EINVAL when handshake is in progress Greg Kroah-Hartman
` (4 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Christopher Alexander Tobias Schulze,
David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
[ Upstream commit fe418231b195c205701c0cc550a03f6c9758fd9e ]
Fix detection of BREAK on sunsab serial console: BREAK detection was only
performed when there were also serial characters received simultaneously.
To handle all BREAKs correctly, the check for BREAK and the corresponding
call to uart_handle_break() must also be done if count == 0, therefore
duplicate this code fragment and pull it out of the loop over the received
characters.
Patch applies to 3.16-rc6.
Signed-off-by: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/tty/serial/sunsab.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/drivers/tty/serial/sunsab.c
+++ b/drivers/tty/serial/sunsab.c
@@ -157,6 +157,15 @@ receive_chars(struct uart_sunsab_port *u
(up->port.line == up->port.cons->index))
saw_console_brk = 1;
+ if (count == 0) {
+ if (unlikely(stat->sreg.isr1 & SAB82532_ISR1_BRK)) {
+ stat->sreg.isr0 &= ~(SAB82532_ISR0_PERR |
+ SAB82532_ISR0_FERR);
+ up->port.icount.brk++;
+ uart_handle_break(&up->port);
+ }
+ }
+
for (i = 0; i < count; i++) {
unsigned char ch = buf[i], flag;
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 37/39] sparc64: ldc_connect() should not return EINVAL when handshake is in progress.
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (35 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 36/39] sunsab: Fix detection of BREAK on sunsab serial console Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 38/39] arch/sparc/math-emu/math_32.c: drop stray break operator Greg Kroah-Hartman
` (3 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Sowmini Varadhan, Karl Volz,
David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
[ Upstream commit 4ec1b01029b4facb651b8ef70bc20a4be4cebc63 ]
The LDC handshake could have been asynchronously triggered
after ldc_bind() enables the ldc_rx() receive interrupt-handler
(and thus intercepts incoming control packets)
and before vio_port_up() calls ldc_connect(). If that is the case,
ldc_connect() should return 0 and let the state-machine
progress.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Karl Volz <karl.volz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/kernel/ldc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/sparc/kernel/ldc.c
+++ b/arch/sparc/kernel/ldc.c
@@ -1336,7 +1336,7 @@ int ldc_connect(struct ldc_channel *lp)
if (!(lp->flags & LDC_FLAG_ALLOCED_QUEUES) ||
!(lp->flags & LDC_FLAG_REGISTERED_QUEUES) ||
lp->hs_state != LDC_HS_OPEN)
- err = -EINVAL;
+ err = ((lp->hs_state > LDC_HS_OPEN) ? 0 : -EINVAL);
else
err = start_handshake(lp);
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 38/39] arch/sparc/math-emu/math_32.c: drop stray break operator
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (36 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 37/39] sparc64: ldc_connect() should not return EINVAL when handshake is in progress Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 39/39] xfs: log vector rounding leaks log space Greg Kroah-Hartman
` (2 subsequent siblings)
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, David Binderman, Andrey Utkin,
David S. Miller
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrey Utkin <andrey.krieger.utkin@gmail.com>
[ Upstream commit 093758e3daede29cb4ce6aedb111becf9d4bfc57 ]
This commit is a guesswork, but it seems to make sense to drop this
break, as otherwise the following line is never executed and becomes
dead code. And that following line actually saves the result of
local calculation by the pointer given in function argument. So the
proposed change makes sense if this code in the whole makes sense (but I
am unable to analyze it in the whole).
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81641
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/sparc/math-emu/math_32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/sparc/math-emu/math_32.c
+++ b/arch/sparc/math-emu/math_32.c
@@ -499,7 +499,7 @@ static int do_one_mathemu(u32 insn, unsi
case 0: fsr = *pfsr;
if (IR == -1) IR = 2;
/* fcc is always fcc0 */
- fsr &= ~0xc00; fsr |= (IR << 10); break;
+ fsr &= ~0xc00; fsr |= (IR << 10);
*pfsr = fsr;
break;
case 1: rd->s = IR; break;
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3.14 39/39] xfs: log vector rounding leaks log space
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (37 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 38/39] arch/sparc/math-emu/math_32.c: drop stray break operator Greg Kroah-Hartman
@ 2014-08-08 21:35 ` Greg Kroah-Hartman
2014-08-09 3:01 ` [PATCH 3.14 00/39] 3.14.17-stable review Guenter Roeck
2014-08-09 14:41 ` Shuah Khan
40 siblings, 0 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-08 21:35 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Dave Chinner, Christoph Hellwig,
Brian Foster, Dave Chinner, Bill
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner <dchinner@redhat.com>
commit 110dc24ad2ae4e9b94b08632fe1eb2fcdff83045 upstream.
The addition of direct formatting of log items into the CIL
linear buffer added alignment restrictions that the start of each
vector needed to be 64 bit aligned. Hence padding was added in
xlog_finish_iovec() to round up the vector length to ensure the next
vector started with the correct alignment.
This adds a small number of bytes to the size of
the linear buffer that is otherwise unused. The issue is that we
then use the linear buffer size to determine the log space used by
the log item, and this includes the unused space. Hence when we
account for space used by the log item, it's more than is actually
written into the iclogs, and hence we slowly leak this space.
This results on log hangs when reserving space, with threads getting
stuck with these stack traces:
Call Trace:
[<ffffffff81d15989>] schedule+0x29/0x70
[<ffffffff8150d3a2>] xlog_grant_head_wait+0xa2/0x1a0
[<ffffffff8150d55d>] xlog_grant_head_check+0xbd/0x140
[<ffffffff8150ee33>] xfs_log_reserve+0x103/0x220
[<ffffffff814b7f05>] xfs_trans_reserve+0x2f5/0x310
.....
The 4 bytes is significant. Brain Foster did all the hard work in
tracking down a reproducable leak to inode chunk allocation (it went
away with the ikeep mount option). His rough numbers were that
creating 50,000 inodes leaked 11 log blocks. This turns out to be
roughly 800 inode chunks or 1600 inode cluster buffers. That
works out at roughly 4 bytes per cluster buffer logged, and at that
I started looking for a 4 byte leak in the buffer logging code.
What I found was that a struct xfs_buf_log_format structure for an
inode cluster buffer is 28 bytes in length. This gets rounded up to
32 bytes, but the vector length remains 28 bytes. Hence the CIL
ticket reservation is decremented by 32 bytes (via lv->lv_buf_len)
for that vector rather than 28 bytes which are written into the log.
The fix for this problem is to separately track the bytes used by
the log vectors in the item and use that instead of the buffer
length when accounting for the log space that will be used by the
formatted log item.
Again, thanks to Brian Foster for doing all the hard work and long
hours to isolate this leak and make finding the bug relatively
simple.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Bill <billstuff2001@sbcglobal.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/xfs/xfs_log.h | 19 +++++++++++++------
fs/xfs/xfs_log_cil.c | 7 ++++---
2 files changed, 17 insertions(+), 9 deletions(-)
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -24,7 +24,8 @@ struct xfs_log_vec {
struct xfs_log_iovec *lv_iovecp; /* iovec array */
struct xfs_log_item *lv_item; /* owner */
char *lv_buf; /* formatted buffer */
- int lv_buf_len; /* size of formatted buffer */
+ int lv_bytes; /* accounted space in buffer */
+ int lv_buf_len; /* aligned size of buffer */
int lv_size; /* size of allocated lv */
};
@@ -52,15 +53,21 @@ xlog_prepare_iovec(struct xfs_log_vec *l
return vec->i_addr;
}
+/*
+ * We need to make sure the next buffer is naturally aligned for the biggest
+ * basic data type we put into it. We already accounted for this padding when
+ * sizing the buffer.
+ *
+ * However, this padding does not get written into the log, and hence we have to
+ * track the space used by the log vectors separately to prevent log space hangs
+ * due to inaccurate accounting (i.e. a leak) of the used log space through the
+ * CIL context ticket.
+ */
static inline void
xlog_finish_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec *vec, int len)
{
- /*
- * We need to make sure the next buffer is naturally aligned for the
- * biggest basic data type we put into it. We already accounted for
- * this when sizing the buffer.
- */
lv->lv_buf_len += round_up(len, sizeof(uint64_t));
+ lv->lv_bytes += len;
vec->i_len = len;
}
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -97,7 +97,7 @@ xfs_cil_prepare_item(
{
/* Account for the new LV being passed in */
if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED) {
- *diff_len += lv->lv_buf_len;
+ *diff_len += lv->lv_bytes;
*diff_iovecs += lv->lv_niovecs;
}
@@ -111,7 +111,7 @@ xfs_cil_prepare_item(
else if (old_lv != lv) {
ASSERT(lv->lv_buf_len != XFS_LOG_VEC_ORDERED);
- *diff_len -= old_lv->lv_buf_len;
+ *diff_len -= old_lv->lv_bytes;
*diff_iovecs -= old_lv->lv_niovecs;
kmem_free(old_lv);
}
@@ -239,7 +239,7 @@ xlog_cil_insert_format_items(
* that the space reservation accounting is correct.
*/
*diff_iovecs -= lv->lv_niovecs;
- *diff_len -= lv->lv_buf_len;
+ *diff_len -= lv->lv_bytes;
} else {
/* allocate new data chunk */
lv = kmem_zalloc(buf_size, KM_SLEEP|KM_NOFS);
@@ -259,6 +259,7 @@ xlog_cil_insert_format_items(
/* The allocated data region lies beyond the iovec region */
lv->lv_buf_len = 0;
+ lv->lv_bytes = 0;
lv->lv_buf = (char *)lv + buf_size - nbytes;
ASSERT(IS_ALIGNED((unsigned long)lv->lv_buf, sizeof(uint64_t)));
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 3.14 00/39] 3.14.17-stable review
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (38 preceding siblings ...)
2014-08-08 21:35 ` [PATCH 3.14 39/39] xfs: log vector rounding leaks log space Greg Kroah-Hartman
@ 2014-08-09 3:01 ` Guenter Roeck
2014-08-09 14:41 ` Shuah Khan
40 siblings, 0 replies; 42+ messages in thread
From: Guenter Roeck @ 2014-08-09 3:01 UTC (permalink / raw)
To: Greg Kroah-Hartman, linux-kernel
Cc: torvalds, akpm, satoru.takeuchi, shuah.kh, stable
On 08/08/2014 02:34 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.14.17 release.
> There are 39 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Aug 10 21:33:45 UTC 2014.
> Anything received after that time might be too late.
>
Build results:
total: 137 pass: 137 fail: 0
Qemu tests all passed.
Details are available at http://server.roeck-us.net:8010/builders.
Guenter
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 3.14 00/39] 3.14.17-stable review
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
` (39 preceding siblings ...)
2014-08-09 3:01 ` [PATCH 3.14 00/39] 3.14.17-stable review Guenter Roeck
@ 2014-08-09 14:41 ` Shuah Khan
40 siblings, 0 replies; 42+ messages in thread
From: Shuah Khan @ 2014-08-09 14:41 UTC (permalink / raw)
To: Greg Kroah-Hartman, linux-kernel
Cc: torvalds, akpm, linux, satoru.takeuchi, stable, Shuah Khan
On 08/08/2014 03:34 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.14.17 release.
> There are 39 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Aug 10 21:33:45 UTC 2014.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.14.17-rc1.gz
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>
Compiled and booted on my test system. No dmesg regressions.
-- Shuah
--
Shuah Khan
Senior Linux Kernel Developer - Open Source Group
Samsung Research America(Silicon Valley)
shuah.kh@samsung.com | (970) 672-0658
^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2014-08-09 14:41 UTC | newest]
Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-08 21:34 [PATCH 3.14 00/39] 3.14.17-stable review Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 01/39] xfrm: Fix installation of AH IPsec SAs Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 02/39] bnx2x: fix crash during TSO tunneling Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 03/39] inetpeer: get rid of ip_id_count Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 04/39] ip: make IP identifiers less predictable Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 05/39] net: sendmsg: fix NULL pointer dereference Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 06/39] net: phy: re-apply PHY fixups during phy_register_device Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 07/39] ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip" Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 08/39] tcp: Fix integer-overflows in TCP veno Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 09/39] tcp: Fix integer-overflow in TCP vegas Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 10/39] bna: fix performance regression Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 11/39] net: sctp: inherit auth_capable on INIT collisions Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 12/39] macvlan: Initialize vlan_features to turn on offload support Greg Kroah-Hartman
2014-08-08 21:34 ` [PATCH 3.14 13/39] net: Correctly set segment mac_len in skb_segment() Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 14/39] iovec: make sure the caller actually wants anything in memcpy_fromiovecend Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 15/39] batman-adv: Fix out-of-order fragmentation support Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 16/39] sctp: fix possible seqlock seadlock in sctp_packet_transmit() Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 17/39] sparc64: Fix argument sign extension for compat_sys_futex() Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 18/39] sparc64: Make itc_sync_lock raw Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 19/39] sparc64: Fix executable bit testing in set_pmd_at() paths Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 20/39] sparc64: Fix huge PMD invalidation Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 21/39] sparc64: Fix bugs in get_user_pages_fast() wrt. THP Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 22/39] sparc64: Fix hex values in comment above pte_modify() Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 23/39] sparc64: Dont use _PAGE_PRESENT in pte_modify() mask Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 24/39] sparc64: Handle 32-bit tasks properly in compute_effective_address() Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 25/39] sparc64: Fix top-level fault handling bugs Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 26/39] sparc64: Fix range check in kern_addr_valid() Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 27/39] sparc64: Use ILOG2_4MB instead of constant 22 Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 28/39] sparc64: Add basic validations to {pud,pmd}_bad() Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 29/39] sparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill pte_ERROR() Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 30/39] sparc64: Dont bark so loudly about 32-bit tasks generating 64-bit fault addresses Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 31/39] sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 32/39] sparc64: Add membar to Niagara2 memcpy code Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 33/39] sparc64: Do not insert non-valid PTEs into the TSB hash table Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 34/39] sparc64: Guard against flushing openfirmware mappings Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 35/39] bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000 Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 36/39] sunsab: Fix detection of BREAK on sunsab serial console Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 37/39] sparc64: ldc_connect() should not return EINVAL when handshake is in progress Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 38/39] arch/sparc/math-emu/math_32.c: drop stray break operator Greg Kroah-Hartman
2014-08-08 21:35 ` [PATCH 3.14 39/39] xfs: log vector rounding leaks log space Greg Kroah-Hartman
2014-08-09 3:01 ` [PATCH 3.14 00/39] 3.14.17-stable review Guenter Roeck
2014-08-09 14:41 ` Shuah Khan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).