* [PATCH 4.4 00/49] 4.4.18-stable review
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 01/49] tcp: make challenge acks less predictable Greg Kroah-Hartman
` (49 more replies)
0 siblings, 50 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC
To: linux-kernel
Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah.kh, patches,
stable
This is the start of the stable review cycle for the 4.4.18 release.
There are 49 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Tue Aug 16 20:22:43 UTC 2016.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.18-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linux 4.4.18-rc1
Vegard Nossum <vegard.nossum@oracle.com>
ext4: fix reference counting bug on block allocation error
Vegard Nossum <vegard.nossum@oracle.com>
ext4: short-cut orphan cleanup on error
Theodore Ts'o <tytso@mit.edu>
ext4: validate s_reserved_gdt_blocks on mount
Vegard Nossum <vegard.nossum@oracle.com>
ext4: don't call ext4_should_journal_data() on the journal inode
Jan Kara <jack@suse.cz>
ext4: fix deadlock during page writeback
Vegard Nossum <vegard.nossum@oracle.com>
ext4: check for extents that wrap around
Herbert Xu <herbert@gondor.apana.org.au>
crypto: scatterwalk - Fix test in scatterwalk_done
Herbert Xu <herbert@gondor.apana.org.au>
crypto: gcm - Filter out async ghash if necessary
Wei Fang <fangwei1@huawei.com>
fs/dcache.c: avoid soft-lockup in dput()
Wei Fang <fangwei1@huawei.com>
fuse: fix wrong assignment of ->flags in fuse_send_init()
Maxim Patlasov <mpatlasov@virtuozzo.com>
fuse: fuse_flush must check mapping->flags for errors
Alexey Kuznetsov <kuznet@parallels.com>
fuse: fsync() did not return IO errors
Fabian Frederick <fabf@skynet.be>
sysv, ipc: fix security-layer leaking
Vegard Nossum <vegard.nossum@oracle.com>
block: fix use-after-free in seq file
David Howells <dhowells@redhat.com>
x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace
Vladimir Davydov <vdavydov@virtuozzo.com>
mm: memcontrol: fix memcg id ref counter on swap charge move
Vladimir Davydov <vdavydov@virtuozzo.com>
mm: memcontrol: fix swap counter leak on swapout from offline cgroup
Johannes Weiner <hannes@cmpxchg.org>
mm: memcontrol: fix cgroup creation failure after many small jobs
Matt Roper <matthew.d.roper@intel.com>
drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2)
Toshi Kani <toshi.kani@hpe.com>
x86/mm/pat: Fix BUG_ON() in mmap_mem() on QEMU/i386
Toshi Kani <toshi.kani@hpe.com>
x86/pat: Document the PAT initialization sequence
Toshi Kani <toshi.kani@hpe.com>
x86/xen, pat: Remove PAT table init code from Xen
Toshi Kani <toshi.kani@hpe.com>
x86/mtrr: Fix PAT init handling when MTRR is disabled
Toshi Kani <toshi.kani@hpe.com>
x86/mtrr: Fix Xorg crashes in Qemu sessions
Toshi Kani <toshi.kani@hpe.com>
x86/mm/pat: Replace cpu_has_pat with boot_cpu_has()
Toshi Kani <toshi.kani@hpe.com>
x86/mm/pat: Add pat_disable() interface
Toshi Kani <toshi.kani@hpe.com>
x86/mm/pat: Add support of non-default PAT MSR setting
Linus Torvalds <torvalds@linux-foundation.org>
devpts: clean up interface to pty drivers
Theodore Ts'o <tytso@mit.edu>
random: strengthen input validation for RNDADDTOENTCNT
John Johansen <john.johansen@canonical.com>
apparmor: fix ref count leak when profile sha1 hash is read
Michael Holzheu <holzheu@linux.vnet.ibm.com>
Revert "s390/kdump: Clear subchannel ID to signal non-CCW/SCSI IPL"
David Howells <dhowells@redhat.com>
KEYS: 64-bit MIPS needs to use compat_sys_keyctl for 32-bit userspace
Dave Weinstein <olorin@google.com>
arm: oabi compat: add missing access checks
Bjørn Mork <bjorn@mork.no>
cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind
Mika Westerberg <mika.westerberg@linux.intel.com>
i2c: i801: Allow ACPI SystemIO OpRegion to conflict with PCI BAR
Hector Marco-Gisbert <hecmargi@upv.es>
x86/mm/32: Enable full randomization on i386 and X86_32
Benjamin Tissoires <benjamin.tissoires@redhat.com>
HID: sony: do not bail out when the sixaxis refuses the output report
Christophe Le Roy <christophe.fish@gmail.com>
PNP: Add Broadwell to Intel MCH size workaround
Josh Boyer <jwboyer@fedoraproject.org>
PNP: Add Haswell-ULT to Intel MCH size workaround
Hannes Reinecke <hare@suse.de>
scsi: ignore errors from scsi_dh_add_device()
Ben Hutchings <ben@decadent.org.uk>
ipath: Restrict use of the write() interface
Soheil Hassas Yeganeh <soheil@google.com>
tcp: consider recv buf for the initial window scale
Manish Chopra <manish.chopra@qlogic.com>
qed: Fix setting/clearing bit in completion bitmap
Vegard Nossum <vegard.nossum@oracle.com>
net/irda: fix NULL pointer dereference on memory allocation failure
Florian Fainelli <f.fainelli@gmail.com>
net: bgmac: Fix infinite loop in bgmac_dma_tx_add()
Beniamino Galvani <bgalvani@redhat.com>
bonding: set carrier off for devices created through netlink
Julian Anastasov <ja@ssi.bg>
ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space
Jason Baron <jbaron@akamai.com>
tcp: enable per-socket rate limiting of all 'challenge acks'
Eric Dumazet <edumazet@google.com>
tcp: make challenge acks less predictable
-------------
Diffstat:
Documentation/x86/pat.txt | 32 ++++++
Makefile | 4 +-
arch/arm/kernel/sys_oabi-compat.c | 8 +-
arch/mips/kernel/scall64-n32.S | 2 +-
arch/mips/kernel/scall64-o32.S | 2 +-
arch/s390/kernel/ipl.c | 7 --
arch/x86/entry/syscalls/syscall_32.tbl | 2 +-
arch/x86/include/asm/mtrr.h | 6 +-
arch/x86/include/asm/pat.h | 2 +-
arch/x86/kernel/cpu/mtrr/generic.c | 24 +++--
arch/x86/kernel/cpu/mtrr/main.c | 13 ++-
arch/x86/kernel/cpu/mtrr/mtrr.h | 1 +
arch/x86/mm/mmap.c | 14 +--
arch/x86/mm/pat.c | 109 +++++++++++++--------
arch/x86/xen/enlighten.c | 9 --
block/genhd.c | 1 +
crypto/gcm.c | 4 +-
crypto/scatterwalk.c | 3 +-
drivers/char/random.c | 13 +--
drivers/gpu/drm/i915/intel_pm.c | 14 ++-
drivers/hid/hid-sony.c | 6 +-
drivers/i2c/busses/i2c-i801.c | 103 ++++++++++++++++++--
drivers/net/bonding/bond_netlink.c | 6 +-
drivers/net/ethernet/broadcom/bgmac.c | 2 +-
drivers/net/ethernet/qlogic/qed/qed_spq.c | 7 +-
drivers/net/usb/cdc_ncm.c | 20 +---
drivers/pnp/quirks.c | 2 +
drivers/scsi/scsi_sysfs.c | 7 +-
drivers/staging/rdma/ipath/ipath_file_ops.c | 5 +
drivers/tty/pty.c | 63 ++++++------
fs/dcache.c | 7 +-
fs/devpts/inode.c | 49 +++++-----
fs/ext4/balloc.c | 3 +
fs/ext4/extents.c | 8 +-
fs/ext4/inode.c | 35 +++++--
fs/ext4/mballoc.c | 17 +---
fs/ext4/super.c | 17 ++++
fs/fuse/file.c | 24 +++++
fs/fuse/inode.c | 2 +-
include/linux/devpts_fs.h | 34 ++-----
include/linux/memcontrol.h | 8 ++
ipc/msg.c | 2 +-
ipc/sem.c | 12 +--
mm/memcontrol.c | 146 +++++++++++++++++++++++-----
mm/slab_common.c | 2 +-
net/ipv4/fib_semantics.c | 6 ++
net/ipv4/tcp_input.c | 54 +++++-----
net/ipv4/tcp_output.c | 3 +-
net/irda/af_irda.c | 7 +-
security/apparmor/apparmorfs.c | 1 +
50 files changed, 626 insertions(+), 302 deletions(-)
* [PATCH 4.4 01/49] tcp: make challenge acks less predictable
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 02/49] tcp: enable per-socket rate limiting of all challenge acks Greg Kroah-Hartman
` (48 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Yue Cao, Eric Dumazet, Linus Torvalds,
Yuchung Cheng, Neal Cardwell, David S. Miller
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit 75ff39ccc1bd5d3c455b6822ab09e533c551f758 ]
Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.
This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable number of probes.
Based on initial analysis and patch from Linus.
Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.
v2: randomize the count of challenge acks per second, not the period.
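The refill logic described above can be modelled in user space. This is only a sketch under stated assumptions: struct challenge_budget, model_random_below() and challenge_ack_allowed() are hypothetical names, rand() stands in for the kernel's prandom_u32_max(), and the READ_ONCE/WRITE_ONCE annotations are left out because the model is single-threaded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space model of the host-wide challenge ACK budget. */
struct challenge_budget {
	uint32_t limit;     /* models sysctl_tcp_challenge_ack_limit */
	uint32_t timestamp; /* second in which the budget was last refilled */
	uint32_t count;     /* challenge ACKs still allowed in this second */
};

/* Stand-in for prandom_u32_max(): a value in [0, bound), or 0 if bound is 0. */
static uint32_t model_random_below(uint32_t bound)
{
	return bound ? (uint32_t)rand() % bound : 0;
}

/* Returns 1 if a challenge ACK may be sent during second now_sec, else 0. */
static int challenge_ack_allowed(struct challenge_budget *b, uint32_t now_sec)
{
	assert(b != NULL);

	if (now_sec != b->timestamp) {
		uint32_t half = (b->limit + 1) >> 1;

		b->timestamp = now_sec;
		/* New budget lies in [half, half + limit - 1]. */
		b->count = half + model_random_below(b->limit);
	}
	if (b->count > 0) {
		b->count--;
		return 1;
	}
	return 0;
}
```

With limit set to 1000, each second's budget falls somewhere between 500 and 1499, so an observer counting challenge ACKs can no longer assume a fixed per-second quota.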
Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2")
Reported-by: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_input.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -89,7 +89,7 @@ int sysctl_tcp_adv_win_scale __read_most
EXPORT_SYMBOL(sysctl_tcp_adv_win_scale);
/* rfc5961 challenge ack rate limiting */
-int sysctl_tcp_challenge_ack_limit = 100;
+int sysctl_tcp_challenge_ack_limit = 1000;
int sysctl_tcp_stdurg __read_mostly;
int sysctl_tcp_rfc1337 __read_mostly;
@@ -3427,7 +3427,7 @@ static void tcp_send_challenge_ack(struc
static u32 challenge_timestamp;
static unsigned int challenge_count;
struct tcp_sock *tp = tcp_sk(sk);
- u32 now;
+ u32 count, now;
/* First check our per-socket dupack rate limit. */
if (tcp_oow_rate_limited(sock_net(sk), skb,
@@ -3435,13 +3435,18 @@ static void tcp_send_challenge_ack(struc
&tp->last_oow_ack_time))
return;
- /* Then check the check host-wide RFC 5961 rate limit. */
+ /* Then check host-wide RFC 5961 rate limit. */
now = jiffies / HZ;
if (now != challenge_timestamp) {
+ u32 half = (sysctl_tcp_challenge_ack_limit + 1) >> 1;
+
challenge_timestamp = now;
- challenge_count = 0;
+ WRITE_ONCE(challenge_count, half +
+ prandom_u32_max(sysctl_tcp_challenge_ack_limit));
}
- if (++challenge_count <= sysctl_tcp_challenge_ack_limit) {
+ count = READ_ONCE(challenge_count);
+ if (count > 0) {
+ WRITE_ONCE(challenge_count, count - 1);
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPCHALLENGEACK);
tcp_send_ack(sk);
}
* [PATCH 4.4 02/49] tcp: enable per-socket rate limiting of all challenge acks
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 01/49] tcp: make challenge acks less predictable Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 03/49] ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space Greg Kroah-Hartman
` (47 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller,
Neal Cardwell, Yuchung Cheng, Yue Cao, Jason Baron
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Baron <jbaron@akamai.com>
[ Upstream commit 083ae308280d13d187512b9babe3454342a7987e ]
The per-socket rate limit for 'challenge acks' was introduced in the
context of limiting ack loops:
commit f2b2c582e824 ("tcp: mitigate ACK loops for connections as tcp_sock")
And I think it can be extended to rate limit all 'challenge acks' on a
per-socket basis.
Since we have the global tcp_challenge_ack_limit, this patch allows for
tcp_challenge_ack_limit to be set to a large value and effectively rely on
the per-socket limit, or set tcp_challenge_ack_limit to a lower value and
still prevent a single connection from consuming the entire challenge ack
quota.
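For illustration, the per-socket check can be sketched outside the kernel as follows. This is a minimal model, not the kernel code: oow_rate_limited() is a hypothetical name, the now and ratelimit parameters stand in for tcp_time_stamp and sysctl_tcp_invalid_ratelimit, and the MIB counter update is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true when the ACK should be suppressed because the previous one
 * on this socket was sent less than ratelimit ticks ago.
 */
static bool oow_rate_limited(uint32_t now, int32_t ratelimit,
			     uint32_t *last_time)
{
	assert(last_time != NULL);

	if (*last_time) {
		/* The signed difference stays correct across counter wrap. */
		int32_t elapsed = (int32_t)(now - *last_time);

		if (0 <= elapsed && elapsed < ratelimit)
			return true;  /* rate-limited: do not send yet */
	}

	*last_time = now;
	return false;                 /* not rate-limited: send now */
}
```

Because the timestamp lives in the socket, one noisy connection only throttles itself, whatever value the host-wide limit has.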
It further moves in the direction of eliminating the global limit at some
point, as Eric Dumazet has suggested. This is a follow-up to:
Subject: [PATCH 4.4 01/49] tcp: make challenge acks less predictable
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_input.c | 39 ++++++++++++++++++++++-----------------
1 file changed, 22 insertions(+), 17 deletions(-)
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3390,6 +3390,23 @@ static int tcp_ack_update_window(struct
return flag;
}
+static bool __tcp_oow_rate_limited(struct net *net, int mib_idx,
+ u32 *last_oow_ack_time)
+{
+ if (*last_oow_ack_time) {
+ s32 elapsed = (s32)(tcp_time_stamp - *last_oow_ack_time);
+
+ if (0 <= elapsed && elapsed < sysctl_tcp_invalid_ratelimit) {
+ NET_INC_STATS_BH(net, mib_idx);
+ return true; /* rate-limited: don't send yet! */
+ }
+ }
+
+ *last_oow_ack_time = tcp_time_stamp;
+
+ return false; /* not rate-limited: go ahead, send dupack now! */
+}
+
/* Return true if we're currently rate-limiting out-of-window ACKs and
* thus shouldn't send a dupack right now. We rate-limit dupacks in
* response to out-of-window SYNs or ACKs to mitigate ACK loops or DoS
@@ -3403,21 +3420,9 @@ bool tcp_oow_rate_limited(struct net *ne
/* Data packets without SYNs are not likely part of an ACK loop. */
if ((TCP_SKB_CB(skb)->seq != TCP_SKB_CB(skb)->end_seq) &&
!tcp_hdr(skb)->syn)
- goto not_rate_limited;
-
- if (*last_oow_ack_time) {
- s32 elapsed = (s32)(tcp_time_stamp - *last_oow_ack_time);
+ return false;
- if (0 <= elapsed && elapsed < sysctl_tcp_invalid_ratelimit) {
- NET_INC_STATS_BH(net, mib_idx);
- return true; /* rate-limited: don't send yet! */
- }
- }
-
- *last_oow_ack_time = tcp_time_stamp;
-
-not_rate_limited:
- return false; /* not rate-limited: go ahead, send dupack now! */
+ return __tcp_oow_rate_limited(net, mib_idx, last_oow_ack_time);
}
/* RFC 5961 7 [ACK Throttling] */
@@ -3430,9 +3435,9 @@ static void tcp_send_challenge_ack(struc
u32 count, now;
/* First check our per-socket dupack rate limit. */
- if (tcp_oow_rate_limited(sock_net(sk), skb,
- LINUX_MIB_TCPACKSKIPPEDCHALLENGE,
- &tp->last_oow_ack_time))
+ if (__tcp_oow_rate_limited(sock_net(sk),
+ LINUX_MIB_TCPACKSKIPPEDCHALLENGE,
+ &tp->last_oow_ack_time))
return;
/* Then check host-wide RFC 5961 rate limit. */
* [PATCH 4.4 03/49] ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 01/49] tcp: make challenge acks less predictable Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 02/49] tcp: enable per-socket rate limiting of all challenge acks Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 04/49] bonding: set carrier off for devices created through netlink Greg Kroah-Hartman
` (46 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Vegard Nossum, Julian Anastasov,
Andy Gospodarek, Dinesh Dutt, Scott Feldman, David S. Miller
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Anastasov <ja@ssi.bg>
[ Upstream commit 80610229ef7b26615dbb6cb6e873709a60bacc9f ]
Vegard Nossum is reporting a crash in fib_dump_info
when nh_dev = NULL and fib_nhs == 1:
Pid: 50, comm: netlink.exe Not tainted 4.7.0-rc5+
RIP: 0033:[<00000000602b3d18>]
RSP: 0000000062623890 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000006261b800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000024 RDI: 000000006245ba00
RBP: 00000000626238f0 R08: 000000000000029c R09: 0000000000000000
R10: 0000000062468038 R11: 000000006245ba00 R12: 000000006245ba00
R13: 00000000625f96c0 R14: 00000000601e16f0 R15: 0000000000000000
Kernel panic - not syncing: Kernel mode fault at addr 0x2e0, ip 0x602b3d18
CPU: 0 PID: 50 Comm: netlink.exe Not tainted 4.7.0-rc5+ #581
Stack:
626238f0 960226a02 00000400 000000fe
62623910 600afca7 62623970 62623a48
62468038 00000018 00000000 00000000
Call Trace:
[<602b3e93>] rtmsg_fib+0xd3/0x190
[<602b6680>] fib_table_insert+0x260/0x500
[<602b0e5d>] inet_rtm_newroute+0x4d/0x60
[<60250def>] rtnetlink_rcv_msg+0x8f/0x270
[<60267079>] netlink_rcv_skb+0xc9/0xe0
[<60250d4b>] rtnetlink_rcv+0x3b/0x50
[<60265400>] netlink_unicast+0x1a0/0x2c0
[<60265e47>] netlink_sendmsg+0x3f7/0x470
[<6021dc9a>] sock_sendmsg+0x3a/0x90
[<6021e0d0>] ___sys_sendmsg+0x300/0x360
[<6021fa64>] __sys_sendmsg+0x54/0xa0
[<6021fac0>] SyS_sendmsg+0x10/0x20
[<6001ea68>] handle_syscall+0x88/0x90
[<600295fd>] userspace+0x3fd/0x500
[<6001ac55>] fork_handler+0x85/0x90
$ addr2line -e vmlinux -i 0x602b3d18
include/linux/inetdevice.h:222
net/ipv4/fib_semantics.c:1264
The problem happens when RTNH_F_LINKDOWN is provided from user space
when creating routes that do not use the flag; it was caught with a
netlink fuzzer.
Currently, the kernel allows user space to set both flags
to nh_flags and fib_flags but this is not intentional, the
assumption was that they are not set. Fix this by rejecting
both flags with EINVAL.
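The validation amounts to a mask test on the user-supplied flags. A minimal user-space sketch, assuming the MODEL_* constants below mirror the RTNH_F_* values in include/uapi/linux/rtnetlink.h, and with validate_user_nh_flags() as a hypothetical helper name:

```c
#include <errno.h>

/* Assumed to mirror the RTNH_F_* values from the uapi rtnetlink header. */
#define MODEL_RTNH_F_DEAD     1  /* nexthop is dead (kernel-maintained) */
#define MODEL_RTNH_F_ONLINK   4  /* gateway is forced on link */
#define MODEL_RTNH_F_LINKDOWN 16 /* carrier is down (kernel-maintained) */

/* Returns 0 if user space may supply these flags, -EINVAL otherwise. */
static int validate_user_nh_flags(unsigned int flags)
{
	/* State flags are owned by the kernel and must not come from users. */
	if (flags & (MODEL_RTNH_F_DEAD | MODEL_RTNH_F_LINKDOWN))
		return -EINVAL;
	return 0;
}
```

In the patch the same test is applied twice: once per nexthop in fib_get_nhs() and once to cfg->fc_flags in fib_create_info().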
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Fixes: 0eeb075fad73 ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Cc: Dinesh Dutt <ddutt@cumulusnetworks.com>
Cc: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/fib_semantics.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -479,6 +479,9 @@ static int fib_get_nhs(struct fib_info *
if (!rtnh_ok(rtnh, remaining))
return -EINVAL;
+ if (rtnh->rtnh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
+ return -EINVAL;
+
nexthop_nh->nh_flags =
(cfg->fc_flags & ~0xFF) | rtnh->rtnh_flags;
nexthop_nh->nh_oif = rtnh->rtnh_ifindex;
@@ -1003,6 +1006,9 @@ struct fib_info *fib_create_info(struct
if (fib_props[cfg->fc_type].scope > cfg->fc_scope)
goto err_inval;
+ if (cfg->fc_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
+ goto err_inval;
+
#ifdef CONFIG_IP_ROUTE_MULTIPATH
if (cfg->fc_mp) {
nhs = fib_count_nexthops(cfg->fc_mp, cfg->fc_mp_len);
* [PATCH 4.4 04/49] bonding: set carrier off for devices created through netlink
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (2 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 03/49] ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 05/49] net: bgmac: Fix infinite loop in bgmac_dma_tx_add() Greg Kroah-Hartman
` (45 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Beniamino Galvani, David S. Miller
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Beniamino Galvani <bgalvani@redhat.com>
[ Upstream commit 005db31d5f5f7c31cfdc43505d77eb3ca5cf8ec6 ]
Commit e826eafa65c6 ("bonding: Call netif_carrier_off after
register_netdevice") moved netif_carrier_off() from bond_init() to
bond_create(), but the latter is called only for initial default
devices and ones created through sysfs:
$ modprobe bonding
$ echo +bond1 > /sys/class/net/bonding_masters
$ ip link add bond2 type bond
$ grep "MII Status" /proc/net/bonding/*
/proc/net/bonding/bond0:MII Status: down
/proc/net/bonding/bond1:MII Status: down
/proc/net/bonding/bond2:MII Status: up
Ensure that carrier is initially off also for devices created through
netlink.
Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/net/bonding/bond_netlink.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -446,7 +446,11 @@ static int bond_newlink(struct net *src_
if (err < 0)
return err;
- return register_netdevice(bond_dev);
+ err = register_netdevice(bond_dev);
+
+ netif_carrier_off(bond_dev);
+
+ return err;
}
static size_t bond_get_size(const struct net_device *bond_dev)
* [PATCH 4.4 05/49] net: bgmac: Fix infinite loop in bgmac_dma_tx_add()
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (3 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 04/49] bonding: set carrier off for devices created through netlink Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 06/49] net/irda: fix NULL pointer dereference on memory allocation failure Greg Kroah-Hartman
` (44 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Florian Fainelli, David S. Miller
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Fainelli <f.fainelli@gmail.com>
[ Upstream commit e86663c475d384ab5f46cb5637e9b7ad08c5c505 ]
Nothing is decrementing the index "i" while we are cleaning up the
fragments we could not successfully transmit.
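The effect of the post-decrement can be seen in a small stand-alone model of the unwind loop. This is a sketch only: unwind_fragments() and its parameters are hypothetical, and the DMA unmapping done by the driver is replaced by recording which ring slots are visited.

```c
#include <assert.h>

/*
 * Walks back over the i fragments mapped so far, recording each ring slot
 * in visited[]. Returns the number of slots visited.
 */
static int unwind_fragments(int ring_end, int i, int ring_slots, int *visited)
{
	int n = 0;

	assert(ring_slots > 0);

	/* With "while (i > 0)" and no decrement this loop would never end. */
	while (i-- > 0) {
		int index = (ring_end + i) % ring_slots;

		visited[n++] = index;
	}
	return n;
}
```

Starting from ring_end = 6 with three mapped fragments in an 8-slot ring, the loop visits slots 0, 7 and 6 and then stops.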
Fixes: 9cde94506eacf ("bgmac: implement scatter/gather support")
Reported-by: coverity (CID 1352048)
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/net/ethernet/broadcom/bgmac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -219,7 +219,7 @@ err_dma:
dma_unmap_single(dma_dev, slot->dma_addr, skb_headlen(skb),
DMA_TO_DEVICE);
- while (i > 0) {
+ while (i-- > 0) {
int index = (ring->end + i) % BGMAC_TX_RING_SLOTS;
struct bgmac_slot_info *slot = &ring->slots[index];
u32 ctl1 = le32_to_cpu(ring->cpu_base[index].ctl1);
* [PATCH 4.4 06/49] net/irda: fix NULL pointer dereference on memory allocation failure
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (4 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 05/49] net: bgmac: Fix infinite loop in bgmac_dma_tx_add() Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 07/49] qed: Fix setting/clearing bit in completion bitmap Greg Kroah-Hartman
` (43 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Vegard Nossum, David S. Miller
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit d3e6952cfb7ba5f4bfa29d4803ba91f96ce1204d ]
I ran into this:
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000
RIP: 0010:[<ffffffff82bbf066>] [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
RSP: 0018:ffff880111747bb8 EFLAGS: 00010286
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358
RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048
RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000
R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000
FS: 00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0
Stack:
0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220
ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232
ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000
Call Trace:
[<ffffffff82bca542>] irda_connect+0x562/0x1190
[<ffffffff825ae582>] SYSC_connect+0x202/0x2a0
[<ffffffff825b4489>] SyS_connect+0x9/0x10
[<ffffffff8100334c>] do_syscall_64+0x19c/0x410
[<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 <0f> b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74
RIP [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
RSP <ffff880111747bb8>
---[ end trace 4cda2588bc055b30 ]---
The problem is that irda_open_tsap() can fail and leave self->tsap = NULL,
and then irttp_connect_request() almost immediately dereferences it.
Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/irda/af_irda.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1024,8 +1024,11 @@ static int irda_connect(struct socket *s
}
/* Check if we have opened a local TSAP */
- if (!self->tsap)
- irda_open_tsap(self, LSAP_ANY, addr->sir_name);
+ if (!self->tsap) {
+ err = irda_open_tsap(self, LSAP_ANY, addr->sir_name);
+ if (err)
+ goto out;
+ }
/* Move to connecting socket, start sending Connect Requests */
sock->state = SS_CONNECTING;
* [PATCH 4.4 07/49] qed: Fix setting/clearing bit in completion bitmap
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (5 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 06/49] net/irda: fix NULL pointer dereference on memory allocation failure Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 08/49] tcp: consider recv buf for the initial window scale Greg Kroah-Hartman
` (42 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Manish Chopra, Yuval Mintz,
David S. Miller
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Manish Chopra <manish.chopra@qlogic.com>
[ Upstream commit 59d3f1ceb69b54569685d0c34dff16a1e0816b19 ]
Slowpath completion handling is incorrectly changing
SPQ_RING_SIZE bits instead of a single one.
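bitmap_set() and bitmap_clear() take a start position and a bit count, so passing SPQ_RING_SIZE as the count touches a whole run of bits rather than one. The intended single-bit bookkeeping can be sketched as below. This is a simplified model: struct model_spq, MODEL_RING_SIZE and model_completion() are hypothetical, the bitmap is a single 32-bit word, and the modulo on the consumer index is an assumption made to keep the model in range.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_RING_SIZE 16 /* hypothetical ring size for the model */

struct model_spq {
	uint32_t comp_bitmap; /* one bit per ring entry */
	unsigned comp_idx;    /* next entry the consumer expects */
	unsigned returned;    /* entries handed back to the chain */
};

/* Records completion of entry pos, then releases any in-order run. */
static void model_completion(struct model_spq *q, unsigned pos)
{
	assert(pos < MODEL_RING_SIZE);

	/* Mark exactly one entry as completed. */
	q->comp_bitmap |= 1u << pos;

	/* Advance over consecutive completed entries, clearing one bit each. */
	while (q->comp_bitmap & (1u << (q->comp_idx % MODEL_RING_SIZE))) {
		q->comp_bitmap &= ~(1u << (q->comp_idx % MODEL_RING_SIZE));
		q->comp_idx++;
		q->returned++;
	}
}
```

An out-of-order completion of entry 1 returns nothing until entry 0 completes, at which point both entries are released together.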
Fixes: 76a9a3642a0b ("qed: fix handling of concurrent ramrods")
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/net/ethernet/qlogic/qed/qed_spq.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/qlogic/qed/qed_spq.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_spq.c
@@ -794,13 +794,12 @@ int qed_spq_completion(struct qed_hwfn *
* in a bitmap and increasing the chain consumer only
* for the first successive completed entries.
*/
- bitmap_set(p_spq->p_comp_bitmap, pos, SPQ_RING_SIZE);
+ __set_bit(pos, p_spq->p_comp_bitmap);
while (test_bit(p_spq->comp_bitmap_idx,
p_spq->p_comp_bitmap)) {
- bitmap_clear(p_spq->p_comp_bitmap,
- p_spq->comp_bitmap_idx,
- SPQ_RING_SIZE);
+ __clear_bit(p_spq->comp_bitmap_idx,
+ p_spq->p_comp_bitmap);
p_spq->comp_bitmap_idx++;
qed_chain_return_produced(&p_spq->chain);
}
* [PATCH 4.4 08/49] tcp: consider recv buf for the initial window scale
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (6 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 07/49] qed: Fix setting/clearing bit in completion bitmap Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 09/49] ipath: Restrict use of the write() interface Greg Kroah-Hartman
` (41 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Soheil Hassas Yeganeh, Neal Cardwell,
David S. Miller
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Soheil Hassas Yeganeh <soheil@google.com>
[ Upstream commit f626300a3e776ccc9671b0dd94698fb3aa315966 ]
tcp_select_initial_window() intends to advertise a window
scaling for the maximum possible window size. To do so,
it considers the maximum of net.ipv4.tcp_rmem[2] and
net.core.rmem_max as the only possible upper-bounds.
However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
to set the socket's receive buffer size to values
larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
Thus, SO_RCVBUFFORCE is effectively ignored by
tcp_select_initial_window().
To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
net.core.rmem_max and socket's initial buffer space.
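A user-space sketch of the resulting scale computation, under stated assumptions: initial_wscale() is a hypothetical helper, and its parameters stand in for the socket's buffer space, net.ipv4.tcp_rmem[2], net.core.rmem_max and the window clamp.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the window scale to advertise for the largest possible window. */
static uint8_t initial_wscale(uint32_t space, uint32_t tcp_rmem_max,
			      uint32_t rmem_max, uint32_t window_clamp)
{
	uint8_t wscale = 0;

	/* Take the largest of the three candidate upper bounds... */
	if (tcp_rmem_max > space)
		space = tcp_rmem_max;
	if (rmem_max > space)
		space = rmem_max;
	/* ...but never exceed the clamp. */
	if (window_clamp < space)
		space = window_clamp;

	/* Shift until the window fits in 16 bits; the scale is capped at 14. */
	while (space > 65535 && wscale < 14) {
		space >>= 1;
		wscale++;
	}
	assert(wscale <= 14);
	return wscale;
}
```

For example, with tcp_rmem[2] at 6291456 bytes a 64 KiB socket buffer yields a scale of 7, while a buffer forced to 64 MiB yields 11. Before the patch the forced buffer size would not have entered the calculation at all.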
Fixes: b0573dea1fb3 ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_output.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -239,7 +239,8 @@ void tcp_select_initial_window(int __spa
/* Set window scaling on max possible window
* See RFC1323 for an explanation of the limit to 14
*/
- space = max_t(u32, sysctl_tcp_rmem[2], sysctl_rmem_max);
+ space = max_t(u32, space, sysctl_tcp_rmem[2]);
+ space = max_t(u32, space, sysctl_rmem_max);
space = min_t(u32, space, *window_clamp);
while (space > 65535 && (*rcv_wscale) < 14) {
space >>= 1;
* [PATCH 4.4 09/49] ipath: Restrict use of the write() interface
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (7 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 08/49] tcp: consider recv buf for the initial window scale Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 10/49] scsi: ignore errors from scsi_dh_add_device() Greg Kroah-Hartman
` (40 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC
To: linux-kernel, stable; +Cc: Greg Kroah-Hartman, Ben Hutchings
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Hutchings <ben@decadent.org.uk>
Commit e6bd18f57aad ("IB/security: Restrict use of the write()
interface") fixed a security problem with various write()
implementations in the Infiniband subsystem. In older kernel versions
the ipath_write() function has the same problem and needs the same
restriction. (The ipath driver has been completely removed upstream.)
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/staging/rdma/ipath/ipath_file_ops.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/drivers/staging/rdma/ipath/ipath_file_ops.c
+++ b/drivers/staging/rdma/ipath/ipath_file_ops.c
@@ -45,6 +45,8 @@
#include <linux/uio.h>
#include <asm/pgtable.h>
+#include <rdma/ib.h>
+
#include "ipath_kernel.h"
#include "ipath_common.h"
#include "ipath_user_sdma.h"
@@ -2243,6 +2245,9 @@ static ssize_t ipath_write(struct file *
ssize_t ret = 0;
void *dest;
+ if (WARN_ON_ONCE(!ib_safe_file_access(fp)))
+ return -EACCES;
+
if (count < sizeof(cmd.type)) {
ret = -EINVAL;
goto bail;
* [PATCH 4.4 10/49] scsi: ignore errors from scsi_dh_add_device()
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Johannes Thumshirn, Christoph Hellwig,
Hannes Reinecke, Martin K. Petersen, Laura Abbott
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hannes Reinecke <hare@suse.de>
commit 221255aee67ec1c752001080aafec0c4e9390d95 upstream.
Device handler initialisation might fail for a number of
reasons. But as device handlers are optional, this shouldn't
cause us to disable the device entirely.
So just ignore errors from scsi_dh_add_device().
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/scsi/scsi_sysfs.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1058,11 +1058,12 @@ int scsi_sysfs_add_sdev(struct scsi_devi
}
error = scsi_dh_add_device(sdev);
- if (error) {
+ if (error)
+ /*
+ * device_handler is optional, so any error can be ignored
+ */
sdev_printk(KERN_INFO, sdev,
"failed to add device handler: %d\n", error);
- return error;
- }
device_enable_async_suspend(&sdev->sdev_dev);
error = device_add(&sdev->sdev_dev);
* [PATCH 4.4 11/49] PNP: Add Haswell-ULT to Intel MCH size workaround
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, robo, Josh Boyer, Rafael J. Wysocki,
Laura Abbott
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Boyer <jwboyer@fedoraproject.org>
commit ed1f0eeebaeeb7f790e9e7642116a208581e5bfc upstream.
Add device ID 0x0a04 for Haswell-ULT to the list of devices with MCH
problems.
From a Lenovo ThinkPad T440S:
[ 0.188604] pnp: PnP ACPI init
[ 0.189044] system 00:00: [mem 0x00000000-0x0009ffff] could not be reserved
[ 0.189048] system 00:00: [mem 0x000c0000-0x000c3fff] could not be reserved
[ 0.189050] system 00:00: [mem 0x000c4000-0x000c7fff] could not be reserved
[ 0.189052] system 00:00: [mem 0x000c8000-0x000cbfff] could not be reserved
[ 0.189054] system 00:00: [mem 0x000cc000-0x000cffff] could not be reserved
[ 0.189056] system 00:00: [mem 0x000d0000-0x000d3fff] has been reserved
[ 0.189058] system 00:00: [mem 0x000d4000-0x000d7fff] has been reserved
[ 0.189060] system 00:00: [mem 0x000d8000-0x000dbfff] has been reserved
[ 0.189061] system 00:00: [mem 0x000dc000-0x000dffff] has been reserved
[ 0.189063] system 00:00: [mem 0x000e0000-0x000e3fff] could not be reserved
[ 0.189065] system 00:00: [mem 0x000e4000-0x000e7fff] could not be reserved
[ 0.189067] system 00:00: [mem 0x000e8000-0x000ebfff] could not be reserved
[ 0.189069] system 00:00: [mem 0x000ec000-0x000effff] could not be reserved
[ 0.189071] system 00:00: [mem 0x000f0000-0x000fffff] could not be reserved
[ 0.189073] system 00:00: [mem 0x00100000-0xdf9fffff] could not be reserved
[ 0.189075] system 00:00: [mem 0xfec00000-0xfed3ffff] could not be reserved
[ 0.189078] system 00:00: [mem 0xfed4c000-0xffffffff] could not be reserved
[ 0.189082] system 00:00: Plug and Play ACPI device, IDs PNP0c01 (active)
[ 0.189216] system 00:01: [io 0x1800-0x189f] could not be reserved
[ 0.189220] system 00:01: [io 0x0800-0x087f] has been reserved
[ 0.189222] system 00:01: [io 0x0880-0x08ff] has been reserved
[ 0.189224] system 00:01: [io 0x0900-0x097f] has been reserved
[ 0.189226] system 00:01: [io 0x0980-0x09ff] has been reserved
[ 0.189229] system 00:01: [io 0x0a00-0x0a7f] has been reserved
[ 0.189231] system 00:01: [io 0x0a80-0x0aff] has been reserved
[ 0.189233] system 00:01: [io 0x0b00-0x0b7f] has been reserved
[ 0.189235] system 00:01: [io 0x0b80-0x0bff] has been reserved
[ 0.189238] system 00:01: [io 0x15e0-0x15ef] has been reserved
[ 0.189240] system 00:01: [io 0x1600-0x167f] has been reserved
[ 0.189242] system 00:01: [io 0x1640-0x165f] has been reserved
[ 0.189246] system 00:01: [mem 0xf8000000-0xfbffffff] could not be reserved
[ 0.189249] system 00:01: [mem 0x00000000-0x00000fff] could not be reserved
[ 0.189251] system 00:01: [mem 0xfed1c000-0xfed1ffff] has been reserved
[ 0.189254] system 00:01: [mem 0xfed10000-0xfed13fff] has been reserved
[ 0.189256] system 00:01: [mem 0xfed18000-0xfed18fff] has been reserved
[ 0.189258] system 00:01: [mem 0xfed19000-0xfed19fff] has been reserved
[ 0.189261] system 00:01: [mem 0xfed45000-0xfed4bfff] has been reserved
[ 0.189264] system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
[....]
[ 0.583653] resource sanity check: requesting [mem 0xfed10000-0xfed15fff], which spans more than pnp 00:01 [mem 0xfed10000-0xfed13fff]
[ 0.583654] ------------[ cut here ]------------
[ 0.583660] WARNING: CPU: 0 PID: 1 at arch/x86/mm/ioremap.c:198 __ioremap_caller+0x2c5/0x380()
[ 0.583661] Info: mapping multiple BARs. Your kernel is fine.
[ 0.583662] Modules linked in:
[ 0.583666] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.3.3-303.fc23.x86_64 #1
[ 0.583668] Hardware name: LENOVO 20AR001GXS/20AR001GXS, BIOS GJET86WW (2.36 ) 12/04/2015
[ 0.583670] 0000000000000000 0000000014cf7e59 ffff880214a1baf8 ffffffff813a625f
[ 0.583673] ffff880214a1bb40 ffff880214a1bb30 ffffffff810a07c2 00000000fed10000
[ 0.583675] ffffc90000cb8000 0000000000006000 0000000000000000 ffff8800d6381040
[ 0.583678] Call Trace:
[ 0.583683] [<ffffffff813a625f>] dump_stack+0x44/0x55
[ 0.583686] [<ffffffff810a07c2>] warn_slowpath_common+0x82/0xc0
[ 0.583688] [<ffffffff810a085c>] warn_slowpath_fmt+0x5c/0x80
[ 0.583692] [<ffffffff810a6fba>] ? iomem_map_sanity_check+0xba/0xd0
[ 0.583695] [<ffffffff81065835>] __ioremap_caller+0x2c5/0x380
[ 0.583698] [<ffffffff81065907>] ioremap_nocache+0x17/0x20
[ 0.583701] [<ffffffff8103a119>] snb_uncore_imc_init_box+0x79/0xb0
[ 0.583705] [<ffffffff81038900>] uncore_pci_probe+0xd0/0x1b0
[ 0.583707] [<ffffffff813efda5>] local_pci_probe+0x45/0xa0
[ 0.583710] [<ffffffff813f118d>] pci_device_probe+0xfd/0x140
[ 0.583713] [<ffffffff814d9b52>] driver_probe_device+0x222/0x480
[ 0.583715] [<ffffffff814d9e34>] __driver_attach+0x84/0x90
[ 0.583717] [<ffffffff814d9db0>] ? driver_probe_device+0x480/0x480
[ 0.583720] [<ffffffff814d762c>] bus_for_each_dev+0x6c/0xc0
[ 0.583722] [<ffffffff814d930e>] driver_attach+0x1e/0x20
[ 0.583724] [<ffffffff814d8e4b>] bus_add_driver+0x1eb/0x280
[ 0.583727] [<ffffffff81d6af1a>] ? uncore_cpu_setup+0x12/0x12
[ 0.583729] [<ffffffff814da680>] driver_register+0x60/0xe0
[ 0.583733] [<ffffffff813ef78c>] __pci_register_driver+0x4c/0x50
[ 0.583736] [<ffffffff81d6affc>] intel_uncore_init+0xe2/0x2e6
[ 0.583738] [<ffffffff81d6af1a>] ? uncore_cpu_setup+0x12/0x12
[ 0.583741] [<ffffffff81002123>] do_one_initcall+0xb3/0x200
[ 0.583745] [<ffffffff810be500>] ? parse_args+0x1a0/0x4a0
[ 0.583749] [<ffffffff81d5c1c8>] kernel_init_freeable+0x189/0x223
[ 0.583752] [<ffffffff81775c40>] ? rest_init+0x80/0x80
[ 0.583754] [<ffffffff81775c4e>] kernel_init+0xe/0xe0
[ 0.583758] [<ffffffff81781adf>] ret_from_fork+0x3f/0x70
[ 0.583760] [<ffffffff81775c40>] ? rest_init+0x80/0x80
[ 0.583765] ---[ end trace 077c426a39e018aa ]---
00:00.0 Host bridge [0600]: Intel Corporation Haswell-ULT DRAM Controller [8086:0a04] (rev 0b)
Subsystem: Lenovo Device [17aa:220c]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
Latency: 0
Capabilities: <access denied>
Kernel driver in use: hsw_uncore
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1300955
Tested-by: <robo@tcp.sk>
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/pnp/quirks.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -342,6 +342,7 @@ static void quirk_amd_mmconfig_area(stru
/* Device IDs of parts that have 32KB MCH space */
static const unsigned int mch_quirk_devices[] = {
0x0154, /* Ivy Bridge */
+ 0x0a04, /* Haswell-ULT */
0x0c00, /* Haswell */
};
* [PATCH 4.4 12/49] PNP: Add Broadwell to Intel MCH size workaround
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Christophe Le Roy, Rafael J. Wysocki,
Laura Abbott
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Le Roy <christophe.fish@gmail.com>
commit a77060f07ffc6ac978e280e738302f3e5572a99e upstream.
Add device ID 0x1604 for Broadwell to commit cb171f7abb9a ("PNP:
Work around BIOS defects in Intel MCH area reporting").
From a Lenovo ThinkPad T550:
system 00:01: [io 0x1800-0x189f] could not be reserved
system 00:01: [io 0x0800-0x087f] has been reserved
system 00:01: [io 0x0880-0x08ff] has been reserved
system 00:01: [io 0x0900-0x097f] has been reserved
system 00:01: [io 0x0980-0x09ff] has been reserved
system 00:01: [io 0x0a00-0x0a7f] has been reserved
system 00:01: [io 0x0a80-0x0aff] has been reserved
system 00:01: [io 0x0b00-0x0b7f] has been reserved
system 00:01: [io 0x0b80-0x0bff] has been reserved
system 00:01: [io 0x15e0-0x15ef] has been reserved
system 00:01: [io 0x1600-0x167f] has been reserved
system 00:01: [io 0x1640-0x165f] has been reserved
system 00:01: [mem 0xf8000000-0xfbffffff] could not be reserved
system 00:01: [mem 0xfed1c000-0xfed1ffff] has been reserved
system 00:01: [mem 0xfed10000-0xfed13fff] has been reserved
system 00:01: [mem 0xfed18000-0xfed18fff] has been reserved
system 00:01: [mem 0xfed19000-0xfed19fff] has been reserved
system 00:01: [mem 0xfed45000-0xfed4bfff] has been reserved
system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
[...]
resource sanity check: requesting [mem 0xfed10000-0xfed15fff], which spans more than pnp 00:01 [mem 0xfed10000-0xfed13fff]
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1 at /build/linux-CrHvZ_/linux-4.2.6/arch/x86/mm/ioremap.c:198 __ioremap_caller+0x2ee/0x360()
Info: mapping multiple BARs. Your kernel is fine.
Modules linked in:
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.2.0-1-amd64 #1 Debian 4.2.6-1
Hardware name: LENOVO 20CKCTO1WW/20CKCTO1WW, BIOS N11ET34W (1.10 ) 08/20/2015
0000000000000000 ffffffff817e6868 ffffffff8154e2f6 ffff8802241efbf8
ffffffff8106e5b1 ffffc90000e98000 0000000000006000 ffffc90000e98000
0000000000006000 0000000000000000 ffffffff8106e62a ffffffff817e68c8
Call Trace:
[<ffffffff8154e2f6>] ? dump_stack+0x40/0x50
[<ffffffff8106e5b1>] ? warn_slowpath_common+0x81/0xb0
[<ffffffff8106e62a>] ? warn_slowpath_fmt+0x4a/0x50
[<ffffffff810742a3>] ? iomem_map_sanity_check+0xb3/0xc0
[<ffffffff8105dade>] ? __ioremap_caller+0x2ee/0x360
[<ffffffff81036ae6>] ? snb_uncore_imc_init_box+0x66/0x90
[<ffffffff810351a8>] ? uncore_pci_probe+0xc8/0x1a0
[<ffffffff81302d7f>] ? local_pci_probe+0x3f/0xa0
[<ffffffff81303ea4>] ? pci_device_probe+0xc4/0x110
[<ffffffff813d9b1e>] ? driver_probe_device+0x1ee/0x450
[<ffffffff813d9dfb>] ? __driver_attach+0x7b/0x80
[<ffffffff813d9d80>] ? driver_probe_device+0x450/0x450
[<ffffffff813d796a>] ? bus_for_each_dev+0x5a/0x90
[<ffffffff813d9091>] ? bus_add_driver+0x1f1/0x290
[<ffffffff81b37fa8>] ? uncore_cpu_setup+0xc/0xc
[<ffffffff813da73f>] ? driver_register+0x5f/0xe0
[<ffffffff81b38074>] ? intel_uncore_init+0xcc/0x2b0
[<ffffffff81b37fa8>] ? uncore_cpu_setup+0xc/0xc
[<ffffffff8100213e>] ? do_one_initcall+0xce/0x200
[<ffffffff8108a100>] ? parse_args+0x140/0x4e0
[<ffffffff81b2b0cb>] ? kernel_init_freeable+0x162/0x1e8
[<ffffffff815443f0>] ? rest_init+0x80/0x80
[<ffffffff815443fe>] ? kernel_init+0xe/0xf0
[<ffffffff81553e5f>] ? ret_from_fork+0x3f/0x70
[<ffffffff815443f0>] ? rest_init+0x80/0x80
---[ end trace 472e7959536abf12 ]---
00:00.0 Host bridge: Intel Corporation Broadwell-U Host Bridge -OPI (rev 09)
Subsystem: Lenovo Device 2223
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
Latency: 0
Capabilities: [e0] Vendor Specific Information: Len=0c <?>
Kernel driver in use: bdw_uncore
00: 86 80 04 16 06 00 90 20 09 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 aa 17 23 22
30: 00 00 00 00 e0 00 00 00 00 00 00 00 00 00 00 00
Signed-off-by: Christophe Le Roy <christophe.fish@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/pnp/quirks.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -344,6 +344,7 @@ static const unsigned int mch_quirk_devi
0x0154, /* Ivy Bridge */
0x0a04, /* Haswell-ULT */
0x0c00, /* Haswell */
+ 0x1604, /* Broadwell */
};
static struct pci_dev *get_intel_host(void)
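
Editor's illustration (not part of the patch): a small user-space sketch of the two ideas in this and the previous patch, assuming a simplified view of the kernel's resource sanity check. The table copies the device IDs from the diff; is_mch_quirk_device() and spans_more_than() are hypothetical helpers, and the overlap test is only a rough model of what produces the "spans more than" message in the logs.

```c
#include <stddef.h>
#include <stdint.h>

/* Device IDs of parts that have 32KB MCH space (as listed in the diff) */
static const unsigned int mch_quirk_devices[] = {
    0x0154, /* Ivy Bridge */
    0x0a04, /* Haswell-ULT */
    0x0c00, /* Haswell */
    0x1604, /* Broadwell */
};

/* Returns 1 if the host bridge device ID is in the quirk list */
int is_mch_quirk_device(unsigned int id)
{
    size_t i;

    for (i = 0; i < sizeof(mch_quirk_devices) / sizeof(mch_quirk_devices[0]); i++)
        if (mch_quirk_devices[i] == id)
            return 1;
    return 0;
}

/*
 * Simplified model: a request "spans more than" a reserved resource when
 * it overlaps the resource without being fully contained in it.
 * All ranges are inclusive.
 */
int spans_more_than(uint64_t req_start, uint64_t req_end,
                    uint64_t res_start, uint64_t res_end)
{
    int overlaps = req_start <= res_end && req_end >= res_start;
    int contained = req_start >= res_start && req_end <= res_end;

    return overlaps && !contained;
}
```

In the logs above, the request 0xfed10000-0xfed15fff overlaps the 16KB PNP reservation 0xfed10000-0xfed13fff without fitting inside it, so the check fires. Assuming the quirk widens the reservation to the full 32KB (0xfed10000-0xfed17fff), the same request would be contained and the check would stay quiet.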
* [PATCH 4.4 13/49] HID: sony: do not bail out when the sixaxis refuses the output report
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Benjamin Tissoires, Jiri Kosina,
Laura Abbott
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Tissoires <benjamin.tissoires@redhat.com>
commit 19f4c2ba869517048add62c202f9645b6adf5dfb upstream.
When setting the operational mode, some third-party (Speedlink Strike-FX)
gamepads refuse the output report. Failing here means we refuse to
initialize the gamepad, even though this should be harmless.
The weird part is that the initial commit that added this, a7de9b8
("HID: sony: Enable Gasia third-party PS3 controllers"), mentions this
very same controller as one requiring this output report.
Anyway, it's broken for at least one user, so let's change it.
We will report an error, but at least the controller should work.
And no, these devices present themselves as legacy Sony controllers
(VID:PID of 054C:0268, as in the official ones), so there is no way
of distinguishing them from the official ones.
https://bugzilla.redhat.com/show_bug.cgi?id=1255325
Reported-and-tested-by: Max Fedotov <thesourcehim@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/hid/hid-sony.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/hid/hid-sony.c
+++ b/drivers/hid/hid-sony.c
@@ -1418,8 +1418,10 @@ static int sixaxis_set_operational_usb(s
}
ret = hid_hw_output_report(hdev, buf, 1);
- if (ret < 0)
- hid_err(hdev, "can't set operational mode: step 3\n");
+ if (ret < 0) {
+ hid_info(hdev, "can't set operational mode: step 3, ignoring\n");
+ ret = 0;
+ }
out:
kfree(buf);
* [PATCH 4.4 14/49] x86/mm/32: Enable full randomization on i386 and X86_32
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Hector Marco-Gisbert,
Ismael Ripoll Ripoll, Kees Cook, Arjan van de Ven, Linus Torvalds,
Peter Zijlstra, Thomas Gleixner, akpm, Ingo Molnar, Laura Abbott
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hector Marco-Gisbert <hecmargi@upv.es>
commit 8b8addf891de8a00e4d39fc32f93f7c5eb8feceb upstream.
Currently on i386 and on X86_64 when emulating X86_32 in legacy mode, only
the stack and the executable are randomized but not other mmapped files
(libraries, vDSO, etc.). This patch enables randomization for the
libraries, vDSO and mmap requests on i386 and in X86_32 in legacy mode.
By default on i386 there are 8 bits of randomization for the libraries,
vDSO and mmaps, which uses only 1MB of VA (2^8 pages of 4KB each).
This patch preserves the original randomness, using 1MB of VA out of 3GB or
4GB. We think that 1MB out of 3GB is not a big cost for having ASLR.
The first obvious security benefit is that all objects are randomized (not
only the stack and the executable) in legacy mode, which greatly increases
the effectiveness of ASLR; otherwise attackers may use these
non-randomized areas. Sensitive setuid/setgid applications are also
more secure, because currently attackers can disable the randomization of
these applications by setting the ulimit stack to "unlimited". This is a
very old and widely known trick to disable ASLR on i386, which has been
allowed for too long.
Another trick used to disable ASLR was to set the ADDR_NO_RANDOMIZE
personality flag, but fortunately this doesn't work on setuid/setgid
applications because there are security checks which clear
security-relevant flags.
This patch always randomizes the mmap_legacy_base address, removing the
possibility to disable the ASLR by setting the stack to "unlimited".
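
Editor's illustration (not part of the commit message): a worked example of the arithmetic above, as a user-space sketch. The constants are assumptions for a typical i386 setup (4KB pages, TASK_UNMAPPED_BASE at 0x40000000, 8 random bits), and legacy_mmap_base() and legacy_mmap_span() are hypothetical names; the kernel's own helper is arch_mmap_rnd().

```c
#include <stdint.h>

#define ASSUMED_PAGE_SHIFT          12          /* 4KB pages */
#define ASSUMED_MMAP_RND_BITS       8           /* 8 bits of randomness */
#define ASSUMED_TASK_UNMAPPED_BASE  0x40000000u /* typical with a 3GB user space */

/* Models mm->mmap_legacy_base = TASK_UNMAPPED_BASE + random_factor */
uint32_t legacy_mmap_base(uint32_t random_value)
{
    /* Keep 8 random bits, then convert the page count to a byte offset */
    uint32_t rnd_pages = random_value & ((1u << ASSUMED_MMAP_RND_BITS) - 1);

    return ASSUMED_TASK_UNMAPPED_BASE + (rnd_pages << ASSUMED_PAGE_SHIFT);
}

/* Size of the address range the randomized base can fall into */
uint32_t legacy_mmap_span(void)
{
    return (1u << ASSUMED_MMAP_RND_BITS) << ASSUMED_PAGE_SHIFT;
}
```

Under these assumptions the base varies between 0x40000000 and 0x400ff000 in page-sized steps, which is the 1MB of VA mentioned above.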
Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Acked-by: Ismael Ripoll Ripoll <iripoll@upv.es>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1457639460-5242-1-git-send-email-hecmargi@upv.es
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/mm/mmap.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -94,18 +94,6 @@ static unsigned long mmap_base(unsigned
}
/*
- * Bottom-up (legacy) layout on X86_32 did not support randomization, X86_64
- * does, but not when emulating X86_32
- */
-static unsigned long mmap_legacy_base(unsigned long rnd)
-{
- if (mmap_is_ia32())
- return TASK_UNMAPPED_BASE;
- else
- return TASK_UNMAPPED_BASE + rnd;
-}
-
-/*
* This function, called very early during the creation of a new
* process VM image, sets up which VM layout function to use:
*/
@@ -116,7 +104,7 @@ void arch_pick_mmap_layout(struct mm_str
if (current->flags & PF_RANDOMIZE)
random_factor = arch_mmap_rnd();
- mm->mmap_legacy_base = mmap_legacy_base(random_factor);
+ mm->mmap_legacy_base = TASK_UNMAPPED_BASE + random_factor;
if (mmap_is_legacy()) {
mm->mmap_base = mm->mmap_legacy_base;
* [PATCH 4.4 17/49] arm: oabi compat: add missing access checks
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Chiachih Wu, Kees Cook, Nicolas Pitre,
Dave Weinstein, Linus Torvalds
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Weinstein <olorin@google.com>
commit 7de249964f5578e67b99699c5f0b405738d820a2 upstream.
Add access checks to sys_oabi_epoll_wait() and sys_oabi_semtimedop().
This fixes CVE-2016-3857, a local privilege escalation under
CONFIG_OABI_COMPAT.
Reported-by: Chiachih Wu <wuchiachih@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dave Weinstein <olorin@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/arm/kernel/sys_oabi-compat.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -279,8 +279,12 @@ asmlinkage long sys_oabi_epoll_wait(int
mm_segment_t fs;
long ret, err, i;
- if (maxevents <= 0 || maxevents > (INT_MAX/sizeof(struct epoll_event)))
+ if (maxevents <= 0 ||
+ maxevents > (INT_MAX/sizeof(*kbuf)) ||
+ maxevents > (INT_MAX/sizeof(*events)))
return -EINVAL;
+ if (!access_ok(VERIFY_WRITE, events, sizeof(*events) * maxevents))
+ return -EFAULT;
kbuf = kmalloc(sizeof(*kbuf) * maxevents, GFP_KERNEL);
if (!kbuf)
return -ENOMEM;
@@ -317,6 +321,8 @@ asmlinkage long sys_oabi_semtimedop(int
if (nsops < 1 || nsops > SEMOPM)
return -EINVAL;
+ if (!access_ok(VERIFY_READ, tsops, sizeof(*tsops) * nsops))
+ return -EFAULT;
sops = kmalloc(sizeof(*sops) * nsops, GFP_KERNEL);
if (!sops)
return -ENOMEM;
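
Editor's illustration (not part of the patch): a user-space sketch of the range check added to sys_oabi_epoll_wait(), assuming the two element sizes are passed in explicitly. maxevents_in_range() is a hypothetical helper; the access_ok() calls themselves cannot be modelled outside the kernel and are left out.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Returns 1 if maxevents elements of either layout fit in INT_MAX bytes,
 * so that neither sizeof(*kbuf) * maxevents nor
 * sizeof(*events) * maxevents can overflow an int. Element sizes must be
 * non-zero.
 */
int maxevents_in_range(int maxevents, size_t kernel_elem_size,
                       size_t user_elem_size)
{
    if (maxevents <= 0)
        return 0;
    /* The cast is safe: maxevents is known to be positive here */
    if ((size_t)maxevents > INT_MAX / kernel_elem_size)
        return 0;
    if ((size_t)maxevents > INT_MAX / user_elem_size)
        return 0;
    return 1;
}
```

The point of checking both sizes is that the kernel buffer and the OABI user buffer use differently sized structures, so a count that is safe for one layout is not automatically safe for the other.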
* [PATCH 4.4 18/49] KEYS: 64-bit MIPS needs to use compat_sys_keyctl for 32-bit userspace
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Stephan Mueller, David Howells,
linux-mips, linux-security-module, keyrings, Ralf Baechle
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells <dhowells@redhat.com>
commit 20f06ed9f61a185c6dabd662c310bed6189470df upstream.
MIPS64 needs to use compat_sys_keyctl for 32-bit userspace rather than
calling sys_keyctl. The latter will work in a lot of cases, thereby hiding
the issue.
Reported-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: keyrings@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13832/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/mips/kernel/scall64-n32.S | 2 +-
arch/mips/kernel/scall64-o32.S | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -344,7 +344,7 @@ EXPORT(sysn32_call_table)
PTR sys_ni_syscall /* available, was setaltroot */
PTR sys_add_key
PTR sys_request_key
- PTR sys_keyctl /* 6245 */
+ PTR compat_sys_keyctl /* 6245 */
PTR sys_set_thread_area
PTR sys_inotify_init
PTR sys_inotify_add_watch
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -500,7 +500,7 @@ EXPORT(sys32_call_table)
PTR sys_ni_syscall /* available, was setaltroot */
PTR sys_add_key /* 4280 */
PTR sys_request_key
- PTR sys_keyctl
+ PTR compat_sys_keyctl
PTR sys_set_thread_area
PTR sys_inotify_init
PTR sys_inotify_add_watch /* 4285 */
* [PATCH 4.4 19/49] Revert "s390/kdump: Clear subchannel ID to signal non-CCW/SCSI IPL"
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Steffen Maier, Michael Holzheu,
Heiko Carstens, Martin Schwidefsky
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
commit 5419447e2142d6ed68c9f5c1a28630b3a290a845 upstream.
This reverts commit 852ffd0f4e23248b47531058e531066a988434b5.
There are use cases where an intermediate boot kernel (1) uses kexec
to boot the final production kernel (2). For this scenario we should
provide the original boot information to the production kernel (2).
Therefore clearing the boot information during kexec() should not
be done.
Reported-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/s390/kernel/ipl.c | 7 -------
1 file changed, 7 deletions(-)
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -2070,13 +2070,6 @@ void s390_reset_system(void (*fn_pre)(vo
S390_lowcore.program_new_psw.addr =
PSW_ADDR_AMODE | (unsigned long) s390_base_pgm_handler;
- /*
- * Clear subchannel ID and number to signal new kernel that no CCW or
- * SCSI IPL has been done (for kexec and kdump)
- */
- S390_lowcore.subchannel_id = 0;
- S390_lowcore.subchannel_nr = 0;
-
/* Store status at absolute zero */
store_status();
* [PATCH 4.4 20/49] apparmor: fix ref count leak when profile sha1 hash is read
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, John Johansen, Seth Arnold
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Johansen <john.johansen@canonical.com>
commit 0b938a2e2cf0b0a2c8bac9769111545aff0fee97 upstream.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
security/apparmor/apparmorfs.c | 1 +
1 file changed, 1 insertion(+)
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -331,6 +331,7 @@ static int aa_fs_seq_hash_show(struct se
seq_printf(seq, "%.2x", profile->hash[i]);
seq_puts(seq, "\n");
}
+ aa_put_profile(profile);
return 0;
}
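
Editor's illustration (not part of the patch): a toy user-space model of the get/put pairing this one-line fix restores. The struct and the get_profile(), put_profile() and hash_show() helpers are hypothetical stand-ins, not the AppArmor API; the assumption is that the show routine takes a reference at entry and must drop it before returning.

```c
#include <stddef.h>
#include <stdio.h>

struct profile {
    int refcount;
    unsigned char hash[4];
};

static struct profile *get_profile(struct profile *p)
{
    p->refcount++;
    return p;
}

static void put_profile(struct profile *p)
{
    p->refcount--;
}

/* Writes the hash as hex into out and leaves the refcount balanced */
int hash_show(struct profile *shared, char *out, size_t outlen)
{
    struct profile *p = get_profile(shared);
    size_t off = 0;
    size_t i;

    for (i = 0; i < sizeof(p->hash) && off + 3 <= outlen; i++)
        off += (size_t)snprintf(out + off, outlen - off, "%.2x", p->hash[i]);

    put_profile(p); /* without this, every read would leak one reference */
    return 0;
}
```

In this model, omitting the put_profile() call makes the count grow by one on every read, which is the kind of leak the subject line describes.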
* [PATCH 4.4 21/49] random: strengthen input validation for RNDADDTOENTCNT
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dmitry Vyukov, Theodore Tso
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o <tytso@mit.edu>
commit 86a574de4590ffe6fd3f3ca34cdcf655a78e36ec upstream.
Don't allow RNDADDTOENTCNT or RNDADDENTROPY to accept a negative
entropy value. It doesn't make any sense to subtract from the entropy
counter, and it can trigger a warning:
random: negative entropy/overflow: pool input count -40000
------------[ cut here ]------------
WARNING: CPU: 3 PID: 6828 at drivers/char/random.c:670[< none >] credit_entropy_bits+0x21e/0xad0 drivers/char/random.c:670
Modules linked in:
CPU: 3 PID: 6828 Comm: a.out Not tainted 4.7.0-rc4+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffffffff880b58e0 ffff88005dd9fcb0 ffffffff82cc838f ffffffff87158b40
fffffbfff1016b1c 0000000000000000 0000000000000000 ffffffff87158b40
ffffffff83283dae 0000000000000009 ffff88005dd9fcf8 ffffffff8136d27f
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82cc838f>] dump_stack+0x12e/0x18f lib/dump_stack.c:51
[<ffffffff8136d27f>] __warn+0x19f/0x1e0 kernel/panic.c:516
[<ffffffff8136d48c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:551
[<ffffffff83283dae>] credit_entropy_bits+0x21e/0xad0 drivers/char/random.c:670
[< inline >] credit_entropy_bits_safe drivers/char/random.c:734
[<ffffffff8328785d>] random_ioctl+0x21d/0x250 drivers/char/random.c:1546
[< inline >] vfs_ioctl fs/ioctl.c:43
[<ffffffff8185316c>] do_vfs_ioctl+0x18c/0xff0 fs/ioctl.c:674
[< inline >] SYSC_ioctl fs/ioctl.c:689
[<ffffffff8185405f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:680
[<ffffffff86a995c0>] entry_SYSCALL_64_fastpath+0x23/0xc1
arch/x86/entry/entry_64.S:207
---[ end trace 5d4902b2ba842f1f ]---
This was triggered using the test program:
// autogenerated by syzkaller (http://github.com/google/syzkaller)
int main() {
int fd = open("/dev/random", O_RDWR);
int val = -5000;
ioctl(fd, RNDADDTOENTCNT, &val);
return 0;
}
It's harmless in that (a) only root can trigger it, and (b) after
complaining the code never does let the entropy count go negative, but
it's better to simply prevent userspace from passing in a
negative entropy value altogether.
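The check described above can be modelled in userspace. This is only a
sketch of the patched credit_entropy_bits_safe() logic, not the kernel
code: the name credit_entropy_bits_safe_model() and the last_credit
variable are hypothetical, and ENTROPY_SHIFT is assumed to be 3 for
illustration.

```c
#include <errno.h>

/* Assumed for illustration; the kernel defines this in drivers/char/random.c. */
#define ENTROPY_SHIFT 3

/* Hypothetical stand-in for the pool: records the last amount credited. */
static int last_credit;

/*
 * Userspace model of the patched helper: reject negative values
 * outright, cap large values, then credit the result.
 */
static int credit_entropy_bits_safe_model(int nbits)
{
	const int nbits_max = (int)(~0U >> (ENTROPY_SHIFT + 1));

	if (nbits < 0)
		return -EINVAL;

	/* Cap the value to avoid overflows */
	if (nbits > nbits_max)
		nbits = nbits_max;

	last_credit = nbits;	/* the kernel calls credit_entropy_bits() here */
	return 0;
}
```

With this shape, the ioctl paths can return the helper's result directly,
so the syzkaller reproducer above would see EINVAL instead of triggering
the warning.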
Google-Bug-Id: #29575089
Reported-By: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/char/random.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -722,15 +722,18 @@ retry:
}
}
-static void credit_entropy_bits_safe(struct entropy_store *r, int nbits)
+static int credit_entropy_bits_safe(struct entropy_store *r, int nbits)
{
const int nbits_max = (int)(~0U >> (ENTROPY_SHIFT + 1));
+ if (nbits < 0)
+ return -EINVAL;
+
/* Cap the value to avoid overflows */
nbits = min(nbits, nbits_max);
- nbits = max(nbits, -nbits_max);
credit_entropy_bits(r, nbits);
+ return 0;
}
/*********************************************************************
@@ -1542,8 +1545,7 @@ static long random_ioctl(struct file *f,
return -EPERM;
if (get_user(ent_count, p))
return -EFAULT;
- credit_entropy_bits_safe(&input_pool, ent_count);
- return 0;
+ return credit_entropy_bits_safe(&input_pool, ent_count);
case RNDADDENTROPY:
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -1557,8 +1559,7 @@ static long random_ioctl(struct file *f,
size);
if (retval < 0)
return retval;
- credit_entropy_bits_safe(&input_pool, ent_count);
- return 0;
+ return credit_entropy_bits_safe(&input_pool, ent_count);
case RNDZAPENTCNT:
case RNDCLEARPOOL:
/*
^ permalink raw reply [flat|nested] 51+ messages in thread
* [PATCH 4.4 22/49] devpts: clean up interface to pty drivers
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (18 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 21/49] random: strengthen input validation for RNDADDTOENTCNT Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 23/49] x86/mm/pat: Add support of non-default PAT MSR setting Greg Kroah-Hartman
` (29 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Eric Biederman, Peter Anvin,
Andy Lutomirski, Al Viro, Peter Hurley, Serge Hallyn,
Willy Tarreau, Aurelien Jarno, Alan Cox, Jann Horn, Greg KH,
Jiri Slaby, Florian Weimer, Linus Torvalds, Francesco Ruggeri,
Herton R. Krzesinski
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds <torvalds@linux-foundation.org>
commit 67245ff332064c01b760afa7a384ccda024bfd24 upstream.
This gets rid of the horrible notion of having that
struct inode *ptmx_inode
be the linchpin of the interface between the pty code and devpts.
By de-emphasizing the ptmx inode, a lot of things actually get cleaner,
and we will have a much saner way forward. In particular, this will
allow us to associate with any particular devpts instance at open-time,
and not be artificially tied to one particular ptmx inode.
The patch itself is actually fairly straightforward, and apart from some
locking and return path cleanups it's pretty mechanical:
- the interfaces that devpts exposes all take "struct pts_fs_info *"
instead of "struct inode *ptmx_inode" now.
NOTE! The "struct pts_fs_info" thing is a completely opaque structure
as far as the pty driver is concerned: it's still declared entirely
internally to devpts. So the pty code can't actually access it in any
way, just pass it as a "cookie" to the devpts code.
- the "look up the pts fs info" is now a single clear operation, that
also does the reference count increment on the pts superblock.
So "devpts_add/del_ref()" is gone, and replaced by a "lookup and get
ref" operation (devpts_get_ref(inode)), along with a "put ref" op
(devpts_put_ref()).
- the pty master "tty->driver_data" field now contains the pts_fs_info,
not the ptmx inode.
- because we don't care about the ptmx inode any more as some kind of
base index, the ref counting can now drop the inode games - it just
gets the ref on the superblock.
- the pts_fs_info now has a back-pointer to the super_block. That's so
that we can easily look up the information we actually need. Although
quite often, the pts fs info was actually all we wanted, and not having
to look it up based on some magical inode makes things more
straightforward.
In particular, now that "devpts_get_ref(inode)" operation should really
be the *only* place we need to look up what devpts instance we're
associated with, and we do it exactly once, at ptmx_open() time.
The other side of this is that one ptmx node could now be associated
with multiple different devpts instances - you could have a single
/dev/ptmx node, and then have multiple mount namespaces with their own
instances of devpts mounted on /dev/pts/. And that's all perfectly sane
in a model where we just look up the pts instance at open time.
This will eventually allow us to get rid of our odd single-vs-multiple
pts instance model, but this patch in itself changes no semantics, only
an internal binding model.
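The "lookup and get ref" / "put ref" pairing described above can be
sketched in plain C. This is a simplified userspace model under stated
assumptions, not the devpts code: sb_model, fsi_model, get_ref_model()
and put_ref_model() are hypothetical names, and a plain int stands in
for the atomic s_active counter.

```c
#include <stddef.h>

struct fsi_model;

/* Hypothetical stand-in for struct super_block. */
struct sb_model {
	int s_active;			/* models the active reference count */
	struct fsi_model *s_fs_info;	/* per-instance private data */
};

/* Hypothetical stand-in for struct pts_fs_info. */
struct fsi_model {
	struct sb_model *sb;		/* back-pointer added by the patch */
};

/* Look up the fs info once and take a superblock reference. */
static struct fsi_model *get_ref_model(struct sb_model *sb)
{
	if (!sb || !sb->s_fs_info)
		return NULL;
	sb->s_active++;
	return sb->s_fs_info;
}

/* Drop the reference using only the cookie; no inode is needed. */
static void put_ref_model(struct fsi_model *fsi)
{
	fsi->sb->s_active--;
}
```

The caller only ever holds the returned pointer as a cookie, which is
why the back-pointer is needed: the put side has nothing else to find
the superblock with.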
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jann Horn <jann@thejh.net>
Cc: Greg KH <greg@kroah.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Francesco Ruggeri <fruggeri@arista.com>
Cc: "Herton R. Krzesinski" <herton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/tty/pty.c | 63 +++++++++++++++++++++-------------------------
fs/devpts/inode.c | 49 +++++++++++++++++------------------
include/linux/devpts_fs.h | 34 +++++++-----------------
3 files changed, 64 insertions(+), 82 deletions(-)
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -679,14 +679,14 @@ static void pty_unix98_remove(struct tty
/* this is called once with whichever end is closed last */
static void pty_unix98_shutdown(struct tty_struct *tty)
{
- struct inode *ptmx_inode;
+ struct pts_fs_info *fsi;
if (tty->driver->subtype == PTY_TYPE_MASTER)
- ptmx_inode = tty->driver_data;
+ fsi = tty->driver_data;
else
- ptmx_inode = tty->link->driver_data;
- devpts_kill_index(ptmx_inode, tty->index);
- devpts_del_ref(ptmx_inode);
+ fsi = tty->link->driver_data;
+ devpts_kill_index(fsi, tty->index);
+ devpts_put_ref(fsi);
}
static const struct tty_operations ptm_unix98_ops = {
@@ -738,6 +738,7 @@ static const struct tty_operations pty_u
static int ptmx_open(struct inode *inode, struct file *filp)
{
+ struct pts_fs_info *fsi;
struct tty_struct *tty;
struct inode *slave_inode;
int retval;
@@ -752,47 +753,41 @@ static int ptmx_open(struct inode *inode
if (retval)
return retval;
+ fsi = devpts_get_ref(inode, filp);
+ retval = -ENODEV;
+ if (!fsi)
+ goto out_free_file;
+
/* find a device that is not in use. */
mutex_lock(&devpts_mutex);
- index = devpts_new_index(inode);
- if (index < 0) {
- retval = index;
- mutex_unlock(&devpts_mutex);
- goto err_file;
- }
-
+ index = devpts_new_index(fsi);
mutex_unlock(&devpts_mutex);
- mutex_lock(&tty_mutex);
- tty = tty_init_dev(ptm_driver, index);
+ retval = index;
+ if (index < 0)
+ goto out_put_ref;
- if (IS_ERR(tty)) {
- retval = PTR_ERR(tty);
- goto out;
- }
+ mutex_lock(&tty_mutex);
+ tty = tty_init_dev(ptm_driver, index);
/* The tty returned here is locked so we can safely
drop the mutex */
mutex_unlock(&tty_mutex);
- set_bit(TTY_PTY_LOCK, &tty->flags); /* LOCK THE SLAVE */
- tty->driver_data = inode;
+ retval = PTR_ERR(tty);
+ if (IS_ERR(tty))
+ goto out;
/*
- * In the case where all references to ptmx inode are dropped and we
- * still have /dev/tty opened pointing to the master/slave pair (ptmx
- * is closed/released before /dev/tty), we must make sure that the inode
- * is still valid when we call the final pty_unix98_shutdown, thus we
- * hold an additional reference to the ptmx inode. For the same /dev/tty
- * last close case, we also need to make sure the super_block isn't
- * destroyed (devpts instance unmounted), before /dev/tty is closed and
- * on its release devpts_kill_index is called.
+ * From here on out, the tty is "live", and the index and
+ * fsi will be killed/put by the tty_release()
*/
- devpts_add_ref(inode);
+ set_bit(TTY_PTY_LOCK, &tty->flags); /* LOCK THE SLAVE */
+ tty->driver_data = fsi;
tty_add_file(tty, filp);
- slave_inode = devpts_pty_new(inode,
+ slave_inode = devpts_pty_new(fsi,
MKDEV(UNIX98_PTY_SLAVE_MAJOR, index), index,
tty->link);
if (IS_ERR(slave_inode)) {
@@ -811,12 +806,14 @@ static int ptmx_open(struct inode *inode
return 0;
err_release:
tty_unlock(tty);
+ // This will also put-ref the fsi
tty_release(inode, filp);
return retval;
out:
- mutex_unlock(&tty_mutex);
- devpts_kill_index(inode, index);
-err_file:
+ devpts_kill_index(fsi, index);
+out_put_ref:
+ devpts_put_ref(fsi);
+out_free_file:
tty_free_file(filp);
return retval;
}
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -128,6 +128,7 @@ static const match_table_t tokens = {
struct pts_fs_info {
struct ida allocated_ptys;
struct pts_mount_opts mount_opts;
+ struct super_block *sb;
struct dentry *ptmx_dentry;
};
@@ -358,7 +359,7 @@ static const struct super_operations dev
.show_options = devpts_show_options,
};
-static void *new_pts_fs_info(void)
+static void *new_pts_fs_info(struct super_block *sb)
{
struct pts_fs_info *fsi;
@@ -369,6 +370,7 @@ static void *new_pts_fs_info(void)
ida_init(&fsi->allocated_ptys);
fsi->mount_opts.mode = DEVPTS_DEFAULT_MODE;
fsi->mount_opts.ptmxmode = DEVPTS_DEFAULT_PTMX_MODE;
+ fsi->sb = sb;
return fsi;
}
@@ -384,7 +386,7 @@ devpts_fill_super(struct super_block *s,
s->s_op = &devpts_sops;
s->s_time_gran = 1;
- s->s_fs_info = new_pts_fs_info();
+ s->s_fs_info = new_pts_fs_info(s);
if (!s->s_fs_info)
goto fail;
@@ -524,17 +526,14 @@ static struct file_system_type devpts_fs
* to the System V naming convention
*/
-int devpts_new_index(struct inode *ptmx_inode)
+int devpts_new_index(struct pts_fs_info *fsi)
{
- struct super_block *sb = pts_sb_from_inode(ptmx_inode);
- struct pts_fs_info *fsi;
int index;
int ida_ret;
- if (!sb)
+ if (!fsi)
return -ENODEV;
- fsi = DEVPTS_SB(sb);
retry:
if (!ida_pre_get(&fsi->allocated_ptys, GFP_KERNEL))
return -ENOMEM;
@@ -564,11 +563,8 @@ retry:
return index;
}
-void devpts_kill_index(struct inode *ptmx_inode, int idx)
+void devpts_kill_index(struct pts_fs_info *fsi, int idx)
{
- struct super_block *sb = pts_sb_from_inode(ptmx_inode);
- struct pts_fs_info *fsi = DEVPTS_SB(sb);
-
mutex_lock(&allocated_ptys_lock);
ida_remove(&fsi->allocated_ptys, idx);
pty_count--;
@@ -578,21 +574,25 @@ void devpts_kill_index(struct inode *ptm
/*
* pty code needs to hold extra references in case of last /dev/tty close
*/
-
-void devpts_add_ref(struct inode *ptmx_inode)
+struct pts_fs_info *devpts_get_ref(struct inode *ptmx_inode, struct file *file)
{
- struct super_block *sb = pts_sb_from_inode(ptmx_inode);
+ struct super_block *sb;
+ struct pts_fs_info *fsi;
+
+ sb = pts_sb_from_inode(ptmx_inode);
+ if (!sb)
+ return NULL;
+ fsi = DEVPTS_SB(sb);
+ if (!fsi)
+ return NULL;
atomic_inc(&sb->s_active);
- ihold(ptmx_inode);
+ return fsi;
}
-void devpts_del_ref(struct inode *ptmx_inode)
+void devpts_put_ref(struct pts_fs_info *fsi)
{
- struct super_block *sb = pts_sb_from_inode(ptmx_inode);
-
- iput(ptmx_inode);
- deactivate_super(sb);
+ deactivate_super(fsi->sb);
}
/**
@@ -604,22 +604,21 @@ void devpts_del_ref(struct inode *ptmx_i
*
* The created inode is returned. Remove it from /dev/pts/ by devpts_pty_kill.
*/
-struct inode *devpts_pty_new(struct inode *ptmx_inode, dev_t device, int index,
+struct inode *devpts_pty_new(struct pts_fs_info *fsi, dev_t device, int index,
void *priv)
{
struct dentry *dentry;
- struct super_block *sb = pts_sb_from_inode(ptmx_inode);
+ struct super_block *sb;
struct inode *inode;
struct dentry *root;
- struct pts_fs_info *fsi;
struct pts_mount_opts *opts;
char s[12];
- if (!sb)
+ if (!fsi)
return ERR_PTR(-ENODEV);
+ sb = fsi->sb;
root = sb->s_root;
- fsi = DEVPTS_SB(sb);
opts = &fsi->mount_opts;
inode = new_inode(sb);
--- a/include/linux/devpts_fs.h
+++ b/include/linux/devpts_fs.h
@@ -15,38 +15,24 @@
#include <linux/errno.h>
+struct pts_fs_info;
+
#ifdef CONFIG_UNIX98_PTYS
-int devpts_new_index(struct inode *ptmx_inode);
-void devpts_kill_index(struct inode *ptmx_inode, int idx);
-void devpts_add_ref(struct inode *ptmx_inode);
-void devpts_del_ref(struct inode *ptmx_inode);
+/* Look up a pts fs info and get a ref to it */
+struct pts_fs_info *devpts_get_ref(struct inode *, struct file *);
+void devpts_put_ref(struct pts_fs_info *);
+
+int devpts_new_index(struct pts_fs_info *);
+void devpts_kill_index(struct pts_fs_info *, int);
+
/* mknod in devpts */
-struct inode *devpts_pty_new(struct inode *ptmx_inode, dev_t device, int index,
- void *priv);
+struct inode *devpts_pty_new(struct pts_fs_info *, dev_t, int, void *);
/* get private structure */
void *devpts_get_priv(struct inode *pts_inode);
/* unlink */
void devpts_pty_kill(struct inode *inode);
-#else
-
-/* Dummy stubs in the no-pty case */
-static inline int devpts_new_index(struct inode *ptmx_inode) { return -EINVAL; }
-static inline void devpts_kill_index(struct inode *ptmx_inode, int idx) { }
-static inline void devpts_add_ref(struct inode *ptmx_inode) { }
-static inline void devpts_del_ref(struct inode *ptmx_inode) { }
-static inline struct inode *devpts_pty_new(struct inode *ptmx_inode,
- dev_t device, int index, void *priv)
-{
- return ERR_PTR(-EINVAL);
-}
-static inline void *devpts_get_priv(struct inode *pts_inode)
-{
- return NULL;
-}
-static inline void devpts_pty_kill(struct inode *inode) { }
-
#endif
^ permalink raw reply [flat|nested] 51+ messages in thread
* [PATCH 4.4 23/49] x86/mm/pat: Add support of non-default PAT MSR setting
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (19 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 22/49] devpts: clean up interface to pty drivers Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 24/49] x86/mm/pat: Add pat_disable() interface Greg Kroah-Hartman
` (28 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Toshi Kani, Thomas Gleixner,
Andrew Morton, Andy Lutomirski, Borislav Petkov, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Juergen Gross,
Linus Torvalds, Luis R. Rodriguez, Peter Zijlstra, Toshi Kani,
elliott, konrad.wilk, paul.gortmaker, xen-devel, Ingo Molnar
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshi Kani <toshi.kani@hpe.com>
commit 02f037d641dc6672be5cfe7875a48ab99b95b154 upstream.
In preparation for fixing a regression caused by:
9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it is disabled")
... PAT needs to support the case where the PAT MSR is initialized with a
non-default value.
When pat_init() is called and PAT is disabled, it initializes the
PAT table with the BIOS default value. Xen, however, sets PAT MSR
with a non-default value to enable WC. This causes inconsistency
between the PAT table and PAT MSR when PAT is disabled on Xen.
Change pat_init() to handle the PAT disable cases properly. Add
init_cache_modes() to handle the two cases in which PAT is disabled.
1. CPU supports PAT: Set PAT table to be consistent with PAT MSR.
2. CPU does not support PAT: Set PAT table to be consistent with
PWT and PCD bits in a PTE.
Note, __init_cache_modes(), renamed from pat_init_cache_modes(),
will be changed to a static function in a later patch.
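The two cases above reduce to one selection rule, sketched here in
userspace C. This is an illustration, not the kernel function:
pick_pat_model() and its parameters are hypothetical, the MSR read is
replaced by an argument, and the memory type encodings are the ones the
PAT() macro in arch/x86/mm/pat.c is assumed to use.

```c
#include <stdint.h>

/* PAT memory type encodings (assumed to match arch/x86/mm/pat.c). */
enum {
	PAT_UC = 0,		/* uncached */
	PAT_WC = 1,		/* write combining */
	PAT_WT = 4,		/* write through */
	PAT_WP = 5,		/* write protected */
	PAT_WB = 6,		/* write back */
	PAT_UC_MINUS = 7,	/* UC, but can be overridden by MTRR */
};

#define PAT(x, y)	((uint64_t)PAT_##y << ((x) * 8))

/*
 * Model of the value init_cache_modes() hands to __init_cache_modes():
 * use the MSR contents when the CPU has PAT, and fall back to the
 * emulated PWT/PCD table when there is no PAT or the MSR reads as 0.
 */
static uint64_t pick_pat_model(int cpu_has_pat, uint64_t msr_value)
{
	uint64_t pat = 0;

	if (cpu_has_pat)
		pat = msr_value;	/* case 1: follow the PAT MSR */

	if (!pat)			/* case 2: emulate via PWT and PCD */
		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);

	return pat;
}
```

A non-default MSR value, such as the one Xen programs, passes through
unchanged, so the PAT table stays consistent with the hardware.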
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/include/asm/pat.h | 2 -
arch/x86/mm/pat.c | 73 ++++++++++++++++++++++++++++++++-------------
arch/x86/xen/enlighten.c | 2 -
3 files changed, 55 insertions(+), 22 deletions(-)
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -6,7 +6,7 @@
bool pat_enabled(void);
extern void pat_init(void);
-void pat_init_cache_modes(u64);
+void __init_cache_modes(u64);
extern int reserve_memtype(u64 start, u64 end,
enum page_cache_mode req_pcm, enum page_cache_mode *ret_pcm);
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -180,7 +180,7 @@ static enum page_cache_mode pat_get_cach
* configuration.
* Using lower indices is preferred, so we start with highest index.
*/
-void pat_init_cache_modes(u64 pat)
+void __init_cache_modes(u64 pat)
{
enum page_cache_mode cache;
char pat_msg[33];
@@ -206,9 +206,6 @@ static void pat_bsp_init(u64 pat)
return;
}
- if (!pat_enabled())
- goto done;
-
rdmsrl(MSR_IA32_CR_PAT, tmp_pat);
if (!tmp_pat) {
pat_disable("PAT MSR is 0, disabled.");
@@ -217,15 +214,11 @@ static void pat_bsp_init(u64 pat)
wrmsrl(MSR_IA32_CR_PAT, pat);
-done:
- pat_init_cache_modes(pat);
+ __init_cache_modes(pat);
}
static void pat_ap_init(u64 pat)
{
- if (!pat_enabled())
- return;
-
if (!cpu_has_pat) {
/*
* If this happens we are on a secondary CPU, but switched to
@@ -237,18 +230,32 @@ static void pat_ap_init(u64 pat)
wrmsrl(MSR_IA32_CR_PAT, pat);
}
-void pat_init(void)
+static void init_cache_modes(void)
{
- u64 pat;
- struct cpuinfo_x86 *c = &boot_cpu_data;
+ u64 pat = 0;
+ static int init_cm_done;
- if (!pat_enabled()) {
+ if (init_cm_done)
+ return;
+
+ if (boot_cpu_has(X86_FEATURE_PAT)) {
+ /*
+ * CPU supports PAT. Set PAT table to be consistent with
+ * PAT MSR. This case supports "nopat" boot option, and
+ * virtual machine environments which support PAT without
+ * MTRRs. In specific, Xen has unique setup to PAT MSR.
+ *
+ * If PAT MSR returns 0, it is considered invalid and emulates
+ * as No PAT.
+ */
+ rdmsrl(MSR_IA32_CR_PAT, pat);
+ }
+
+ if (!pat) {
/*
* No PAT. Emulate the PAT table that corresponds to the two
- * cache bits, PWT (Write Through) and PCD (Cache Disable). This
- * setup is the same as the BIOS default setup when the system
- * has PAT but the "nopat" boot option has been specified. This
- * emulated PAT table is used when MSR_IA32_CR_PAT returns 0.
+ * cache bits, PWT (Write Through) and PCD (Cache Disable).
+ * This setup is also the same as the BIOS default setup.
*
* PTE encoding:
*
@@ -265,10 +272,36 @@ void pat_init(void)
*/
pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+ }
+
+ __init_cache_modes(pat);
+
+ init_cm_done = 1;
+}
+
+/**
+ * pat_init - Initialize PAT MSR and PAT table
+ *
+ * This function initializes PAT MSR and PAT table with an OS-defined value
+ * to enable additional cache attributes, WC and WT.
+ *
+ * This function must be called on all CPUs using the specific sequence of
+ * operations defined in Intel SDM. mtrr_rendezvous_handler() provides this
+ * procedure for PAT.
+ */
+void pat_init(void)
+{
+ u64 pat;
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+
+ if (!pat_enabled()) {
+ init_cache_modes();
+ return;
+ }
- } else if ((c->x86_vendor == X86_VENDOR_INTEL) &&
- (((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||
- ((c->x86 == 0xf) && (c->x86_model <= 0x6)))) {
+ if ((c->x86_vendor == X86_VENDOR_INTEL) &&
+ (((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||
+ ((c->x86 == 0xf) && (c->x86_model <= 0x6)))) {
/*
* PAT support with the lower four entries. Intel Pentium 2,
* 3, M, and 4 are affected by PAT errata, which makes the
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1632,7 +1632,7 @@ asmlinkage __visible void __init xen_sta
* configuration.
*/
rdmsrl(MSR_IA32_CR_PAT, pat);
- pat_init_cache_modes(pat);
+ __init_cache_modes(pat);
/* keep using Xen gdt for now; no urgent need to change it */
^ permalink raw reply [flat|nested] 51+ messages in thread
* [PATCH 4.4 24/49] x86/mm/pat: Add pat_disable() interface
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (20 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 23/49] x86/mm/pat: Add support of non-default PAT MSR setting Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 25/49] x86/mm/pat: Replace cpu_has_pat with boot_cpu_has() Greg Kroah-Hartman
` (27 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Toshi Kani, Thomas Gleixner,
Andrew Morton, Andy Lutomirski, Borislav Petkov, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Juergen Gross,
Linus Torvalds, Luis R. Rodriguez, Peter Zijlstra, Robert Elliott,
Toshi Kani, konrad.wilk, paul.gortmaker, xen-devel, Ingo Molnar
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshi Kani <toshi.kani@hpe.com>
commit 224bb1e5d67ba0f2872c98002d6a6f991ac6fd4a upstream.
In preparation for fixing a regression caused by:
9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it is disabled")
... PAT needs to provide an interface that prevents the OS from
initializing the PAT MSR.
PAT MSR initialization must be done on all CPUs using the specific
sequence of operations defined in the Intel SDM. This requires MTRRs
to be enabled since pat_init() is called as part of MTRR init
from mtrr_rendezvous_handler().
Make pat_disable() the interface that prevents the OS from
initializing the PAT MSR. MTRR will call this interface when it
cannot provide the SDM-defined sequence to initialize PAT.
This also assures that pat_disable() called from pat_bsp_init()
will set the PAT table properly when CPU does not support PAT.
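The guard logic this gives pat_disable() can be modelled as a small
state machine. This is a userspace sketch only: the *_model names and
the cache_modes_inits counter are hypothetical, and the kernel's
WARN_ONCE() and pr_info() calls are reduced to comments.

```c
#include <stdbool.h>

static bool pat_enabled_model = true;	/* models __pat_enabled */
static bool boot_cpu_done_model;	/* models boot_cpu_done */
static int cache_modes_inits;		/* counts init_cache_modes() calls */

/* Returns true only if this call actually disabled PAT. */
static bool pat_disable_model(void)
{
	if (!pat_enabled_model)
		return false;		/* already disabled, nothing to do */

	if (boot_cpu_done_model)
		return false;		/* too late; the kernel warns once here */

	pat_enabled_model = false;
	cache_modes_inits++;		/* the kernel calls init_cache_modes() */
	return true;
}
```

Repeated calls are harmless in this model: only the first successful
call flips the flag and sets up the cache modes.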
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-3-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/include/asm/pat.h | 1 +
arch/x86/mm/pat.c | 13 ++++++++++++-
2 files changed, 13 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -5,6 +5,7 @@
#include <asm/pgtable_types.h>
bool pat_enabled(void);
+void pat_disable(const char *reason);
extern void pat_init(void);
void __init_cache_modes(u64);
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -39,11 +39,22 @@
static bool boot_cpu_done;
static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
+static void init_cache_modes(void);
-static inline void pat_disable(const char *reason)
+void pat_disable(const char *reason)
{
+ if (!__pat_enabled)
+ return;
+
+ if (boot_cpu_done) {
+ WARN_ONCE(1, "x86/PAT: PAT cannot be disabled after initialization\n");
+ return;
+ }
+
__pat_enabled = 0;
pr_info("x86/PAT: %s\n", reason);
+
+ init_cache_modes();
}
static int __init nopat(char *str)
^ permalink raw reply [flat|nested] 51+ messages in thread
* [PATCH 4.4 25/49] x86/mm/pat: Replace cpu_has_pat with boot_cpu_has()
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (21 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 24/49] x86/mm/pat: Add pat_disable() interface Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 26/49] x86/mtrr: Fix Xorg crashes in Qemu sessions Greg Kroah-Hartman
` (26 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Borislav Petkov, Toshi Kani,
Thomas Gleixner, Andrew Morton, Andy Lutomirski, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Juergen Gross,
Linus Torvalds, Luis R. Rodriguez, Peter Zijlstra, Robert Elliott,
Toshi Kani, konrad.wilk, paul.gortmaker, xen-devel, Ingo Molnar
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshi Kani <toshi.kani@hpe.com>
commit d63dcf49cf5ae5605f4d14229e3888e104f294b1 upstream.
Borislav Petkov suggested:
> Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast
> paths static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX
> ugliness.
Replace the use of cpu_has_pat on init paths with boot_cpu_has().
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-4-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/mm/pat.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -212,7 +212,7 @@ static void pat_bsp_init(u64 pat)
{
u64 tmp_pat;
- if (!cpu_has_pat) {
+ if (!boot_cpu_has(X86_FEATURE_PAT)) {
pat_disable("PAT not supported by CPU.");
return;
}
@@ -230,7 +230,7 @@ static void pat_bsp_init(u64 pat)
static void pat_ap_init(u64 pat)
{
- if (!cpu_has_pat) {
+ if (!boot_cpu_has(X86_FEATURE_PAT)) {
/*
* If this happens we are on a secondary CPU, but switched to
* PAT on the boot CPU. We have no way to undo PAT.
^ permalink raw reply [flat|nested] 51+ messages in thread
* [PATCH 4.4 26/49] x86/mtrr: Fix Xorg crashes in Qemu sessions
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (22 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 25/49] x86/mm/pat: Replace cpu_has_pat with boot_cpu_has() Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 27/49] x86/mtrr: Fix PAT init handling when MTRR is disabled Greg Kroah-Hartman
` (25 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Toshi Kani, Thomas Gleixner,
Andrew Morton, Andy Lutomirski, Borislav Petkov, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Juergen Gross,
Linus Torvalds, Luis R. Rodriguez, Peter Zijlstra, Toshi Kani,
elliott, konrad.wilk, paul.gortmaker, xen-devel, Ingo Molnar
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshi Kani <toshi.kani@hpe.com>
commit edfe63ec97ed8d4496225f7ba54c9ce4207c5431 upstream.
A Xorg failure on qemu32 was reported as a regression [1] caused by
commit 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it is disabled").
This patch fixes the Xorg crash.
Negative effects of this regression were the following two failures [2]
in Xorg on QEMU with QEMU CPU model "qemu32" (-cpu qemu32), which were
triggered by the fact that its virtual CPU does not support MTRRs.
#1. copy_process() failed in the check in reserve_pfn_range()
copy_process
copy_mm
dup_mm
dup_mmap
copy_page_range
track_pfn_copy
reserve_pfn_range
A WC map request was tracked as WC in memtype, which set a PTE as
UC (pgprot) per __cachemode2pte_tbl[]. This led to this error in
reserve_pfn_range() called from track_pfn_copy(), which obtained
a pgprot from a PTE. It converts pgprot to page_cache_mode, which
does not necessarily result in the original page_cache_mode since
__cachemode2pte_tbl[] redirects multiple types to UC.
#2. error path in copy_process() then hit WARN_ON_ONCE in
untrack_pfn().
x86/PAT: Xorg:509 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining
Call Trace:
dump_stack
warn_slowpath_common
? untrack_pfn
? untrack_pfn
warn_slowpath_null
untrack_pfn
? __kunmap_atomic
unmap_single_vma
? pagevec_move_tail_fn
unmap_vmas
exit_mmap
mmput
copy_process.part.47
_do_fork
SyS_clone
do_syscall_32_irqs_on
entry_INT80_32
These negative effects are caused by two separate bugs, but they
can be addressed in separate patches. Fixing the pat_init() issue
described below addresses the root cause, and keeps Xorg from hitting
these cases.
When the CPU does not support MTRRs, MTRR does not call pat_init(),
which leaves PAT enabled without initializing PAT. This pat_init()
issue is a long-standing issue, but manifested as issue #1 (and then
hit issue #2) with the above-mentioned commit because the memtype
now tracks cache attribute with 'page_cache_mode'.
This pat_init() issue existed before the commit, but we used pgprot
in memtype. Hence, we did not have issue #1 before. But WC request
resulted in WT in effect because WC pgprot is actually WT when PAT
is not initialized. This is not how it was designed to work. When
PAT is properly disabled, WC is converted to UC. The use of
WT can result in a system crash if the target range does not support
WT. Fortunately, nobody ran into such an issue before.
To fix this pat_init() issue, PAT code has been enhanced to provide
the pat_disable() interface. Call this interface when MTRRs are disabled.
By properly disabling PAT, PAT bypasses the memtype check,
and avoids issue #1.
[1]: https://lkml.org/lkml/2016/3/3/828
[2]: https://lkml.org/lkml/2016/3/4/775
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-5-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/include/asm/mtrr.h | 6 +++++-
arch/x86/kernel/cpu/mtrr/main.c | 10 +++++++++-
2 files changed, 14 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -24,6 +24,7 @@
#define _ASM_X86_MTRR_H
#include <uapi/asm/mtrr.h>
+#include <asm/pat.h>
/*
@@ -83,9 +84,12 @@ static inline int mtrr_trim_uncached_mem
static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
{
}
+static inline void mtrr_bp_init(void)
+{
+ pat_disable("MTRRs disabled, skipping PAT initialization too.");
+}
#define mtrr_ap_init() do {} while (0)
-#define mtrr_bp_init() do {} while (0)
#define set_mtrr_aps_delayed_init() do {} while (0)
#define mtrr_aps_init() do {} while (0)
#define mtrr_bp_restore() do {} while (0)
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -759,8 +759,16 @@ void __init mtrr_bp_init(void)
}
}
- if (!mtrr_enabled())
+ if (!mtrr_enabled()) {
pr_info("MTRR: Disabled\n");
+
+ /*
+ * PAT initialization relies on MTRR's rendezvous handler.
+ * Skip PAT init until the handler can initialize both
+ * features independently.
+ */
+ pat_disable("MTRRs disabled, skipping PAT initialization too.");
+ }
}
void mtrr_ap_init(void)
* [PATCH 4.4 27/49] x86/mtrr: Fix PAT init handling when MTRR is disabled
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (23 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 26/49] x86/mtrr: Fix Xorg crashes in Qemu sessions Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 28/49] x86/xen, pat: Remove PAT table init code from Xen Greg Kroah-Hartman
` (24 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Toshi Kani, Thomas Gleixner,
Andrew Morton, Andy Lutomirski, Borislav Petkov, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Juergen Gross,
Linus Torvalds, Luis R. Rodriguez, Peter Zijlstra, Toshi Kani,
elliott, konrad.wilk, paul.gortmaker, xen-devel, Ingo Molnar
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshi Kani <toshi.kani@hpe.com>
commit ad025a73f0e9344ac73ffe1b74c184033e08e7d5 upstream.
get_mtrr_state() calls pat_init() on BSP even if MTRR is disabled.
This results in calling pat_init() on BSP only since APs do not call
pat_init() when MTRR is disabled. This inconsistency between BSP
and APs leads to undefined behavior.
Make the BSP's condition for calling pat_init() consistent with that
of the APs in mtrr_ap_init() and mtrr_aps_init().
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-6-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/kernel/cpu/mtrr/generic.c | 24 ++++++++++++++----------
arch/x86/kernel/cpu/mtrr/main.c | 3 +++
arch/x86/kernel/cpu/mtrr/mtrr.h | 1 +
3 files changed, 18 insertions(+), 10 deletions(-)
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -444,11 +444,24 @@ static void __init print_mtrr_state(void
pr_debug("TOM2: %016llx aka %lldM\n", mtrr_tom2, mtrr_tom2>>20);
}
+/* PAT setup for BP. We need to go through sync steps here */
+void __init mtrr_bp_pat_init(void)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ prepare_set();
+
+ pat_init();
+
+ post_set();
+ local_irq_restore(flags);
+}
+
/* Grab all of the MTRR state for this CPU into *state */
bool __init get_mtrr_state(void)
{
struct mtrr_var_range *vrs;
- unsigned long flags;
unsigned lo, dummy;
unsigned int i;
@@ -481,15 +494,6 @@ bool __init get_mtrr_state(void)
mtrr_state_set = 1;
- /* PAT setup for BP. We need to go through sync steps here */
- local_irq_save(flags);
- prepare_set();
-
- pat_init();
-
- post_set();
- local_irq_restore(flags);
-
return !!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED);
}
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -752,6 +752,9 @@ void __init mtrr_bp_init(void)
/* BIOS may override */
__mtrr_enabled = get_mtrr_state();
+ if (mtrr_enabled())
+ mtrr_bp_pat_init();
+
if (mtrr_cleanup(phys_addr)) {
changed_by_mtrr_cleanup = 1;
mtrr_if->set_all();
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -52,6 +52,7 @@ void set_mtrr_prepare_save(struct set_mt
void fill_mtrr_var_range(unsigned int index,
u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
bool get_mtrr_state(void);
+void mtrr_bp_pat_init(void);
extern void set_mtrr_ops(const struct mtrr_ops *ops);
* [PATCH 4.4 28/49] x86/xen, pat: Remove PAT table init code from Xen
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (24 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 27/49] x86/mtrr: Fix PAT init handling when MTRR is disabled Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 29/49] x86/pat: Document the PAT initialization sequence Greg Kroah-Hartman
` (23 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Toshi Kani, Thomas Gleixner,
Juergen Gross, Andrew Morton, Andy Lutomirski, Borislav Petkov,
Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
Konrad Rzeszutek Wilk, Linus Torvalds, Luis R. Rodriguez,
Peter Zijlstra, Toshi Kani, elliott, paul.gortmaker, xen-devel,
Ingo Molnar
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshi Kani <toshi.kani@hpe.com>
commit 88ba281108ed0c25c9d292b48bd3f272fcb90dd0 upstream.
Xen supports PAT without MTRRs for its guests. In order to
enable WC attribute, it was necessary for xen_start_kernel()
to call pat_init_cache_modes() to update PAT table before
starting guest kernel.
Now that the kernel initializes PAT table to the BIOS handoff
state when MTRR is disabled, this Xen-specific PAT init code
is no longer necessary. Delete it from xen_start_kernel().
Also change __init_cache_modes() to a static function since
PAT table should not be tweaked by other modules.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-7-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/include/asm/pat.h | 1 -
arch/x86/mm/pat.c | 2 +-
arch/x86/xen/enlighten.c | 9 ---------
3 files changed, 1 insertion(+), 11 deletions(-)
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -7,7 +7,6 @@
bool pat_enabled(void);
void pat_disable(const char *reason);
extern void pat_init(void);
-void __init_cache_modes(u64);
extern int reserve_memtype(u64 start, u64 end,
enum page_cache_mode req_pcm, enum page_cache_mode *ret_pcm);
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -191,7 +191,7 @@ static enum page_cache_mode pat_get_cach
* configuration.
* Using lower indices is preferred, so we start with highest index.
*/
-void __init_cache_modes(u64 pat)
+static void __init_cache_modes(u64 pat)
{
enum page_cache_mode cache;
char pat_msg[33];
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -74,7 +74,6 @@
#include <asm/mach_traps.h>
#include <asm/mwait.h>
#include <asm/pci_x86.h>
-#include <asm/pat.h>
#include <asm/cpu.h>
#ifdef CONFIG_ACPI
@@ -1519,7 +1518,6 @@ asmlinkage __visible void __init xen_sta
{
struct physdev_set_iopl set_iopl;
unsigned long initrd_start = 0;
- u64 pat;
int rc;
if (!xen_start_info)
@@ -1627,13 +1625,6 @@ asmlinkage __visible void __init xen_sta
xen_start_info->nr_pages);
xen_reserve_special_pages();
- /*
- * Modify the cache mode translation tables to match Xen's PAT
- * configuration.
- */
- rdmsrl(MSR_IA32_CR_PAT, pat);
- __init_cache_modes(pat);
-
/* keep using Xen gdt for now; no urgent need to change it */
#ifdef CONFIG_X86_32
* [PATCH 4.4 29/49] x86/pat: Document the PAT initialization sequence
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (25 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 28/49] x86/xen, pat: Remove PAT table init code from Xen Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 30/49] x86/mm/pat: Fix BUG_ON() in mmap_mem() on QEMU/i386 Greg Kroah-Hartman
` (22 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Toshi Kani, Thomas Gleixner,
Andrew Morton, Andy Lutomirski, Borislav Petkov, Borislav Petkov,
Brian Gerst, Denys Vlasenko, H. Peter Anvin, Juergen Gross,
Linus Torvalds, Luis R. Rodriguez, Peter Zijlstra, Toshi Kani,
elliott, konrad.wilk, paul.gortmaker, xen-devel, Ingo Molnar
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshi Kani <toshi.kani@hpe.com>
commit b6350c21cfe8aa9d65e189509a23c0ea4b8362c2 upstream.
Update PAT documentation to describe how PAT is initialized under
various configurations.
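
As a reading aid only, the table added to Documentation/x86/pat.txt by
this patch can be modelled as a small decision function. This is a
userspace sketch, not kernel code: struct pat_config, pat_state_enabled()
and pat_msr_source() are hypothetical names invented for illustration,
and the logic is inferred from the table rows rather than taken from the
PAT implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the documented table; none of this is kernel API. */
enum pat_msr_owner {
	PAT_MSR_NONE,	/* "-"    : CPU has no usable PAT MSR */
	PAT_MSR_BIOS,	/* "BIOS" : MSR keeps the firmware value */
	PAT_MSR_OS	/* "OS"   : MSR is programmed by Linux */
};

struct pat_config {
	bool config_mtrr;	/* CONFIG_MTRR set (false models "!M") */
	bool config_pat;	/* CONFIG_X86_PAT set (false models "!P") */
	bool cpu_mtrr;		/* MTRR column is "E" */
	bool cpu_pat;		/* PAT column is "E" */
	bool nopat;		/* "nopat" boot option ("np") */
};

/* PAT state is "Enabled" only in the first row of the table. */
static bool pat_state_enabled(const struct pat_config *c)
{
	return c->config_mtrr && c->config_pat &&
	       c->cpu_mtrr && c->cpu_pat && !c->nopat;
}

/* Who owns the PAT MSR contents for a given configuration. */
static enum pat_msr_owner pat_msr_source(const struct pat_config *c)
{
	if (!c->cpu_pat)
		return PAT_MSR_NONE;
	return pat_state_enabled(c) ? PAT_MSR_OS : PAT_MSR_BIOS;
}
```

For example, a configuration with MTRRs disabled in the CPU but PAT
available maps to the "D E" row: PAT state disabled, MSR left with
the BIOS setting.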
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-8-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
Documentation/x86/pat.txt | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
--- a/Documentation/x86/pat.txt
+++ b/Documentation/x86/pat.txt
@@ -196,3 +196,35 @@ Another, more verbose way of getting PAT
"debugpat" boot parameter. With this parameter, various debug messages are
printed to dmesg log.
+PAT Initialization
+------------------
+
+The following table describes how PAT is initialized under various
+configurations. The PAT MSR must be updated by Linux in order to support WC
+and WT attributes. Otherwise, the PAT MSR has the value programmed in it
+by the firmware. Note, Xen enables WC attribute in the PAT MSR for guests.
+
+ MTRR PAT Call Sequence PAT State PAT MSR
+ =========================================================
+ E E MTRR -> PAT init Enabled OS
+ E D MTRR -> PAT init Disabled -
+ D E MTRR -> PAT disable Disabled BIOS
+ D D MTRR -> PAT disable Disabled -
+ - np/E PAT -> PAT disable Disabled BIOS
+ - np/D PAT -> PAT disable Disabled -
+ E !P/E MTRR -> PAT init Disabled BIOS
+ D !P/E MTRR -> PAT disable Disabled BIOS
+ !M !P/E MTRR stub -> PAT disable Disabled BIOS
+
+ Legend
+ ------------------------------------------------
+ E Feature enabled in CPU
+ D Feature disabled/unsupported in CPU
+ np "nopat" boot option specified
+ !P CONFIG_X86_PAT option unset
+ !M CONFIG_MTRR option unset
+ Enabled PAT state set to enabled
+ Disabled PAT state set to disabled
+ OS PAT initializes PAT MSR with OS setting
+ BIOS PAT keeps PAT MSR with BIOS setting
+
* [PATCH 4.4 30/49] x86/mm/pat: Fix BUG_ON() in mmap_mem() on QEMU/i386
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (26 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 29/49] x86/pat: Document the PAT initialization sequence Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 31/49] drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2) Greg Kroah-Hartman
` (21 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, kernel test robot, Borislav Petkov,
Toshi Kani, Andrew Morton, David Vrabel, Linus Torvalds,
Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, xen-devel,
Ingo Molnar
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshi Kani <toshi.kani@hpe.com>
commit 1886297ce0c8d563a08c8a8c4c0b97743e06cd37 upstream.
The following BUG_ON() crash was reported on QEMU/i386:
kernel BUG at arch/x86/mm/physaddr.c:79!
Call Trace:
phys_mem_access_prot_allowed
mmap_mem
? mmap_region
mmap_region
do_mmap
vm_mmap_pgoff
SyS_mmap_pgoff
do_int80_syscall_32
entry_INT80_32
after commit:
edfe63ec97ed ("x86/mtrr: Fix Xorg crashes in Qemu sessions")
PAT is now set to the disabled state when MTRRs are disabled,
which reactivates the __pa(high_memory) check in
phys_mem_access_prot_allowed().
When CONFIG_DEBUG_VIRTUAL is set, __pa() calls __phys_addr(),
which in turn calls slow_virt_to_phys() for 'high_memory'.
Because 'high_memory' is set to (the max direct mapped virt
addr + 1), it is not a valid virtual address. Hence,
slow_virt_to_phys() returns 0 and hits the BUG_ON(). Using
__pa_nodebug() instead of __pa() will fix this BUG_ON.
However, this code block, originally written for Pentiums and
earlier, is no longer adequate since a 32-bit Xen guest has
MTRRs disabled and supports ZONE_HIGHMEM. In this setup,
this code sets UC attribute for accessing RAM in high memory
range.
Delete this code block as it has been unused for a long time.
Reported-by: kernel test robot <ying.huang@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1460403360-25441-1-git-send-email-toshi.kani@hpe.com
Link: https://lkml.org/lkml/2016/4/1/608
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/mm/pat.c | 19 -------------------
1 file changed, 19 deletions(-)
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -777,25 +777,6 @@ int phys_mem_access_prot_allowed(struct
if (file->f_flags & O_DSYNC)
pcm = _PAGE_CACHE_MODE_UC_MINUS;
-#ifdef CONFIG_X86_32
- /*
- * On the PPro and successors, the MTRRs are used to set
- * memory types for physical addresses outside main memory,
- * so blindly setting UC or PWT on those pages is wrong.
- * For Pentiums and earlier, the surround logic should disable
- * caching for the high addresses through the KEN pin, but
- * we maintain the tradition of paranoia in this code.
- */
- if (!pat_enabled() &&
- !(boot_cpu_has(X86_FEATURE_MTRR) ||
- boot_cpu_has(X86_FEATURE_K6_MTRR) ||
- boot_cpu_has(X86_FEATURE_CYRIX_ARR) ||
- boot_cpu_has(X86_FEATURE_CENTAUR_MCR)) &&
- (pfn << PAGE_SHIFT) >= __pa(high_memory)) {
- pcm = _PAGE_CACHE_MODE_UC;
- }
-#endif
-
*vma_prot = __pgprot((pgprot_val(*vma_prot) & ~_PAGE_CACHE_MASK) |
cachemode2protval(pcm));
return 1;
* [PATCH 4.4 31/49] drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2)
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (27 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 30/49] x86/mm/pat: Fix BUG_ON() in mmap_mem() on QEMU/i386 Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 32/49] mm: memcontrol: fix cgroup creation failure after many small jobs Greg Kroah-Hartman
` (20 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, simdev11, manfred.kitzbichler,
drm-intel-fixes, Matt Roper, Jani Nikula, Jay
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matt Roper <matthew.d.roper@intel.com>
commit e2e407dc093f530b771ee8bf8fe1be41e3cea8b3 upstream.
Due to our lack of two-step watermark programming, our driver has
historically pretended that the cursor plane is always on for the
purpose of watermark calculations; this helps avoid serious flickering
when the cursor turns off/on (e.g., when the user moves the mouse
pointer to a different screen). That workaround was accidentally
dropped as we started working toward atomic watermark updates. Since we
still aren't quite there yet with two-stage updates, we need to
resurrect the workaround and treat the cursor as always active.
v2: Tweak cursor width calculations slightly to more closely match the
logic we used before the atomic overhaul began. (Ville)
Cc: simdev11@outlook.com
Cc: manfred.kitzbichler@gmail.com
Cc: drm-intel-fixes@lists.freedesktop.org
Reported-by: simdev11@outlook.com
Reported-by: manfred.kitzbichler@gmail.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93892
Fixes: 43d59eda1 ("drm/i915: Eliminate usage of plane_wm_parameters from ILK-style WM code (v2)")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1454479611-6804-1-git-send-email-matthew.d.roper@intel.com
(cherry picked from commit b2435692dbb709d4c8ff3b2f2815c9b8423b72bb)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1454958328-30129-1-git-send-email-matthew.d.roper@intel.com
Tested-by: Jay <mymailclone@t-online.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/gpu/drm/i915/intel_pm.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -1789,16 +1789,20 @@ static uint32_t ilk_compute_cur_wm(const
const struct intel_plane_state *pstate,
uint32_t mem_value)
{
- int bpp = pstate->base.fb ? pstate->base.fb->bits_per_pixel / 8 : 0;
+ /*
+ * We treat the cursor plane as always-on for the purposes of watermark
+ * calculation. Until we have two-stage watermark programming merged,
+ * this is necessary to avoid flickering.
+ */
+ int cpp = 4;
+ int width = pstate->visible ? pstate->base.crtc_w : 64;
- if (!cstate->base.active || !pstate->visible)
+ if (!cstate->base.active)
return 0;
return ilk_wm_method2(ilk_pipe_pixel_rate(cstate),
cstate->base.adjusted_mode.crtc_htotal,
- drm_rect_width(&pstate->dst),
- bpp,
- mem_value);
+ width, cpp, mem_value);
}
/* Only for WM_LP. */
* [PATCH 4.4 32/49] mm: memcontrol: fix cgroup creation failure after many small jobs
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (28 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 31/49] drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2) Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 33/49] mm: memcontrol: fix swap counter leak on swapout from offline cgroup Greg Kroah-Hartman
` (19 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Johannes Weiner, John Garcia,
Vladimir Davydov, Tejun Heo, Nikolay Borisov, Andrew Morton,
Linus Torvalds, Michal Hocko
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Weiner <hannes@cmpxchg.org>
commit 73f576c04b9410ed19660f74f97521bee6e1c546 upstream.
The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears. At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild. Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limitations of outstanding CSSs. Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no user-visible
cgroups in existence.
Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.
Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later. They pose no hurdle.
Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages. And those
references are under the user's control, so they are manageable.
This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that. This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.
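
The recycling idea can be sketched in userspace as follows. This is an
illustration under stated assumptions, not the patch itself: the
demo_* names are hypothetical, a tiny array stands in for the kernel
IDR, DEMO_ID_MAX stands in for the 16-bit limit, and there is no
locking. The point it shows is that the ID slot is freed when the last
ID reference goes away, regardless of other references that merely
keep the object alive.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ID_MAX 4	/* stands in for the 16-bit ID limit */

struct demo_group {
	int id;		/* 0 means "no ID held" */
	int id_ref;	/* references that actually need the ID */
	int obj_ref;	/* references that only pin the object, e.g.
			 * lingering page cache; not consulted when
			 * the ID is recycled */
};

/* Slot 0 is unused so that 0 can mean "no ID". */
static struct demo_group *demo_ids[DEMO_ID_MAX + 1];

/* Hand out the lowest free ID in [1, DEMO_ID_MAX], or -1 if none is left. */
static int demo_id_alloc(struct demo_group *g)
{
	for (int id = 1; id <= DEMO_ID_MAX; id++) {
		if (!demo_ids[id]) {
			demo_ids[id] = g;
			g->id = id;
			g->id_ref = 1;	/* held while the group is online */
			return id;
		}
	}
	return -1;
}

/* Drop one ID reference; recycle the slot when the last one is gone. */
static void demo_id_put(struct demo_group *g)
{
	if (--g->id_ref == 0) {
		demo_ids[g->id] = NULL;
		g->id = 0;
	}
}
```

In this model, a group that goes offline calls demo_id_put() and its
slot becomes available again even if obj_ref is still non-zero, which
is the behaviour the script below relies on after the patch.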
This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:
set -e
mkdir -p pages
for x in `seq 128000`; do
[ $((x % 1000)) -eq 0 ] && echo $x
mkdir /cgroup/foo
echo $$ >/cgroup/foo/cgroup.procs
echo trex >pages/$x
echo $$ >/cgroup/cgroup.procs
rmdir /cgroup/foo
done
When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:
[root@ham ~]# ./cssidstress.sh
[...]
65000
mkdir: cannot create directory '/cgroup/foo': No space left on device
After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.
[hannes@cmpxchg.org: init the IDR]
Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups")
Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: John Garcia <john.garcia@mesosphere.io>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/memcontrol.h | 8 +++
mm/memcontrol.c | 95 ++++++++++++++++++++++++++++++++++++---------
mm/slab_common.c | 2
3 files changed, 85 insertions(+), 20 deletions(-)
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -174,6 +174,11 @@ struct mem_cgroup_thresholds {
struct mem_cgroup_threshold_ary *spare;
};
+struct mem_cgroup_id {
+ int id;
+ atomic_t ref;
+};
+
/*
* The memory controller data structure. The memory controller controls both
* page cache and RSS per cgroup. We would eventually like to provide
@@ -183,6 +188,9 @@ struct mem_cgroup_thresholds {
struct mem_cgroup {
struct cgroup_subsys_state css;
+ /* Private memcg ID. Used to ID objects that outlive the cgroup */
+ struct mem_cgroup_id id;
+
/* Accounted resources */
struct page_counter memory;
struct page_counter memsw;
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -272,21 +272,7 @@ static inline bool mem_cgroup_is_root(st
static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
{
- return memcg->css.id;
-}
-
-/*
- * A helper function to get mem_cgroup from ID. must be called under
- * rcu_read_lock(). The caller is responsible for calling
- * css_tryget_online() if the mem_cgroup is used for charging. (dropping
- * refcnt from swap can be called against removed memcg.)
- */
-static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
-{
- struct cgroup_subsys_state *css;
-
- css = css_from_id(id, &memory_cgrp_subsys);
- return mem_cgroup_from_css(css);
+ return memcg->id.id;
}
/* Writing them here to avoid exposing memcg's inner layout */
@@ -4124,6 +4110,60 @@ static struct cftype mem_cgroup_legacy_f
{ }, /* terminate */
};
+/*
+ * Private memory cgroup IDR
+ *
+ * Swap-out records and page cache shadow entries need to store memcg
+ * references in constrained space, so we maintain an ID space that is
+ * limited to 16 bit (MEM_CGROUP_ID_MAX), limiting the total number of
+ * memory-controlled cgroups to 64k.
+ *
+ * However, there usually are many references to the offline CSS after
+ * the cgroup has been destroyed, such as page cache or reclaimable
+ * slab objects, that don't need to hang on to the ID. We want to keep
+ * those dead CSS from occupying IDs, or we might quickly exhaust the
+ * relatively small ID space and prevent the creation of new cgroups
+ * even when there are much fewer than 64k cgroups - possibly none.
+ *
+ * Maintain a private 16-bit ID space for memcg, and allow the ID to
+ * be freed and recycled when it's no longer needed, which is usually
+ * when the CSS is offlined.
+ *
+ * The only exception to that are records of swapped out tmpfs/shmem
+ * pages that need to be attributed to live ancestors on swapin. But
+ * those references are manageable from userspace.
+ */
+
+static DEFINE_IDR(mem_cgroup_idr);
+
+static void mem_cgroup_id_get(struct mem_cgroup *memcg)
+{
+ atomic_inc(&memcg->id.ref);
+}
+
+static void mem_cgroup_id_put(struct mem_cgroup *memcg)
+{
+ if (atomic_dec_and_test(&memcg->id.ref)) {
+ idr_remove(&mem_cgroup_idr, memcg->id.id);
+ memcg->id.id = 0;
+
+ /* Memcg ID pins CSS */
+ css_put(&memcg->css);
+ }
+}
+
+/**
+ * mem_cgroup_from_id - look up a memcg from a memcg id
+ * @id: the memcg id to look up
+ *
+ * Caller must hold rcu_read_lock().
+ */
+struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
+{
+ WARN_ON_ONCE(!rcu_read_lock_held());
+ return idr_find(&mem_cgroup_idr, id);
+}
+
static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
{
struct mem_cgroup_per_node *pn;
@@ -4171,17 +4211,27 @@ static struct mem_cgroup *mem_cgroup_all
if (!memcg)
return NULL;
+ memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
+ 1, MEM_CGROUP_ID_MAX,
+ GFP_KERNEL);
+ if (memcg->id.id < 0)
+ goto out_free;
+
memcg->stat = alloc_percpu(struct mem_cgroup_stat_cpu);
if (!memcg->stat)
- goto out_free;
+ goto out_idr;
if (memcg_wb_domain_init(memcg, GFP_KERNEL))
goto out_free_stat;
+ idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
return memcg;
out_free_stat:
free_percpu(memcg->stat);
+out_idr:
+ if (memcg->id.id > 0)
+ idr_remove(&mem_cgroup_idr, memcg->id.id);
out_free:
kfree(memcg);
return NULL;
@@ -4277,8 +4327,9 @@ mem_cgroup_css_online(struct cgroup_subs
struct mem_cgroup *parent = mem_cgroup_from_css(css->parent);
int ret;
- if (css->id > MEM_CGROUP_ID_MAX)
- return -ENOSPC;
+ /* Online state pins memcg ID, memcg ID pins CSS */
+ mem_cgroup_id_get(mem_cgroup_from_css(css));
+ css_get(css);
if (!parent)
return 0;
@@ -4352,6 +4403,8 @@ static void mem_cgroup_css_offline(struc
memcg_deactivate_kmem(memcg);
wb_memcg_offline(memcg);
+
+ mem_cgroup_id_put(memcg);
}
static void mem_cgroup_css_released(struct cgroup_subsys_state *css)
@@ -5685,6 +5738,7 @@ void mem_cgroup_swapout(struct page *pag
if (!memcg)
return;
+ mem_cgroup_id_get(memcg);
oldid = swap_cgroup_record(entry, mem_cgroup_id(memcg));
VM_BUG_ON_PAGE(oldid, page);
mem_cgroup_swap_statistics(memcg, true);
@@ -5703,6 +5757,9 @@ void mem_cgroup_swapout(struct page *pag
VM_BUG_ON(!irqs_disabled());
mem_cgroup_charge_statistics(memcg, page, -1);
memcg_check_events(memcg, page);
+
+ if (!mem_cgroup_is_root(memcg))
+ css_put(&memcg->css);
}
/**
@@ -5726,7 +5783,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
if (!mem_cgroup_is_root(memcg))
page_counter_uncharge(&memcg->memsw, 1);
mem_cgroup_swap_statistics(memcg, false);
- css_put(&memcg->css);
+ mem_cgroup_id_put(memcg);
}
rcu_read_unlock();
}
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -522,7 +522,7 @@ void memcg_create_kmem_cache(struct mem_
cgroup_name(css->cgroup, memcg_name_buf, sizeof(memcg_name_buf));
cache_name = kasprintf(GFP_KERNEL, "%s(%d:%s)", root_cache->name,
- css->id, memcg_name_buf);
+ css->serial_nr, memcg_name_buf);
if (!cache_name)
goto out_unlock;
* [PATCH 4.4 33/49] mm: memcontrol: fix swap counter leak on swapout from offline cgroup
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (29 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 32/49] mm: memcontrol: fix cgroup creation failure after many small jobs Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 34/49] mm: memcontrol: fix memcg id ref counter on swap charge move Greg Kroah-Hartman
` (18 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Vladimir Davydov, Johannes Weiner,
Michal Hocko, Andrew Morton, Linus Torvalds
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Davydov <vdavydov@virtuozzo.com>
commit 1f47b61fb4077936465dcde872a4e5cc4fe708da upstream.
An offline memory cgroup might have anonymous memory or shmem left
charged to it and no swap. Since only swap entries pin the id of an
offline cgroup, such a cgroup will have no id and so an attempt to
swapout its anon/shmem will not store memory cgroup info in the swap
cgroup map. As a result, memcg->swap or memcg->memsw will never get
uncharged from it and any of its ancestors.
Fix this by always charging swapout to the first ancestor cgroup that
hasn't released its id yet.
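As a rough userspace illustration of that walk (not kernel code and not part of the upstream patch), the sketch below climbs the parent chain until it finds a group whose id reference count is still non-zero. The names struct group, ref_get_not_zero() and group_id_get_online() are hypothetical, and C11 atomics stand in for atomic_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; only what the sketch needs. */
struct group {
	atomic_int id_ref;     /* 0 means the id has already been released */
	struct group *parent;  /* NULL for the root */
};

/* Userspace analogue of atomic_inc_not_zero(): take a ref unless the count is 0. */
bool ref_get_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Walk towards the root until a group that still holds its id is found. */
struct group *group_id_get_online(struct group *g)
{
	while (!ref_get_not_zero(&g->id_ref)) {
		/* Assumption: the root always holds at least one reference. */
		assert(g->parent != NULL);
		g = g->parent;
	}
	return g;
}
```

The returned group is the one whose id would be recorded for the swap entry, and it is returned with one extra reference held.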
[hannes@cmpxchg.org: add comment to mem_cgroup_swapout]
[vdavydov@virtuozzo.com: use WARN_ON_ONCE() in mem_cgroup_id_get_online()]
Link: http://lkml.kernel.org/r/20160803123445.GJ13263@esperanza
Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Link: http://lkml.kernel.org/r/5336daa5c9a32e776067773d9da655d2dc126491.1470219853.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org> [3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/memcontrol.c | 37 +++++++++++++++++++++++++++++++++----
1 file changed, 33 insertions(+), 4 deletions(-)
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4141,6 +4141,24 @@ static void mem_cgroup_id_get(struct mem
atomic_inc(&memcg->id.ref);
}
+static struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg)
+{
+ while (!atomic_inc_not_zero(&memcg->id.ref)) {
+ /*
+ * The root cgroup cannot be destroyed, so it's refcount must
+ * always be >= 1.
+ */
+ if (WARN_ON_ONCE(memcg == root_mem_cgroup)) {
+ VM_BUG_ON(1);
+ break;
+ }
+ memcg = parent_mem_cgroup(memcg);
+ if (!memcg)
+ memcg = root_mem_cgroup;
+ }
+ return memcg;
+}
+
static void mem_cgroup_id_put(struct mem_cgroup *memcg)
{
if (atomic_dec_and_test(&memcg->id.ref)) {
@@ -5723,7 +5741,7 @@ subsys_initcall(mem_cgroup_init);
*/
void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
{
- struct mem_cgroup *memcg;
+ struct mem_cgroup *memcg, *swap_memcg;
unsigned short oldid;
VM_BUG_ON_PAGE(PageLRU(page), page);
@@ -5738,16 +5756,27 @@ void mem_cgroup_swapout(struct page *pag
if (!memcg)
return;
- mem_cgroup_id_get(memcg);
- oldid = swap_cgroup_record(entry, mem_cgroup_id(memcg));
+ /*
+ * In case the memcg owning these pages has been offlined and doesn't
+ * have an ID allocated to it anymore, charge the closest online
+ * ancestor for the swap instead and transfer the memory+swap charge.
+ */
+ swap_memcg = mem_cgroup_id_get_online(memcg);
+ oldid = swap_cgroup_record(entry, mem_cgroup_id(swap_memcg));
VM_BUG_ON_PAGE(oldid, page);
- mem_cgroup_swap_statistics(memcg, true);
+ mem_cgroup_swap_statistics(swap_memcg, true);
page->mem_cgroup = NULL;
if (!mem_cgroup_is_root(memcg))
page_counter_uncharge(&memcg->memory, 1);
+ if (memcg != swap_memcg) {
+ if (!mem_cgroup_is_root(swap_memcg))
+ page_counter_charge(&swap_memcg->memsw, 1);
+ page_counter_uncharge(&memcg->memsw, 1);
+ }
+
/*
* Interrupts should be disabled here because the caller holds the
* mapping->tree_lock lock which is taken with interrupts-off. It is
* [PATCH 4.4 34/49] mm: memcontrol: fix memcg id ref counter on swap charge move
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (30 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 33/49] mm: memcontrol: fix swap counter leak on swapout from offline cgroup Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 35/49] x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace Greg Kroah-Hartman
` (17 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Vladimir Davydov, Michal Hocko,
Johannes Weiner, Andrew Morton, Linus Torvalds
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Davydov <vdavydov@virtuozzo.com>
commit 615d66c37c755c49ce022c9e5ac0875d27d2603d upstream.
Since commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure
after many small jobs") swap entries do not pin memcg->css.refcnt
directly. Instead, they pin memcg->id.ref. So we should adjust the
reference counters accordingly when moving swap charges between cgroups.
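A minimal userspace sketch of the bookkeeping this implies (hypothetical names, C11 atomics instead of the kernel atomic_t API): moving n swap entries drops n id references on the source group and takes n on the destination.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the id part of struct mem_cgroup. */
struct cg {
	atomic_uint id_ref;  /* references held on the group's id */
	bool id_released;    /* set once the last id reference is dropped */
};

void cg_id_get_many(struct cg *g, unsigned int n)
{
	atomic_fetch_add(&g->id_ref, n);
}

void cg_id_put_many(struct cg *g, unsigned int n)
{
	/* fetch_sub returns the old value; old == n means the count hit zero. */
	if (atomic_fetch_sub(&g->id_ref, n) == n)
		g->id_released = true;
}

/* Hand n swap-entry references over from one group to another. */
void move_swap_refs(struct cg *from, struct cg *to, unsigned int n)
{
	cg_id_put_many(from, n);
	cg_id_get_many(to, n);
}
```

In the sketch the id is only marked as released; the kernel helper additionally removes it from the IDR, as the diff below shows.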
Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Link: http://lkml.kernel.org/r/9ce297c64954a42dc90b543bc76106c4a94f07e8.1470219853.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/memcontrol.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4136,9 +4136,9 @@ static struct cftype mem_cgroup_legacy_f
static DEFINE_IDR(mem_cgroup_idr);
-static void mem_cgroup_id_get(struct mem_cgroup *memcg)
+static void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n)
{
- atomic_inc(&memcg->id.ref);
+ atomic_add(n, &memcg->id.ref);
}
static struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg)
@@ -4159,9 +4159,9 @@ static struct mem_cgroup *mem_cgroup_id_
return memcg;
}
-static void mem_cgroup_id_put(struct mem_cgroup *memcg)
+static void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n)
{
- if (atomic_dec_and_test(&memcg->id.ref)) {
+ if (atomic_sub_and_test(n, &memcg->id.ref)) {
idr_remove(&mem_cgroup_idr, memcg->id.id);
memcg->id.id = 0;
@@ -4170,6 +4170,16 @@ static void mem_cgroup_id_put(struct mem
}
}
+static inline void mem_cgroup_id_get(struct mem_cgroup *memcg)
+{
+ mem_cgroup_id_get_many(memcg, 1);
+}
+
+static inline void mem_cgroup_id_put(struct mem_cgroup *memcg)
+{
+ mem_cgroup_id_put_many(memcg, 1);
+}
+
/**
* mem_cgroup_from_id - look up a memcg from a memcg id
* @id: the memcg id to look up
@@ -4856,6 +4866,8 @@ static void __mem_cgroup_clear_mc(void)
if (!mem_cgroup_is_root(mc.from))
page_counter_uncharge(&mc.from->memsw, mc.moved_swap);
+ mem_cgroup_id_put_many(mc.from, mc.moved_swap);
+
/*
* we charged both to->memory and to->memsw, so we
* should uncharge to->memory.
@@ -4863,9 +4875,9 @@ static void __mem_cgroup_clear_mc(void)
if (!mem_cgroup_is_root(mc.to))
page_counter_uncharge(&mc.to->memory, mc.moved_swap);
- css_put_many(&mc.from->css, mc.moved_swap);
+ mem_cgroup_id_get_many(mc.to, mc.moved_swap);
+ css_put_many(&mc.to->css, mc.moved_swap);
- /* we've already done css_get(mc.to) */
mc.moved_swap = 0;
}
memcg_oom_recover(from);
* [PATCH 4.4 35/49] x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (31 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 34/49] mm: memcontrol: fix memcg id ref counter on swap charge move Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 36/49] block: fix use-after-free in seq file Greg Kroah-Hartman
` (16 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Stephan Mueller, David Howells,
Andy Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
H. Peter Anvin, Josh Poimboeuf, Linus Torvalds, Peter Zijlstra,
Thomas Gleixner, keyrings, linux-security-module, Ingo Molnar
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells <dhowells@redhat.com>
commit f7d665627e103e82d34306c7d3f6f46f387c0d8b upstream.
x86_64 needs to use compat_sys_keyctl for 32-bit userspace rather than
calling sys_keyctl(). The latter will work in a lot of cases, thereby
hiding the issue.
Reported-by: Stephan Mueller <smueller@chronox.de>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keyrings@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Link: http://lkml.kernel.org/r/146961615805.14395.5581949237156769439.stgit@warthog.procyon.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/entry/syscalls/syscall_32.tbl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -294,7 +294,7 @@
# 285 sys_setaltroot
286 i386 add_key sys_add_key
287 i386 request_key sys_request_key
-288 i386 keyctl sys_keyctl
+288 i386 keyctl sys_keyctl compat_sys_keyctl
289 i386 ioprio_set sys_ioprio_set
290 i386 ioprio_get sys_ioprio_get
291 i386 inotify_init sys_inotify_init
* [PATCH 4.4 36/49] block: fix use-after-free in seq file
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (32 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 35/49] x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 37/49] sysv, ipc: fix security-layer leaking Greg Kroah-Hartman
` (15 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Vegard Nossum, Tejun Heo, Jens Axboe
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vegard Nossum <vegard.nossum@oracle.com>
commit 77da160530dd1dc94f6ae15a981f24e5f0021e84 upstream.
I got a KASAN report of use-after-free:
==================================================================
BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
Read of size 8 by task trinity-c1/315
=============================================================================
BUG kmalloc-32 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
___slab_alloc+0x4f1/0x520
__slab_alloc.isra.58+0x56/0x80
kmem_cache_alloc_trace+0x260/0x2a0
disk_seqf_start+0x66/0x110
traverse+0x176/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
__slab_free+0x17a/0x2c0
kfree+0x20a/0x220
disk_seqf_stop+0x42/0x50
traverse+0x3b5/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G B 4.7.0+ #62
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
Call Trace:
[<ffffffff81d6ce81>] dump_stack+0x65/0x84
[<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0
[<ffffffff814704ff>] object_err+0x2f/0x40
[<ffffffff814754d1>] kasan_report_error+0x221/0x520
[<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40
[<ffffffff83888161>] klist_iter_exit+0x61/0x70
[<ffffffff82404389>] class_dev_iter_exit+0x9/0x10
[<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50
[<ffffffff8151f812>] seq_read+0x4b2/0x11a0
[<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180
[<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210
[<ffffffff814b4c45>] do_readv_writev+0x565/0x660
[<ffffffff814b8a17>] vfs_readv+0x67/0xa0
[<ffffffff814b8de6>] do_preadv+0x126/0x170
[<ffffffff814b92ec>] SyS_preadv+0xc/0x10
This problem can occur in the following situation:
open()
- pread()
- .seq_start()
- iter = kmalloc() // succeeds
- seqf->private = iter
- .seq_stop()
- kfree(seqf->private)
- pread()
- .seq_start()
- iter = kmalloc() // fails
- .seq_stop()
- class_dev_iter_exit(seqf->private) // boom! old pointer
As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.
An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.
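The sequence above can be modelled in userspace roughly as follows. This is only a sketch: struct seq_state, seq_start(), seq_stop() and the iter_frees counter are hypothetical stand-ins, and malloc()/free() replace the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct seq_file. */
struct seq_state {
	void *private;  /* iterator owned between start and stop */
};

/* Counts how many times stop actually released an iterator. */
int iter_frees;

/* start: allocate an iterator; on failure ->private is left untouched. */
int seq_start(struct seq_state *s, int alloc_fails)
{
	void *iter = alloc_fails ? NULL : malloc(32);

	if (!iter)
		return -1;
	s->private = iter;
	return 0;
}

/* stop: runs even when start failed, so it must not see a stale pointer. */
void seq_stop(struct seq_state *s)
{
	if (s->private) {
		free(s->private);
		iter_frees++;
		s->private = NULL;  /* the one-line fix: forget the freed iterator */
	}
}
```

Without the reset in seq_stop(), a second stop after a failed start would free the old iterator again.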
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
block/genhd.c | 1 +
1 file changed, 1 insertion(+)
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -831,6 +831,7 @@ static void disk_seqf_stop(struct seq_fi
if (iter) {
class_dev_iter_exit(iter);
kfree(iter);
+ seqf->private = NULL;
}
}
* [PATCH 4.4 37/49] sysv, ipc: fix security-layer leaking
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (33 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 36/49] block: fix use-after-free in seq file Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 38/49] fuse: fsync() did not return IO errors Greg Kroah-Hartman
` (14 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Fabian Frederick, Davidlohr Bueso,
Manfred Spraul, Andrew Morton, Linus Torvalds
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fabian Frederick <fabf@skynet.be>
commit 9b24fef9f0410fb5364245d6cc2bd044cc064007 upstream.
Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
to receive an RCU freeing function but passed the generic ipc_rcu_free()
instead of msg_rcu_free(), which does the security cleanup.
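A simplified userspace sketch of why the choice of release callback matters (hypothetical names; the real code defers the free through RCU, which is omitted here): only the type-specific callback releases the security blob, so passing the generic one leaks it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted IPC object with a security blob. */
struct ipc_obj {
	int refcount;
	void *security;  /* allocated by the security layer */
};

/* Counts security blobs that were actually released. */
int security_frees;

/* Generic release: frees the object only, so ->security would leak. */
void generic_free(struct ipc_obj *o)
{
	free(o);
}

/* Type-specific release: drops the security blob first, then the object. */
void msg_free(struct ipc_obj *o)
{
	free(o->security);
	security_frees++;
	free(o);
}

/* Drop one reference; the caller picks the release function. */
void obj_putref(struct ipc_obj *o, void (*release)(struct ipc_obj *))
{
	if (--o->refcount == 0)
		release(o);
}
```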
Running LTP msgsnd06 with kmemleak gives the following:
cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88003c0a11f8 (size 8):
comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
hex dump (first 8 bytes):
1b 00 00 00 01 00 00 00 ........
backtrace:
kmemleak_alloc+0x23/0x40
kmem_cache_alloc_trace+0xe1/0x180
selinux_msg_queue_alloc_security+0x3f/0xd0
security_msg_queue_alloc+0x2e/0x40
newque+0x4e/0x150
ipcget+0x159/0x1b0
SyS_msgget+0x39/0x40
entry_SYSCALL_64_fastpath+0x13/0x8f
Manfred Spraul suggested fixing sem.c as well, and Davidlohr Bueso suggested
using ipc_rcu_free only when the security allocation fails in newary().
Fixes: 53dad6d3a8e ("ipc: fix race with LSMs")
Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
ipc/msg.c | 2 +-
ipc/sem.c | 12 ++++++------
2 files changed, 7 insertions(+), 7 deletions(-)
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -680,7 +680,7 @@ long do_msgsnd(int msqid, long mtype, vo
rcu_read_lock();
ipc_lock_object(&msq->q_perm);
- ipc_rcu_putref(msq, ipc_rcu_free);
+ ipc_rcu_putref(msq, msg_rcu_free);
/* raced with RMID? */
if (!ipc_valid_object(&msq->q_perm)) {
err = -EIDRM;
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -442,7 +442,7 @@ static inline struct sem_array *sem_obta
static inline void sem_lock_and_putref(struct sem_array *sma)
{
sem_lock(sma, NULL, -1);
- ipc_rcu_putref(sma, ipc_rcu_free);
+ ipc_rcu_putref(sma, sem_rcu_free);
}
static inline void sem_rmid(struct ipc_namespace *ns, struct sem_array *s)
@@ -1385,7 +1385,7 @@ static int semctl_main(struct ipc_namesp
rcu_read_unlock();
sem_io = ipc_alloc(sizeof(ushort)*nsems);
if (sem_io == NULL) {
- ipc_rcu_putref(sma, ipc_rcu_free);
+ ipc_rcu_putref(sma, sem_rcu_free);
return -ENOMEM;
}
@@ -1419,20 +1419,20 @@ static int semctl_main(struct ipc_namesp
if (nsems > SEMMSL_FAST) {
sem_io = ipc_alloc(sizeof(ushort)*nsems);
if (sem_io == NULL) {
- ipc_rcu_putref(sma, ipc_rcu_free);
+ ipc_rcu_putref(sma, sem_rcu_free);
return -ENOMEM;
}
}
if (copy_from_user(sem_io, p, nsems*sizeof(ushort))) {
- ipc_rcu_putref(sma, ipc_rcu_free);
+ ipc_rcu_putref(sma, sem_rcu_free);
err = -EFAULT;
goto out_free;
}
for (i = 0; i < nsems; i++) {
if (sem_io[i] > SEMVMX) {
- ipc_rcu_putref(sma, ipc_rcu_free);
+ ipc_rcu_putref(sma, sem_rcu_free);
err = -ERANGE;
goto out_free;
}
@@ -1722,7 +1722,7 @@ static struct sem_undo *find_alloc_undo(
/* step 2: allocate new undo structure */
new = kzalloc(sizeof(struct sem_undo) + sizeof(short)*nsems, GFP_KERNEL);
if (!new) {
- ipc_rcu_putref(sma, ipc_rcu_free);
+ ipc_rcu_putref(sma, sem_rcu_free);
return ERR_PTR(-ENOMEM);
}
* [PATCH 4.4 38/49] fuse: fsync() did not return IO errors
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (34 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 37/49] sysv, ipc: fix security-layer leaking Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 39/49] fuse: fuse_flush must check mapping->flags for errors Greg Kroah-Hartman
` (13 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Alexey Kuznetsov, Maxim Patlasov,
Miklos Szeredi
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Kuznetsov <kuznet@parallels.com>
commit ac7f052b9e1534c8248f814b6f0068ad8d4a06d2 upstream.
Due to the implementation of fuse writeback, filemap_write_and_wait_range()
does not catch errors. We have to check for them directly after fuse_sync_writes().
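The check amounts to testing and clearing the recorded error bits and turning them into an errno. A userspace sketch of that pattern, with hypothetical flag names standing in for AS_ENOSPC and AS_EIO and C11 atomics standing in for test_and_clear_bit():

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical flag bits standing in for AS_EIO and AS_ENOSPC. */
#define MAP_EIO    (1u << 0)
#define MAP_ENOSPC (1u << 1)

/* Atomically clear one flag and report whether it was set. */
int test_and_clear(atomic_uint *flags, unsigned int bit)
{
	return (atomic_fetch_and(flags, ~bit) & bit) != 0;
}

/* Collect and clear recorded writeback errors; -EIO wins if both are set. */
int collect_writeback_error(atomic_uint *flags)
{
	int err = 0;

	if (test_and_clear(flags, MAP_ENOSPC))
		err = -ENOSPC;
	if (test_and_clear(flags, MAP_EIO))
		err = -EIO;
	return err;
}
```

Because the bits are cleared when read, each recorded error is reported to one caller only.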
Signed-off-by: Alexey Kuznetsov <kuznet@virtuozzo.com>
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/fuse/file.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -462,6 +462,21 @@ int fuse_fsync_common(struct file *file,
goto out;
fuse_sync_writes(inode);
+
+ /*
+ * Due to implementation of fuse writeback
+ * filemap_write_and_wait_range() does not catch errors.
+ * We have to do this directly after fuse_sync_writes()
+ */
+ if (test_bit(AS_ENOSPC, &file->f_mapping->flags) &&
+ test_and_clear_bit(AS_ENOSPC, &file->f_mapping->flags))
+ err = -ENOSPC;
+ if (test_bit(AS_EIO, &file->f_mapping->flags) &&
+ test_and_clear_bit(AS_EIO, &file->f_mapping->flags))
+ err = -EIO;
+ if (err)
+ goto out;
+
err = sync_inode_metadata(inode, 1);
if (err)
goto out;
* [PATCH 4.4 39/49] fuse: fuse_flush must check mapping->flags for errors
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (35 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 38/49] fuse: fsync() did not return IO errors Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 40/49] fuse: fix wrong assignment of ->flags in fuse_send_init() Greg Kroah-Hartman
` (12 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Maxim Patlasov, Miklos Szeredi
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maxim Patlasov <mpatlasov@virtuozzo.com>
commit 9ebce595f63a407c5cec98f98f9da8459b73740a upstream.
fuse_flush() calls write_inode_now() that triggers writeback, but actual
writeback will happen later, on fuse_sync_writes(). If an error happens,
fuse_writepage_end() will set error bit in mapping->flags. So, we have to
check mapping->flags after fuse_sync_writes().
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/fuse/file.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -417,6 +417,15 @@ static int fuse_flush(struct file *file,
fuse_sync_writes(inode);
mutex_unlock(&inode->i_mutex);
+ if (test_bit(AS_ENOSPC, &file->f_mapping->flags) &&
+ test_and_clear_bit(AS_ENOSPC, &file->f_mapping->flags))
+ err = -ENOSPC;
+ if (test_bit(AS_EIO, &file->f_mapping->flags) &&
+ test_and_clear_bit(AS_EIO, &file->f_mapping->flags))
+ err = -EIO;
+ if (err)
+ return err;
+
req = fuse_get_req_nofail_nopages(fc, file);
memset(&inarg, 0, sizeof(inarg));
inarg.fh = ff->fh;
* [PATCH 4.4 40/49] fuse: fix wrong assignment of ->flags in fuse_send_init()
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (36 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 39/49] fuse: fuse_flush must check mapping->flags for errors Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 41/49] fs/dcache.c: avoid soft-lockup in dput() Greg Kroah-Hartman
` (11 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Wei Fang, Miklos Szeredi
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wei Fang <fangwei1@huawei.com>
commit 9446385f05c9af25fed53dbed3cc75763730be52 upstream.
FUSE_HAS_IOCTL_DIR should be assigned to ->flags; FUSE_IOCTL_DIR appears to be a typo.
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 69fe05c90ed5 ("fuse: add missing INIT flags")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/fuse/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -926,7 +926,7 @@ static void fuse_send_init(struct fuse_c
arg->flags |= FUSE_ASYNC_READ | FUSE_POSIX_LOCKS | FUSE_ATOMIC_O_TRUNC |
FUSE_EXPORT_SUPPORT | FUSE_BIG_WRITES | FUSE_DONT_MASK |
FUSE_SPLICE_WRITE | FUSE_SPLICE_MOVE | FUSE_SPLICE_READ |
- FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA |
+ FUSE_FLOCK_LOCKS | FUSE_HAS_IOCTL_DIR | FUSE_AUTO_INVAL_DATA |
FUSE_DO_READDIRPLUS | FUSE_READDIRPLUS_AUTO | FUSE_ASYNC_DIO |
FUSE_WRITEBACK_CACHE | FUSE_NO_OPEN_SUPPORT;
req->in.h.opcode = FUSE_INIT;
* [PATCH 4.4 41/49] fs/dcache.c: avoid soft-lockup in dput()
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (37 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 40/49] fuse: fix wrong assignment of ->flags in fuse_send_init() Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 42/49] crypto: gcm - Filter out async ghash if necessary Greg Kroah-Hartman
` (10 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Wei Fang, Al Viro
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wei Fang <fangwei1@huawei.com>
commit 47be61845c775643f1aa4d2a54343549f943c94c upstream.
We triggered a soft lockup under a stress test that opens, accesses,
writes and closes one file concurrently on more than
five different CPUs:
WARN: soft lockup - CPU#0 stuck for 11s! [who:30631]
...
[<ffffffc0003986f8>] dput+0x100/0x298
[<ffffffc00038c2dc>] terminate_walk+0x4c/0x60
[<ffffffc00038f56c>] path_lookupat+0x5cc/0x7a8
[<ffffffc00038f780>] filename_lookup+0x38/0xf0
[<ffffffc000391180>] user_path_at_empty+0x78/0xd0
[<ffffffc0003911f4>] user_path_at+0x1c/0x28
[<ffffffc00037d4fc>] SyS_faccessat+0xb4/0x230
The ->d_lock trylock may fail many times because of concurrent
operations, so dput() may run for a long time.
Fix this by replacing cpu_relax() with cond_resched().
dput() used to be sleepable, so making it sleepable again
should be safe.
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/dcache.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -578,7 +578,6 @@ static struct dentry *dentry_kill(struct
failed:
spin_unlock(&dentry->d_lock);
- cpu_relax();
return dentry; /* try again with same dentry */
}
@@ -752,6 +751,8 @@ void dput(struct dentry *dentry)
return;
repeat:
+ might_sleep();
+
rcu_read_lock();
if (likely(fast_dput(dentry))) {
rcu_read_unlock();
@@ -783,8 +784,10 @@ repeat:
kill_it:
dentry = dentry_kill(dentry);
- if (dentry)
+ if (dentry) {
+ cond_resched();
goto repeat;
+ }
}
EXPORT_SYMBOL(dput);
* [PATCH 4.4 42/49] crypto: gcm - Filter out async ghash if necessary
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (38 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 41/49] fs/dcache.c: avoid soft-lockup in dput() Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 43/49] crypto: scatterwalk - Fix test in scatterwalk_done Greg Kroah-Hartman
` (9 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Herbert Xu
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herbert Xu <herbert@gondor.apana.org.au>
commit b30bdfa86431afbafe15284a3ad5ac19b49b88e3 upstream.
As it stands, if you ask for a sync gcm you may actually end up with
an async one, because gcm does not filter out async implementations
of ghash.
This patch fixes this by adding the necessary filter when looking
for ghash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
crypto/gcm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -639,7 +639,9 @@ static int crypto_gcm_create_common(stru
ghash_alg = crypto_find_alg(ghash_name, &crypto_ahash_type,
CRYPTO_ALG_TYPE_HASH,
- CRYPTO_ALG_TYPE_AHASH_MASK);
+ CRYPTO_ALG_TYPE_AHASH_MASK |
+ crypto_requires_sync(algt->type,
+ algt->mask));
if (IS_ERR(ghash_alg))
return PTR_ERR(ghash_alg);
* [PATCH 4.4 43/49] crypto: scatterwalk - Fix test in scatterwalk_done
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (39 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 42/49] crypto: gcm - Filter out async ghash if necessary Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 44/49] ext4: check for extents that wrap around Greg Kroah-Hartman
` (8 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Herbert Xu
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herbert Xu <herbert@gondor.apana.org.au>
commit 5f070e81bee35f1b7bd1477bb223a873ff657803 upstream.
When there is more data to be processed, the current test in
scatterwalk_done may prevent us from calling pagedone even when
we should.
In particular, if we're on an SG entry spanning multiple pages
where the last page is not a full page, we will incorrectly skip
calling pagedone on the second last page.
This patch fixes this by adding a separate test for whether we've
reached the end of a page.
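The new condition can be read as a standalone predicate. The sketch below restates it in userspace with a hypothetical struct walk_pos that flattens the fields the test reads, and a fixed 4096-byte page size as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed page size for the sketch only. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical flattened view of the fields the new test reads. */
struct walk_pos {
	unsigned int offset;     /* current position (walk->offset) */
	unsigned int sg_offset;  /* start of the SG entry (sg->offset) */
	unsigned int sg_length;  /* length of the SG entry (sg->length) */
};

/* True when the page is finished: no more data, end of entry, or page boundary. */
bool needs_pagedone(const struct walk_pos *w, bool more)
{
	return !more ||
	       w->offset >= w->sg_offset + w->sg_length ||
	       !(w->offset & (SKETCH_PAGE_SIZE - 1));
}
```

For an entry of 4096 + 100 bytes starting at offset 0, a walk positioned at 4096 sits exactly on a page boundary, so the predicate is true there even though more data follows.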
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
crypto/scatterwalk.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/crypto/scatterwalk.c
+++ b/crypto/scatterwalk.c
@@ -72,7 +72,8 @@ static void scatterwalk_pagedone(struct
void scatterwalk_done(struct scatter_walk *walk, int out, int more)
{
- if (!(scatterwalk_pagelen(walk) & (PAGE_SIZE - 1)) || !more)
+ if (!more || walk->offset >= walk->sg->offset + walk->sg->length ||
+ !(walk->offset & (PAGE_SIZE - 1)))
scatterwalk_pagedone(walk, out, more);
}
EXPORT_SYMBOL_GPL(scatterwalk_done);
* [PATCH 4.4 44/49] ext4: check for extents that wrap around
2016-08-14 20:23 ` [PATCH 4.4 00/49] 4.4.18-stable review Greg Kroah-Hartman
` (40 preceding siblings ...)
2016-08-14 20:23 ` [PATCH 4.4 43/49] crypto: scatterwalk - Fix test in scatterwalk_done Greg Kroah-Hartman
@ 2016-08-14 20:23 ` Greg Kroah-Hartman
2016-08-14 20:23 ` [PATCH 4.4 45/49] ext4: fix deadlock during page writeback Greg Kroah-Hartman
` (7 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Eryu Guan, Phil Turnbull,
Vegard Nossum, Theodore Tso
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vegard Nossum <vegard.nossum@oracle.com>
commit f70749ca42943faa4d4dcce46dfdcaadb1d0c4b6 upstream.
An extent with lblock = 4294967295 and len = 1 will pass the
ext4_valid_extent() test:
ext4_lblk_t last = lblock + len - 1;
if (len == 0 || lblock > last)
return 0;
since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().
We can simplify it by removing the - 1 altogether and changing the test
to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
lblock and it fails, and if len > 0 then lblock + len > lblock in order
to pass (i.e. it doesn't overflow).
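The arithmetic can be checked in userspace with 32-bit unsigned values. In this sketch extent_ok_old() and extent_ok_new() are hypothetical names, and both operands are uint32_t for simplicity (in the kernel len is an int).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old test: computes the last block first, which hides a wrap of lblock + len. */
bool extent_ok_old(uint32_t lblock, uint32_t len)
{
	uint32_t last = lblock + len - 1;

	return !(len == 0 || lblock > last);
}

/* New test: rejects both len == 0 and any wrap-around in one comparison. */
bool extent_ok_new(uint32_t lblock, uint32_t len)
{
	return !(lblock + len <= lblock);
}
```

With lblock = 4294967295 and len = 1 the old test accepts the extent, while the new one rejects it because lblock + len wraps to 0.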
Fixes: 5946d0893 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Fixes: 2f974865f ("ext4: check for zero length extent explicitly")
Cc: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/ext4/extents.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -376,9 +376,13 @@ static int ext4_valid_extent(struct inod
ext4_fsblk_t block = ext4_ext_pblock(ext);
int len = ext4_ext_get_actual_len(ext);
ext4_lblk_t lblock = le32_to_cpu(ext->ee_block);
- ext4_lblk_t last = lblock + len - 1;
- if (len == 0 || lblock > last)
+ /*
+ * We allow neither:
+ * - zero length
+ * - overflow/wrap-around
+ */
+ if (lblock + len <= lblock)
return 0;
return ext4_data_block_valid(EXT4_SB(inode->i_sb), block, len);
}
* [PATCH 4.4 45/49] ext4: fix deadlock during page writeback
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Theodore Tso, Jan Kara
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kara <jack@suse.cz>
commit 646caa9c8e196880b41cd3e3d33a2ebc752bdb85 upstream.
Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
deadlock in ext4_writepages() which was previously much harder to hit.
After this commit xfstest generic/130 reproduces the deadlock on small
filesystems.
The problem happens when ext4_do_update_inode() sets LARGE_FILE feature
and marks current inode handle as synchronous. That subsequently results
in ext4_journal_stop() called from ext4_writepages() to block waiting for
transaction commit while still holding page locks, reference to io_end,
and some prepared bio in mpd structure each of which can possibly block
transaction commit from completing and thus results in deadlock.
Fix the problem by releasing page locks, io_end reference, and
submitting prepared bio before calling ext4_journal_stop().
[ Changed to defer the call to ext4_journal_stop() only if the handle
is synchronous. --tytso ]
Reported-and-tested-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/ext4/inode.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2589,13 +2589,36 @@ retry:
done = true;
}
}
- ext4_journal_stop(handle);
+ /*
+ * Caution: If the handle is synchronous,
+ * ext4_journal_stop() can wait for transaction commit
+ * to finish which may depend on writeback of pages to
+ * complete or on page lock to be released. In that
+ * case, we have to wait until after we have
+ * submitted all the IO, released page locks we hold,
+ * and dropped io_end reference (for extent conversion
+ * to be able to complete) before stopping the handle.
+ */
+ if (!ext4_handle_valid(handle) || handle->h_sync == 0) {
+ ext4_journal_stop(handle);
+ handle = NULL;
+ }
/* Submit prepared bio */
ext4_io_submit(&mpd.io_submit);
/* Unlock pages we didn't use */
mpage_release_unused_pages(&mpd, give_up_on_write);
- /* Drop our io_end reference we got from init */
- ext4_put_io_end(mpd.io_submit.io_end);
+ /*
+ * Drop our io_end reference we got from init. We have
+ * to be careful and use deferred io_end finishing if
+ * we are still holding the transaction as we can
+ * release the last reference to io_end which may end
+ * up doing unwritten extent conversion.
+ */
+ if (handle) {
+ ext4_put_io_end_defer(mpd.io_submit.io_end);
+ ext4_journal_stop(handle);
+ } else
+ ext4_put_io_end(mpd.io_submit.io_end);
if (ret == -ENOSPC && sbi->s_journal) {
/*
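The ordering that the hunk above introduces can be modelled in userspace.
This is only a sketch under stated assumptions: struct toy_handle and the
helper functions are hypothetical stand-ins that record a trace character
instead of doing real work, and the !ext4_handle_valid() case from the
patch is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the journal handle; only h_sync is modelled. */
struct toy_handle {
	int h_sync;
};

static char trace[16];

/* Append one character to the trace, leaving room for the terminator. */
static void note(char c)
{
	size_t n = strlen(trace);

	if (n + 1 < sizeof(trace)) {
		trace[n] = c;
		trace[n + 1] = '\0';
	}
}

/* Each stub stands for one step of the tail of the writeback loop. */
static void journal_stop(struct toy_handle *h) { (void)h; note('S'); }
static void submit_io(void)            { note('I'); }
static void release_unused_pages(void) { note('P'); }
static void put_io_end(void)           { note('e'); }
static void put_io_end_defer(void)     { note('d'); }

/* Returns the order of operations for a sync or non-sync handle. */
static const char *writeback_tail(int h_sync)
{
	struct toy_handle h = { .h_sync = h_sync };
	struct toy_handle *handle = &h;

	trace[0] = '\0';

	/* Non-sync handle: stopping cannot block on commit, so stop early. */
	if (handle->h_sync == 0) {
		journal_stop(handle);
		handle = NULL;
	}

	submit_io();
	release_unused_pages();

	/* Sync handle: drop io_end via the deferred path, then stop. */
	if (handle) {
		put_io_end_defer();
		journal_stop(handle);
	} else {
		put_io_end();
	}
	return trace;
}

/* Checks both orderings. */
static void writeback_tail_demo(void)
{
	assert(strcmp(writeback_tail(0), "SIPe") == 0);
	assert(strcmp(writeback_tail(1), "IPdS") == 0);
}
```

In the synchronous case the stop ('S') comes last, after I/O submission,
page release and the deferred io_end put, which is the property the commit
message relies on to avoid the deadlock.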
* [PATCH 4.4 46/49] ext4: don't call ext4_should_journal_data() on the journal inode
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Jan Kara, Vegard Nossum, Theodore Tso
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vegard Nossum <vegard.nossum@oracle.com>
commit 6a7fd522a7c94cdef0a3b08acf8e6702056e635c upstream.
If ext4_fill_super() fails early, it's possible for ext4_evict_inode()
to call ext4_should_journal_data() before superblock options and flags
are fully set up. In that case, the iput() on the journal inode can
end up causing a BUG().
Work around this problem by reordering the tests so we only call
ext4_should_journal_data() after we know it's not the journal inode.
Fixes: 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data")
Fixes: 2b405bfa84 ("ext4: fix data=journal fast mount/umount hang")
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/ext4/inode.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -205,9 +205,9 @@ void ext4_evict_inode(struct inode *inod
* Note that directories do not have this problem because they
* don't use page cache.
*/
- if (ext4_should_journal_data(inode) &&
- (S_ISLNK(inode->i_mode) || S_ISREG(inode->i_mode)) &&
- inode->i_ino != EXT4_JOURNAL_INO) {
+ if (inode->i_ino != EXT4_JOURNAL_INO &&
+ ext4_should_journal_data(inode) &&
+ (S_ISLNK(inode->i_mode) || S_ISREG(inode->i_mode))) {
journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
tid_t commit_tid = EXT4_I(inode)->i_datasync_tid;
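The fix relies on the left-to-right short-circuit evaluation of && in C.
A minimal userspace sketch of that effect follows; struct toy_inode,
TOY_JOURNAL_INO and the counting stub should_journal_data() are
hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>

/* Assumed stand-in for EXT4_JOURNAL_INO. */
#define TOY_JOURNAL_INO 8UL

struct toy_inode {
	unsigned long i_ino;
	int is_reg_or_lnk;
};

static int journal_data_calls;

/* Stub for ext4_should_journal_data(): counts how often it is reached. */
static int should_journal_data(const struct toy_inode *inode)
{
	(void)inode;
	journal_data_calls++;
	return 1;
}

/* Pre-patch order: the helper runs first, even for the journal inode. */
static int old_order(const struct toy_inode *inode)
{
	return should_journal_data(inode) &&
	       inode->is_reg_or_lnk &&
	       inode->i_ino != TOY_JOURNAL_INO;
}

/* Post-patch order: the inode number test short-circuits the helper. */
static int new_order(const struct toy_inode *inode)
{
	return inode->i_ino != TOY_JOURNAL_INO &&
	       should_journal_data(inode) &&
	       inode->is_reg_or_lnk;
}

/* Both orders give the same result; only the helper call differs. */
static void evict_order_demo(void)
{
	struct toy_inode journal = { .i_ino = TOY_JOURNAL_INO, .is_reg_or_lnk = 1 };

	journal_data_calls = 0;
	assert(old_order(&journal) == 0);
	assert(journal_data_calls == 1);	/* helper was called */

	journal_data_calls = 0;
	assert(new_order(&journal) == 0);
	assert(journal_data_calls == 0);	/* helper was skipped */
}
```

The condition evaluates to the same value either way; what changes is that
the helper is never reached for the journal inode.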
* [PATCH 4.4 47/49] ext4: validate s_reserved_gdt_blocks on mount
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Vegard Nossum, Theodore Tso
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o <tytso@mit.edu>
commit 5b9554dc5bf008ae7f68a52e3d7e76c0920938a2 upstream.
If s_reserved_gdt_blocks is extremely large, it's possible for
ext4_init_block_bitmap(), which is called when ext4 sets up an
uninitialized block bitmap, to corrupt random kernel memory. Add the
same checks which e2fsck has --- it must never be larger than
blocksize / sizeof(__u32) --- and then add a backup check in
ext4_init_block_bitmap() in case the superblock gets modified after
the file system is mounted.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/ext4/balloc.c | 3 +++
fs/ext4/super.c | 7 +++++++
2 files changed, 10 insertions(+)
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -208,6 +208,9 @@ static int ext4_init_block_bitmap(struct
memset(bh->b_data, 0, sb->s_blocksize);
bit_max = ext4_num_base_meta_clusters(sb, block_group);
+ if ((bit_max >> 3) >= bh->b_size)
+ return -EFSCORRUPTED;
+
for (bit = 0; bit < bit_max; bit++)
ext4_set_bit(bit, bh->b_data);
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3372,6 +3372,13 @@ static int ext4_fill_super(struct super_
goto failed_mount;
}
+ if (le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks) > (blocksize / 4)) {
+ ext4_msg(sb, KERN_ERR,
+ "Number of reserved GDT blocks insanely large: %d",
+ le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks));
+ goto failed_mount;
+ }
+
if (sbi->s_mount_opt & EXT4_MOUNT_DAX) {
if (blocksize != PAGE_SIZE) {
ext4_msg(sb, KERN_ERR,
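The two limits added by this patch can be worked through with concrete
numbers. This is a minimal userspace sketch; reserved_gdt_blocks_ok() and
bit_max_fits() are hypothetical helper names that mirror the two
comparisons in the hunks above and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mount-time check: at most blocksize / 4 reserved GDT blocks. */
static int reserved_gdt_blocks_ok(uint16_t reserved_gdt_blocks,
				  unsigned int blocksize)
{
	return reserved_gdt_blocks <= blocksize / 4;
}

/* Backup check: the byte holding bit number bit_max must lie in the buffer. */
static int bit_max_fits(unsigned int bit_max, size_t b_size)
{
	return (bit_max >> 3) < b_size;
}

/* Exercises both limits at their boundaries. */
static void gdt_limit_demo(void)
{
	/* 1 KiB blocks: 1024 / 4 = 256 is the largest accepted value. */
	assert(reserved_gdt_blocks_ok(256, 1024) == 1);
	assert(reserved_gdt_blocks_ok(257, 1024) == 0);

	/* 4 KiB blocks: the limit is 1024. */
	assert(reserved_gdt_blocks_ok(1024, 4096) == 1);
	assert(reserved_gdt_blocks_ok(1025, 4096) == 0);

	/* A 1024-byte bitmap holds 8192 bits; bit_max of 8192 is rejected. */
	assert(bit_max_fits(8191, 1024) == 1);
	assert(bit_max_fits(8192, 1024) == 0);
}
```

The first helper corresponds to the check in ext4_fill_super(), the second
to the guard placed before the bit-setting loop in
ext4_init_block_bitmap().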
* [PATCH 4.4 48/49] ext4: short-cut orphan cleanup on error
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Jan Kara, Vegard Nossum, Theodore Tso
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vegard Nossum <vegard.nossum@oracle.com>
commit c65d5c6c81a1f27dec5f627f67840726fcd146de upstream.
If we encounter a filesystem error during orphan cleanup, we should stop.
Otherwise, we may end up in an infinite loop where the same inode is
processed again and again.
EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
Aborting journal on device loop0-8.
EXT4-fs (loop0): Remounting filesystem read-only
EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
[...]
See-also: c9eb13a9105 ("ext4: fix hang when processing corrupted orphaned inode list")
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/ext4/super.c | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2240,6 +2240,16 @@ static void ext4_orphan_cleanup(struct s
while (es->s_last_orphan) {
struct inode *inode;
+ /*
+ * We may have encountered an error during cleanup; if
+ * so, skip the rest.
+ */
+ if (EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS) {
+ jbd_debug(1, "Skipping orphan recovery on fs with errors.\n");
+ es->s_last_orphan = 0;
+ break;
+ }
+
inode = ext4_orphan_get(sb, le32_to_cpu(es->s_last_orphan));
if (IS_ERR(inode)) {
es->s_last_orphan = 0;
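The effect of the new guard at the top of the loop can be shown with a toy
model. This is a sketch under stated assumptions: struct toy_sb,
TOY_ERROR_FS and process_orphan() are hypothetical, and the "inode 16 fails
and stays at the head of the list" behaviour is chosen to resemble the log
in the commit message, not taken from the kernel code.

```c
#include <assert.h>

/* Assumed stand-in for the EXT4_ERROR_FS mount state bit. */
#define TOY_ERROR_FS 0x0002u

struct toy_sb {
	unsigned int mount_state;
	unsigned int last_orphan;	/* 0 terminates the list */
};

/* Pretend to clean up the orphan at the head of the list. */
static void process_orphan(struct toy_sb *sb)
{
	if (sb->last_orphan == 16) {
		/* Failure: flag the error and leave the head unchanged. */
		sb->mount_state |= TOY_ERROR_FS;
		return;
	}
	/* Success: advance to the next orphan (here simply ino + 1). */
	sb->last_orphan++;
}

/* Guarded loop, mirroring the patch; returns how many inodes were tried. */
static int orphan_cleanup(struct toy_sb *sb)
{
	int tried = 0;

	while (sb->last_orphan) {
		/* An error was recorded earlier: skip the rest of the list. */
		if (sb->mount_state & TOY_ERROR_FS) {
			sb->last_orphan = 0;
			break;
		}
		process_orphan(sb);
		tried++;
	}
	return tried;
}

/* Without the guard, the loop below would never get past inode 16. */
static void orphan_cleanup_demo(void)
{
	struct toy_sb sb = { .mount_state = 0, .last_orphan = 14 };

	assert(orphan_cleanup(&sb) == 3);	/* inodes 14, 15 and 16 */
	assert(sb.last_orphan == 0);
	assert(sb.mount_state & TOY_ERROR_FS);
}
```

In this model the loop terminates one iteration after the failure, because
the guard clears the list head as soon as the error flag is seen.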
* [PATCH 4.4 49/49] ext4: fix reference counting bug on block allocation error
From: Greg Kroah-Hartman @ 2016-08-14 20:23 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Vegard Nossum, Theodore Tso,
Aneesh Kumar K.V
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vegard Nossum <vegard.nossum@oracle.com>
commit 554a5ccc4e4a20c5f3ec859de0842db4b4b9c77e upstream.
If we hit this error when mounted with errors=continue or
errors=remount-ro:
EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata
then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
continue. However, ext4_mb_release_context() is the wrong thing to call
here since we are still actually using the allocation context.
Instead, just error out. We could retry the allocation, but there is a
possibility of getting stuck in an infinite loop instead, so this seems
safer.
[ Fixed up so we don't return EAGAIN to userspace. --tytso ]
Fixes: 8556e8f3b6 ("ext4: Don't allow new groups to be added during block allocation")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/ext4/mballoc.c | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2932,7 +2932,7 @@ ext4_mb_mark_diskspace_used(struct ext4_
ext4_error(sb, "Allocating blocks %llu-%llu which overlap "
"fs metadata", block, block+len);
/* File system mounted not to panic on error
- * Fix the bitmap and repeat the block allocation
+ * Fix the bitmap and return EFSCORRUPTED
* We leak some of the blocks here.
*/
ext4_lock_group(sb, ac->ac_b_ex.fe_group);
@@ -2941,7 +2941,7 @@ ext4_mb_mark_diskspace_used(struct ext4_
ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
if (!err)
- err = -EAGAIN;
+ err = -EFSCORRUPTED;
goto out_err;
}
@@ -4506,18 +4506,7 @@ repeat:
}
if (likely(ac->ac_status == AC_STATUS_FOUND)) {
*errp = ext4_mb_mark_diskspace_used(ac, handle, reserv_clstrs);
- if (*errp == -EAGAIN) {
- /*
- * drop the reference that we took
- * in ext4_mb_use_best_found
- */
- ext4_mb_release_context(ac);
- ac->ac_b_ex.fe_group = 0;
- ac->ac_b_ex.fe_start = 0;
- ac->ac_b_ex.fe_len = 0;
- ac->ac_status = AC_STATUS_CONTINUE;
- goto repeat;
- } else if (*errp) {
+ if (*errp) {
ext4_discard_allocated_blocks(ac);
goto errout;
} else {
* Re: [PATCH 4.4 00/49] 4.4.18-stable review
From: Greg Kroah-Hartman @ 2016-08-15 7:56 UTC (permalink / raw)
To: kernelci.org bot
Cc: linux-kernel, torvalds, akpm, linux, shuah.kh, patches, stable
On Sun, Aug 14, 2016 at 05:44:09PM -0700, kernelci.org bot wrote:
> stable-rc boot: 163 boots: 0 failed, 159 passed with 4 offline (v4.4.17-49-ga44a0226e486)
Yeah!
thanks for letting us know.
greg k-h
* Re: [PATCH 4.4 00/49] 4.4.18-stable review
From: Guenter Roeck @ 2016-08-15 13:05 UTC (permalink / raw)
To: Greg Kroah-Hartman, linux-kernel
Cc: torvalds, akpm, shuah.kh, patches, stable
On 08/14/2016 01:23 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.18 release.
> There are 49 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue Aug 16 20:22:43 UTC 2016.
> Anything received after that time might be too late.
>
Build results:
total: 148 pass: 148 fail: 0
Qemu test results:
total: 102 pass: 102 fail: 0
Details are available at http://kerneltests.org/builders.
Guenter
* Re: [PATCH 4.4 00/49] 4.4.18-stable review
From: Shuah Khan @ 2016-08-16 4:02 UTC (permalink / raw)
To: Greg Kroah-Hartman, linux-kernel
Cc: torvalds, akpm, linux, patches, stable, Shuah Khan
On 08/14/2016 02:23 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.18 release.
> There are 49 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue Aug 16 20:22:43 UTC 2016.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.18-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>
Compiled and booted on my test system. No dmesg regressions.
thanks,
-- Shuah
--
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America(Silicon Valley)
shuah.kh@samsung.com