public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 3.4 00/23] 3.4.100-stable review
@ 2014-07-26 19:02 Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 01/23] crypto: testmgr - update LZO compression test vectors Greg Kroah-Hartman
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, satoru.takeuchi,
	shuah.kh, stable

This is the start of the stable review cycle for the 3.4.100 release.
There are 23 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Mon Jul 28 19:01:37 UTC 2014.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.4.100-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 3.4.100-rc1

Takao Indoh <indou.takao@jp.fujitsu.com>
    iommu/vt-d: Disable translation if already enabled

Takashi Iwai <tiwai@suse.de>
    PM / sleep: Fix request_firmware() error at resume

John Stultz <john.stultz@linaro.org>
    alarmtimer: Fix bug where relative alarm timers were treated as absolute

Alex Deucher <alexander.deucher@amd.com>
    drm/radeon: avoid leaking edid data

Amitkumar Karwar <akarwar@marvell.com>
    mwifiex: fix Tx timeout issue

HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
    perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

Eric Dumazet <edumazet@google.com>
    ipv4: fix buffer overflow in ip_options_compile()

Ben Hutchings <ben@decadent.org.uk>
    dns_resolver: Null-terminate the right string

Manuel Schölling <manuel.schoelling@gmx.de>
    dns_resolver: assure that dns_query() result is null-terminated

Sowmini Varadhan <sowmini.varadhan@oracle.com>
    sunvnet: clean up objects created in vnet_new() on vnet_exit()

Christoph Schulz <develop@kristov.de>
    net: pppoe: use correct channel MTU when using Multilink PPP

Daniel Borkmann <dborkman@redhat.com>
    net: sctp: fix information leaks in ulpevent layer

Jon Paul Maloy <jon.maloy@ericsson.com>
    tipc: clear 'next'-pointer of message fragments before reassembly

Suresh Reddy <Suresh.Reddy@emulex.com>
    be2net: set EQ DB clear-intr bit in be_open()

Andrey Utkin <andrey.krieger.utkin@gmail.com>
    appletalk: Fix socket referencing in skb

Yuchung Cheng <ycheng@google.com>
    tcp: fix false undo corner cases

dingtianhong <dingtianhong@huawei.com>
    igmp: fix the problem when mc leave group

Li RongQing <roy.qing.li@gmail.com>
    8021q: fix a potential memory leak

Neal Cardwell <ncardwell@google.com>
    tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb

Hugh Dickins <hughd@google.com>
    shmem: fix splicing from a hole while it's punched

Hugh Dickins <hughd@google.com>
    shmem: fix faulting into a hole, not taking i_mutex

Hugh Dickins <hughd@google.com>
    shmem: fix faulting into a hole while it's punched

Markus F.X.J. Oberhumer <markus@oberhumer.com>
    crypto: testmgr - update LZO compression test vectors


-------------

Diffstat:

 Makefile                                    |   4 +-
 arch/x86/kernel/cpu/perf_event_intel.c      |   9 ++
 crypto/testmgr.h                            |  38 ++++----
 drivers/gpu/drm/radeon/radeon_display.c     |   5 +
 drivers/iommu/dmar.c                        |  11 ++-
 drivers/iommu/intel-iommu.c                 |  15 +++
 drivers/net/ethernet/emulex/benet/be_main.c |   2 +-
 drivers/net/ethernet/sun/sunvnet.c          |  20 +++-
 drivers/net/ppp/pppoe.c                     |   2 +-
 drivers/net/wireless/mwifiex/main.c         |   1 +
 kernel/power/process.c                      |   1 +
 kernel/time/alarmtimer.c                    |  20 +++-
 mm/shmem.c                                  | 139 ++++++++++++++++++++++++++--
 mm/truncate.c                               |  25 -----
 net/8021q/vlan_core.c                       |   5 +-
 net/appletalk/ddp.c                         |   3 -
 net/dns_resolver/dns_query.c                |   4 +-
 net/ipv4/igmp.c                             |  10 +-
 net/ipv4/ip_options.c                       |   4 +
 net/ipv4/tcp_input.c                        |  10 +-
 net/ipv4/tcp_output.c                       |   6 +-
 net/sctp/ulpevent.c                         | 122 +++---------------------
 net/tipc/bcast.c                            |   1 +
 23 files changed, 274 insertions(+), 183 deletions(-)



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 01/23] crypto: testmgr - update LZO compression test vectors
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 02/23] shmem: fix faulting into a hole while its punched Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Markus F.X.J. Oberhumer,
	Derrick Pallas

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Markus F.X.J. Oberhumer" <markus@oberhumer.com>

commit 0ec7382036922be063b515b2a3f1d6f7a607392c upstream.

Update the LZO compression test vectors according to the latest compressor
version.

Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Cc: Derrick Pallas <pallas@meraki.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 crypto/testmgr.h |   38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -14558,38 +14558,40 @@ static struct pcomp_testvec zlib_decomp_
 static struct comp_testvec lzo_comp_tv_template[] = {
 	{
 		.inlen	= 70,
-		.outlen	= 46,
+		.outlen	= 57,
 		.input	= "Join us now and share the software "
 			"Join us now and share the software ",
 		.output	= "\x00\x0d\x4a\x6f\x69\x6e\x20\x75"
-			"\x73\x20\x6e\x6f\x77\x20\x61\x6e"
-			"\x64\x20\x73\x68\x61\x72\x65\x20"
-			"\x74\x68\x65\x20\x73\x6f\x66\x74"
-			"\x77\x70\x01\x01\x4a\x6f\x69\x6e"
-			"\x3d\x88\x00\x11\x00\x00",
+			  "\x73\x20\x6e\x6f\x77\x20\x61\x6e"
+			  "\x64\x20\x73\x68\x61\x72\x65\x20"
+			  "\x74\x68\x65\x20\x73\x6f\x66\x74"
+			  "\x77\x70\x01\x32\x88\x00\x0c\x65"
+			  "\x20\x74\x68\x65\x20\x73\x6f\x66"
+			  "\x74\x77\x61\x72\x65\x20\x11\x00"
+			  "\x00",
 	}, {
 		.inlen	= 159,
-		.outlen	= 133,
+		.outlen	= 131,
 		.input	= "This document describes a compression method based on the LZO "
 			"compression algorithm.  This document defines the application of "
 			"the LZO algorithm used in UBIFS.",
-		.output	= "\x00\x2b\x54\x68\x69\x73\x20\x64"
+		.output	= "\x00\x2c\x54\x68\x69\x73\x20\x64"
 			  "\x6f\x63\x75\x6d\x65\x6e\x74\x20"
 			  "\x64\x65\x73\x63\x72\x69\x62\x65"
 			  "\x73\x20\x61\x20\x63\x6f\x6d\x70"
 			  "\x72\x65\x73\x73\x69\x6f\x6e\x20"
 			  "\x6d\x65\x74\x68\x6f\x64\x20\x62"
 			  "\x61\x73\x65\x64\x20\x6f\x6e\x20"
-			  "\x74\x68\x65\x20\x4c\x5a\x4f\x2b"
-			  "\x8c\x00\x0d\x61\x6c\x67\x6f\x72"
-			  "\x69\x74\x68\x6d\x2e\x20\x20\x54"
-			  "\x68\x69\x73\x2a\x54\x01\x02\x66"
-			  "\x69\x6e\x65\x73\x94\x06\x05\x61"
-			  "\x70\x70\x6c\x69\x63\x61\x74\x76"
-			  "\x0a\x6f\x66\x88\x02\x60\x09\x27"
-			  "\xf0\x00\x0c\x20\x75\x73\x65\x64"
-			  "\x20\x69\x6e\x20\x55\x42\x49\x46"
-			  "\x53\x2e\x11\x00\x00",
+			  "\x74\x68\x65\x20\x4c\x5a\x4f\x20"
+			  "\x2a\x8c\x00\x09\x61\x6c\x67\x6f"
+			  "\x72\x69\x74\x68\x6d\x2e\x20\x20"
+			  "\x2e\x54\x01\x03\x66\x69\x6e\x65"
+			  "\x73\x20\x74\x06\x05\x61\x70\x70"
+			  "\x6c\x69\x63\x61\x74\x76\x0a\x6f"
+			  "\x66\x88\x02\x60\x09\x27\xf0\x00"
+			  "\x0c\x20\x75\x73\x65\x64\x20\x69"
+			  "\x6e\x20\x55\x42\x49\x46\x53\x2e"
+			  "\x11\x00\x00",
 	},
 };
 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 02/23] shmem: fix faulting into a hole while its punched
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 01/23] crypto: testmgr - update LZO compression test vectors Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 03/23] shmem: fix faulting into a hole, not taking i_mutex Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Sasha Levin, Dave Jones,
	Andrew Morton, Linus Torvalds

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>

commit f00cdc6df7d7cfcabb5b740911e6788cb0802bdb upstream.

Trinity finds that mmap access to a hole while it's punched from shmem
can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE)
from completing, until the reader chooses to stop; with the puncher's
hold on i_mutex locking out all other writers until it can complete.

It appears that the tmpfs fault path is too light in comparison with its
hole-punching path, lacking an i_data_sem to obstruct it; but we don't
want to slow down the common case.

Extend shmem_fallocate()'s existing range notification mechanism, so
shmem_fault() can refrain from faulting pages into the hole while it's
punched, waiting instead on i_mutex (when safe to sleep; or repeatedly
faulting when not).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 mm/shmem.c    |   91 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/truncate.c |   25 ---------------
 2 files changed, 91 insertions(+), 25 deletions(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -76,6 +76,16 @@ static struct vfsmount *shm_mnt;
 /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
 #define SHORT_SYMLINK_LEN 128
 
+/*
+ * vmtruncate_range() communicates with shmem_fault via
+ * inode->i_private (with i_mutex making sure that it has only one user at
+ * a time): we would prefer not to enlarge the shmem inode just for that.
+ */
+struct shmem_falloc {
+	pgoff_t start;		/* start of range currently being fallocated */
+	pgoff_t next;		/* the next page offset to be fallocated */
+};
+
 struct shmem_xattr {
 	struct list_head list;	/* anchored by shmem_inode_info->xattr_list */
 	char *name;		/* xattr name */
@@ -1060,6 +1070,43 @@ static int shmem_fault(struct vm_area_st
 	int error;
 	int ret = VM_FAULT_LOCKED;
 
+	/*
+	 * Trinity finds that probing a hole which tmpfs is punching can
+	 * prevent the hole-punch from ever completing: which in turn
+	 * locks writers out with its hold on i_mutex.  So refrain from
+	 * faulting pages into the hole while it's being punched, and
+	 * wait on i_mutex to be released if vmf->flags permits.
+	 */
+	if (unlikely(inode->i_private)) {
+		struct shmem_falloc *shmem_falloc;
+
+		spin_lock(&inode->i_lock);
+		shmem_falloc = inode->i_private;
+		if (!shmem_falloc ||
+		    vmf->pgoff < shmem_falloc->start ||
+		    vmf->pgoff >= shmem_falloc->next)
+			shmem_falloc = NULL;
+		spin_unlock(&inode->i_lock);
+		/*
+		 * i_lock has protected us from taking shmem_falloc seriously
+		 * once return from vmtruncate_range() went back up that stack.
+		 * i_lock does not serialize with i_mutex at all, but it does
+		 * not matter if sometimes we wait unnecessarily, or sometimes
+		 * miss out on waiting: we just need to make those cases rare.
+		 */
+		if (shmem_falloc) {
+			if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) &&
+			   !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
+				up_read(&vma->vm_mm->mmap_sem);
+				mutex_lock(&inode->i_mutex);
+				mutex_unlock(&inode->i_mutex);
+				return VM_FAULT_RETRY;
+			}
+			/* cond_resched? Leave that to GUP or return to user */
+			return VM_FAULT_NOPAGE;
+		}
+	}
+
 	error = shmem_getpage(inode, vmf->pgoff, &vmf->page, SGP_CACHE, &ret);
 	if (error)
 		return ((error == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS);
@@ -1071,6 +1118,44 @@ static int shmem_fault(struct vm_area_st
 	return ret;
 }
 
+int vmtruncate_range(struct inode *inode, loff_t lstart, loff_t lend)
+{
+	/*
+	 * If the underlying filesystem is not going to provide
+	 * a way to truncate a range of blocks (punch a hole) -
+	 * we should return failure right now.
+	 * Only CONFIG_SHMEM shmem.c ever supported i_op->truncate_range().
+	 */
+	if (inode->i_op->truncate_range != shmem_truncate_range)
+		return -ENOSYS;
+
+	mutex_lock(&inode->i_mutex);
+	{
+		struct shmem_falloc shmem_falloc;
+		struct address_space *mapping = inode->i_mapping;
+		loff_t unmap_start = round_up(lstart, PAGE_SIZE);
+		loff_t unmap_end = round_down(1 + lend, PAGE_SIZE) - 1;
+
+		shmem_falloc.start = unmap_start >> PAGE_SHIFT;
+		shmem_falloc.next = (unmap_end + 1) >> PAGE_SHIFT;
+		spin_lock(&inode->i_lock);
+		inode->i_private = &shmem_falloc;
+		spin_unlock(&inode->i_lock);
+
+		if ((u64)unmap_end > (u64)unmap_start)
+			unmap_mapping_range(mapping, unmap_start,
+					    1 + unmap_end - unmap_start, 0);
+		shmem_truncate_range(inode, lstart, lend);
+		/* No need to unmap again: hole-punching leaves COWed pages */
+
+		spin_lock(&inode->i_lock);
+		inode->i_private = NULL;
+		spin_unlock(&inode->i_lock);
+	}
+	mutex_unlock(&inode->i_mutex);
+	return 0;
+}
+
 #ifdef CONFIG_NUMA
 static int shmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
 {
@@ -2547,6 +2632,12 @@ void shmem_truncate_range(struct inode *
 }
 EXPORT_SYMBOL_GPL(shmem_truncate_range);
 
+int vmtruncate_range(struct inode *inode, loff_t lstart, loff_t lend)
+{
+	/* Only CONFIG_SHMEM shmem.c ever supported i_op->truncate_range(). */
+	return -ENOSYS;
+}
+
 #define shmem_vm_ops				generic_file_vm_ops
 #define shmem_file_operations			ramfs_file_operations
 #define shmem_get_inode(sb, dir, mode, dev, flags)	ramfs_get_inode(sb, dir, mode, dev)
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -603,31 +603,6 @@ int vmtruncate(struct inode *inode, loff
 }
 EXPORT_SYMBOL(vmtruncate);
 
-int vmtruncate_range(struct inode *inode, loff_t lstart, loff_t lend)
-{
-	struct address_space *mapping = inode->i_mapping;
-	loff_t holebegin = round_up(lstart, PAGE_SIZE);
-	loff_t holelen = 1 + lend - holebegin;
-
-	/*
-	 * If the underlying filesystem is not going to provide
-	 * a way to truncate a range of blocks (punch a hole) -
-	 * we should return failure right now.
-	 */
-	if (!inode->i_op->truncate_range)
-		return -ENOSYS;
-
-	mutex_lock(&inode->i_mutex);
-	inode_dio_wait(inode);
-	unmap_mapping_range(mapping, holebegin, holelen, 1);
-	inode->i_op->truncate_range(inode, lstart, lend);
-	/* unmap again to remove racily COWed private pages */
-	unmap_mapping_range(mapping, holebegin, holelen, 1);
-	mutex_unlock(&inode->i_mutex);
-
-	return 0;
-}
-
 /**
  * truncate_pagecache_range - unmap and remove pagecache that is hole-punched
  * @inode: inode



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 03/23] shmem: fix faulting into a hole, not taking i_mutex
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 01/23] crypto: testmgr - update LZO compression test vectors Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 02/23] shmem: fix faulting into a hole while its punched Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 04/23] shmem: fix splicing from a hole while its punched Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Sasha Levin,
	Vlastimil Babka, Konstantin Khlebnikov, Johannes Weiner,
	Lukas Czerner, Dave Jones, Andrew Morton, Linus Torvalds

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>

commit 8e205f779d1443a94b5ae81aa359cb535dd3021e upstream.

Commit f00cdc6df7d7 ("shmem: fix faulting into a hole while it's
punched") was buggy: Sasha sent a lockdep report to remind us that
grabbing i_mutex in the fault path is a no-no (write syscall may already
hold i_mutex while faulting user buffer).

We tried a completely different approach (see following patch) but that
proved inadequate: good enough for a rational workload, but not good
enough against trinity - which forks off so many mappings of the object
that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
into serious starvation when concurrent faults force the puncher to fall
back to single-page unmap_mapping_range() searches of the i_mmap tree.

So return to the original umbrella approach, but keep away from i_mutex
this time.  We really don't want to bloat every shmem inode with a new
mutex or completion, just to protect this unlikely case from trinity.
So extend the original with wait_queue_head on stack at the hole-punch
end, and wait_queue item on the stack at the fault end.

This involves further use of i_lock to guard against the races: lockdep
has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
i_lock around wake_up_bit(), which is comparable to what we do here.
i_lock is more convenient, but we could switch to shmem's info->lock.

This issue has been tagged with CVE-2014-4171, which will require commit
f00cdc6df7d7 and this and the following patch to be backported: we
suggest to 3.1+, though in fact the trinity forkbomb effect might go
back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
Anyone running trinity on 3.0 and earlier? I don't think we need care.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 mm/shmem.c |   64 +++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 20 deletions(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -82,6 +82,7 @@ static struct vfsmount *shm_mnt;
  * a time): we would prefer not to enlarge the shmem inode just for that.
  */
 struct shmem_falloc {
+	wait_queue_head_t *waitq; /* faults into hole wait for punch to end */
 	pgoff_t start;		/* start of range currently being fallocated */
 	pgoff_t next;		/* the next page offset to be fallocated */
 };
@@ -1074,37 +1075,57 @@ static int shmem_fault(struct vm_area_st
 	 * Trinity finds that probing a hole which tmpfs is punching can
 	 * prevent the hole-punch from ever completing: which in turn
 	 * locks writers out with its hold on i_mutex.  So refrain from
-	 * faulting pages into the hole while it's being punched, and
-	 * wait on i_mutex to be released if vmf->flags permits.
+	 * faulting pages into the hole while it's being punched.  Although
+	 * shmem_truncate_range() does remove the additions, it may be unable to
+	 * keep up, as each new page needs its own unmap_mapping_range() call,
+	 * and the i_mmap tree grows ever slower to scan if new vmas are added.
+	 *
+	 * It does not matter if we sometimes reach this check just before the
+	 * hole-punch begins, so that one fault then races with the punch:
+	 * we just need to make racing faults a rare case.
+	 *
+	 * The implementation below would be much simpler if we just used a
+	 * standard mutex or completion: but we cannot take i_mutex in fault,
+	 * and bloating every shmem inode for this unlikely case would be sad.
 	 */
 	if (unlikely(inode->i_private)) {
 		struct shmem_falloc *shmem_falloc;
 
 		spin_lock(&inode->i_lock);
 		shmem_falloc = inode->i_private;
-		if (!shmem_falloc ||
-		    vmf->pgoff < shmem_falloc->start ||
-		    vmf->pgoff >= shmem_falloc->next)
-			shmem_falloc = NULL;
-		spin_unlock(&inode->i_lock);
-		/*
-		 * i_lock has protected us from taking shmem_falloc seriously
-		 * once return from vmtruncate_range() went back up that stack.
-		 * i_lock does not serialize with i_mutex at all, but it does
-		 * not matter if sometimes we wait unnecessarily, or sometimes
-		 * miss out on waiting: we just need to make those cases rare.
-		 */
-		if (shmem_falloc) {
+		if (shmem_falloc &&
+		    vmf->pgoff >= shmem_falloc->start &&
+		    vmf->pgoff < shmem_falloc->next) {
+			wait_queue_head_t *shmem_falloc_waitq;
+			DEFINE_WAIT(shmem_fault_wait);
+
+			ret = VM_FAULT_NOPAGE;
 			if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) &&
 			   !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
+				/* It's polite to up mmap_sem if we can */
 				up_read(&vma->vm_mm->mmap_sem);
-				mutex_lock(&inode->i_mutex);
-				mutex_unlock(&inode->i_mutex);
-				return VM_FAULT_RETRY;
+				ret = VM_FAULT_RETRY;
 			}
-			/* cond_resched? Leave that to GUP or return to user */
-			return VM_FAULT_NOPAGE;
+
+			shmem_falloc_waitq = shmem_falloc->waitq;
+			prepare_to_wait(shmem_falloc_waitq, &shmem_fault_wait,
+					TASK_UNINTERRUPTIBLE);
+			spin_unlock(&inode->i_lock);
+			schedule();
+
+			/*
+			 * shmem_falloc_waitq points into the vmtruncate_range()
+			 * stack of the hole-punching task: shmem_falloc_waitq
+			 * is usually invalid by the time we reach here, but
+			 * finish_wait() does not dereference it in that case;
+			 * though i_lock needed lest racing with wake_up_all().
+			 */
+			spin_lock(&inode->i_lock);
+			finish_wait(shmem_falloc_waitq, &shmem_fault_wait);
+			spin_unlock(&inode->i_lock);
+			return ret;
 		}
+		spin_unlock(&inode->i_lock);
 	}
 
 	error = shmem_getpage(inode, vmf->pgoff, &vmf->page, SGP_CACHE, &ret);
@@ -1135,7 +1156,9 @@ int vmtruncate_range(struct inode *inode
 		struct address_space *mapping = inode->i_mapping;
 		loff_t unmap_start = round_up(lstart, PAGE_SIZE);
 		loff_t unmap_end = round_down(1 + lend, PAGE_SIZE) - 1;
+		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(shmem_falloc_waitq);
 
+		shmem_falloc.waitq = &shmem_falloc_waitq;
 		shmem_falloc.start = unmap_start >> PAGE_SHIFT;
 		shmem_falloc.next = (unmap_end + 1) >> PAGE_SHIFT;
 		spin_lock(&inode->i_lock);
@@ -1150,6 +1173,7 @@ int vmtruncate_range(struct inode *inode
 
 		spin_lock(&inode->i_lock);
 		inode->i_private = NULL;
+		wake_up_all(&shmem_falloc_waitq);
 		spin_unlock(&inode->i_lock);
 	}
 	mutex_unlock(&inode->i_mutex);



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 04/23] shmem: fix splicing from a hole while its punched
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 03/23] shmem: fix faulting into a hole, not taking i_mutex Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 05/23] tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Hugh Dickins, Sasha Levin,
	Vlastimil Babka, Konstantin Khlebnikov, Johannes Weiner,
	Lukas Czerner, Dave Jones, Andrew Morton, Linus Torvalds

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>

commit b1a366500bd537b50c3aad26dc7df083ec03a448 upstream.

shmem_fault() is the actual culprit in trinity's hole-punch starvation,
and the most significant cause of such problems: since a page faulted is
one that then appears page_mapped(), needing unmap_mapping_range() and
i_mmap_mutex to be unmapped again.

But it is not the only way in which a page can be brought into a hole in
the radix_tree while that hole is being punched; and Vlastimil's testing
implies that if enough other processors are busy filling in the hole,
then shmem_undo_range() can be kept from completing indefinitely.

shmem_file_splice_read() is the main other user of SGP_CACHE, which can
instantiate shmem pagecache pages in the read-only case (without holding
i_mutex, so perhaps concurrently with a hole-punch).  Probably it's
silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
ought to be safe, but might bring surprises - not a change to be rushed.

shmem_read_mapping_page_gfp() is an internal interface used by
drivers/gpu/drm GEM (and next by uprobes): it should be okay.  And
shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
called internally by the kernel (perhaps for a stacking filesystem,
which might rely on holes to be reserved): it's unclear whether it could
be provoked to keep hole-punch busy or not.

We could apply the same umbrella as now used in shmem_fault() to
shmem_file_splice_read() and the others; but it looks ugly, and use over
a range raises questions - should it actually be per page? can these get
starved themselves?

The origin of this part of the problem is my v3.1 commit d0823576bf4b
("mm: pincer in truncate_inode_pages_range"), once it was duplicated
into shmem.c.  It seemed like a nice idea at the time, to ensure
(barring RCU lookup fuzziness) that there's an instant when the entire
hole is empty; but the indefinitely repeated scans to ensure that make
it vulnerable.

Revert that "enhancement" to hole-punch from shmem_undo_range(), but
retain the unproblematic rescanning when it's truncating; add a couple
of comments there.

Remove the "indices[0] >= end" test: that is now handled satisfactorily
by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
to be worth avoiding here.

But if we do not always loop indefinitely, we do need to handle the case
of swap swizzled back to page before shmem_free_swap() gets it: add a
retry for that case, as suggested by Konstantin Khlebnikov; and for the
case of page swizzled back to swap, as suggested by Johannes Weiner.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 mm/shmem.c |   24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -499,22 +499,19 @@ void shmem_truncate_range(struct inode *
 	}
 
 	index = start;
-	for ( ; ; ) {
+	while (index <= end) {
 		cond_resched();
 		pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
 							pvec.pages, indices);
 		if (!pvec.nr) {
-			if (index == start)
+			/* If all gone or hole-punch, we're done */
+			if (index == start || end != -1)
 				break;
+			/* But if truncating, restart to make sure all gone */
 			index = start;
 			continue;
 		}
-		if (index == start && indices[0] > end) {
-			shmem_deswap_pagevec(&pvec);
-			pagevec_release(&pvec);
-			break;
-		}
 		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
@@ -524,8 +521,12 @@ void shmem_truncate_range(struct inode *
 				break;
 
 			if (radix_tree_exceptional_entry(page)) {
-				nr_swaps_freed += !shmem_free_swap(mapping,
-								index, page);
+				if (shmem_free_swap(mapping, index, page)) {
+					/* Swap was replaced by page: retry */
+					index--;
+					break;
+				}
+				nr_swaps_freed++;
 				continue;
 			}
 
@@ -533,6 +534,11 @@ void shmem_truncate_range(struct inode *
 			if (page->mapping == mapping) {
 				VM_BUG_ON(PageWriteback(page));
 				truncate_inode_page(mapping, page);
+			} else {
+				/* Page was replaced by swap: retry */
+				unlock_page(page);
+				index--;
+				break;
 			}
 			unlock_page(page);
 		}



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 05/23] tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 04/23] shmem: fix splicing from a hole while its punched Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 06/23] 8021q: fix a potential memory leak Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Neal Cardwell,
	Yuchung Cheng, Ilpo Jarvinen, David S. Miller

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <ncardwell@google.com>

[ Upstream commit 2cd0d743b05e87445c54ca124a9916f22f16742e ]

If there is an MSS change (or misbehaving receiver) that causes a SACK
to arrive that covers the end of an skb but is less than one MSS, then
tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
the skb ("Round if necessary..."), then chopping all bytes off the skb
and creating a zero-byte skb in the write queue.

This was visible now because the recently simplified TLP logic in
bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
skb at the end of the write queue, and now that we do not check that
skb's length we could send it as a TLP probe.

Consider the following example scenario:

 mss: 1000
 skb: seq: 0 end_seq: 4000  len: 4000
 SACK: start_seq: 3999 end_seq: 4000

The tcp_match_skb_to_sack() code will compute:

 in_sack = false
 pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
 new_len += mss = 4000

Previously we would find the new_len > skb->len check failing, so we
would fall through and set pkt_len = new_len = 4000 and chop off
pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
afterward in the write queue.

With this new commit, we notice that the new new_len >= skb->len check
succeeds, so that we return without trying to fragment.

Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_input.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1304,7 +1304,7 @@ static int tcp_match_skb_to_sack(struct
 			unsigned int new_len = (pkt_len / mss) * mss;
 			if (!in_sack && new_len < pkt_len) {
 				new_len += mss;
-				if (new_len > skb->len)
+				if (new_len >= skb->len)
 					return 0;
 			}
 			pkt_len = new_len;



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 06/23] 8021q: fix a potential memory leak
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 05/23] tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 07/23] igmp: fix the problem when mc leave group Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Li RongQing, Eric Dumazet,
	David S. Miller

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Li RongQing <roy.qing.li@gmail.com>

[ Upstream commit 916c1689a09bc1ca81f2d7a34876f8d35aadd11b ]

skb_cow called in vlan_reorder_header does not free the skb when it failed,
and vlan_reorder_header returns NULL to reset original skb when it is called
in vlan_untag, lead to a memory leak.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/8021q/vlan_core.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -96,8 +96,11 @@ EXPORT_SYMBOL(vlan_dev_vlan_id);
 
 static struct sk_buff *vlan_reorder_header(struct sk_buff *skb)
 {
-	if (skb_cow(skb, skb_headroom(skb)) < 0)
+	if (skb_cow(skb, skb_headroom(skb)) < 0) {
+		kfree_skb(skb);
 		return NULL;
+	}
+
 	memmove(skb->data - ETH_HLEN, skb->data - VLAN_ETH_HLEN, 2 * ETH_ALEN);
 	skb->mac_header += VLAN_HLEN;
 	return skb;



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 07/23] igmp: fix the problem when mc leave group
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 06/23] 8021q: fix a potential memory leak Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 08/23] tcp: fix false undo corner cases Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ding Tianhong, David S. Miller

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: dingtianhong <dingtianhong@huawei.com>

[ Upstream commit 52ad353a5344f1f700c5b777175bdfa41d3cd65a ]

The problem was triggered by these steps:

1) create socket, bind and then setsockopt for add mc group.
   mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
   mreq.imr_interface.s_addr = inet_addr("192.168.1.2");
   setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

2) drop the mc group for this socket.
   mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
   mreq.imr_interface.s_addr = inet_addr("0.0.0.0");
   setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));

3) and then drop the socket, I found the mc group was still used by the dev:

   netstat -g

   Interface       RefCnt Group
   --------------- ------ ---------------------
   eth2		   1	  255.0.0.37

Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need
to be released for the netdev when drop the socket, but this process was broken when
route default is NULL, the reason is that:

The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr
is NULL, the default route dev will be chosen, then the ifindex is got from the dev,
then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL,
the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be
released from the mc_list, but the dev didn't dec the refcnt for this mc group, so
when dropping the socket, the mc_list is NULL and the dev still keep this group.

v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs,
	so I add the checking for the in_dev before polling the mc_list, make sure when
	we remove the mc group, dec the refcnt to the real dev which was using the mc address.
	The problem would never happened again.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/igmp.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1866,6 +1866,10 @@ int ip_mc_leave_group(struct sock *sk, s
 
 	rtnl_lock();
 	in_dev = ip_mc_find_dev(net, imr);
+	if (!in_dev) {
+		ret = -ENODEV;
+		goto out;
+	}
 	ifindex = imr->imr_ifindex;
 	for (imlp = &inet->mc_list;
 	     (iml = rtnl_dereference(*imlp)) != NULL;
@@ -1883,16 +1887,14 @@ int ip_mc_leave_group(struct sock *sk, s
 
 		*imlp = iml->next_rcu;
 
-		if (in_dev)
-			ip_mc_dec_group(in_dev, group);
+		ip_mc_dec_group(in_dev, group);
 		rtnl_unlock();
 		/* decrease mem now to avoid the memleak warning */
 		atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
 		kfree_rcu(iml, rcu);
 		return 0;
 	}
-	if (!in_dev)
-		ret = -ENODEV;
+out:
 	rtnl_unlock();
 	return ret;
 }



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 08/23] tcp: fix false undo corner cases
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 07/23] igmp: fix the problem when mc leave group Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 09/23] appletalk: Fix socket referencing in skb Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Yuchung Cheng, Neal Cardwell,
	David S. Miller

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yuchung Cheng <ycheng@google.com>

[ Upstream commit 6e08d5e3c8236e7484229e46fdf92006e1dd4c49 ]

The undo code assumes that, upon entering loss recovery, TCP
1) always retransmit something
2) the retransmission never fails locally (e.g., qdisc drop)

so undo_marker is set in tcp_enter_recovery() and undo_retrans is
incremented only when tcp_retransmit_skb() is successful.

When the assumption is broken because TCP's cwnd is too small to
retransmit or the retransmit fails locally. The next (DUP)ACK
would incorrectly revert the cwnd and the congestion state in
tcp_try_undo_dsack() or tcp_may_undo(). Subsequent (DUP)ACKs
may enter the recovery state. The sender repeatedly enter and
(incorrectly) exit recovery states if the retransmits continue to
fail locally while receiving (DUP)ACKs.

The fix is to initialize undo_retrans to -1 and start counting on
the first retransmission. Always increment undo_retrans even if the
retransmissions fail locally because they couldn't cause DSACKs to
undo the cwnd reduction.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_input.c  |    8 ++++----
 net/ipv4/tcp_output.c |    6 ++++--
 2 files changed, 8 insertions(+), 6 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1250,7 +1250,7 @@ static int tcp_check_dsack(struct sock *
 	}
 
 	/* D-SACK for already forgotten data... Do dumb counting. */
-	if (dup_sack && tp->undo_marker && tp->undo_retrans &&
+	if (dup_sack && tp->undo_marker && tp->undo_retrans > 0 &&
 	    !after(end_seq_0, prior_snd_una) &&
 	    after(end_seq_0, tp->undo_marker))
 		tp->undo_retrans--;
@@ -1328,7 +1328,7 @@ static u8 tcp_sacktag_one(struct sock *s
 
 	/* Account D-SACK for retransmitted packet. */
 	if (dup_sack && (sacked & TCPCB_RETRANS)) {
-		if (tp->undo_marker && tp->undo_retrans &&
+		if (tp->undo_marker && tp->undo_retrans > 0 &&
 		    after(end_seq, tp->undo_marker))
 			tp->undo_retrans--;
 		if (sacked & TCPCB_SACKED_ACKED)
@@ -2226,7 +2226,7 @@ static void tcp_clear_retrans_partial(st
 	tp->lost_out = 0;
 
 	tp->undo_marker = 0;
-	tp->undo_retrans = 0;
+	tp->undo_retrans = -1;
 }
 
 void tcp_clear_retrans(struct tcp_sock *tp)
@@ -3165,7 +3165,7 @@ static void tcp_fastretrans_alert(struct
 		tp->high_seq = tp->snd_nxt;
 		tp->prior_ssthresh = 0;
 		tp->undo_marker = tp->snd_una;
-		tp->undo_retrans = tp->retrans_out;
+		tp->undo_retrans = tp->retrans_out ? : -1;
 
 		if (icsk->icsk_ca_state < TCP_CA_CWR) {
 			if (!(flag & FLAG_ECE))
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2194,13 +2194,15 @@ int tcp_retransmit_skb(struct sock *sk,
 		if (!tp->retrans_stamp)
 			tp->retrans_stamp = TCP_SKB_CB(skb)->when;
 
-		tp->undo_retrans += tcp_skb_pcount(skb);
-
 		/* snd_nxt is stored to detect loss of retransmitted segment,
 		 * see tcp_input.c tcp_sacktag_write_queue().
 		 */
 		TCP_SKB_CB(skb)->ack_seq = tp->snd_nxt;
 	}
+
+	if (tp->undo_retrans < 0)
+		tp->undo_retrans = 0;
+	tp->undo_retrans += tcp_skb_pcount(skb);
 	return err;
 }
 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 09/23] appletalk: Fix socket referencing in skb
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 08/23] tcp: fix false undo corner cases Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 10/23] be2net: set EQ DB clear-intr bit in be_open() Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ed Martin, Andrey Utkin, Eric Dumazet,
	David S. Miller

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andrey Utkin <andrey.krieger.utkin@gmail.com>

[ Upstream commit 36beddc272c111689f3042bf3d10a64d8a805f93 ]

Setting just skb->sk without taking its reference and setting a
destructor is invalid. However, in the places where this was done, skb
is used in a way not requiring skb->sk setting. So dropping the setting
of skb->sk.
Thanks to Eric Dumazet <eric.dumazet@gmail.com> for correct solution.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441
Reported-by: Ed Martin <edman007@edman007.com>
Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/appletalk/ddp.c |    3 ---
 1 file changed, 3 deletions(-)

--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1494,8 +1494,6 @@ static int atalk_rcv(struct sk_buff *skb
 		goto drop;
 
 	/* Queue packet (standard) */
-	skb->sk = sock;
-
 	if (sock_queue_rcv_skb(sock, skb) < 0)
 		goto drop;
 
@@ -1649,7 +1647,6 @@ static int atalk_sendmsg(struct kiocb *i
 	if (!skb)
 		goto out;
 
-	skb->sk = sk;
 	skb_reserve(skb, ddp_dl->header_length);
 	skb_reserve(skb, dev->hard_header_len);
 	skb->dev = dev;



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 10/23] be2net: set EQ DB clear-intr bit in be_open()
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 09/23] appletalk: Fix socket referencing in skb Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 11/23] tipc: clear next-pointer of message fragments before reassembly Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Suresh Reddy, Sathya Perla,
	David S. Miller

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Suresh Reddy <Suresh.Reddy@emulex.com>

[ Upstream commit 4cad9f3b61c7268fa89ab8096e23202300399b5d ]

On BE3, if the clear-interrupt bit of the EQ doorbell is not set the first
time it is armed, ocassionally we have observed that the EQ doesn't raise
anymore interrupts even if it is in armed state.
This patch fixes this by setting the clear-interrupt bit when EQs are
armed for the first time in be_open().

Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/emulex/benet/be_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2411,7 +2411,7 @@ static int be_open(struct net_device *ne
 
 	for_all_evt_queues(adapter, eqo, i) {
 		napi_enable(&eqo->napi);
-		be_eq_notify(adapter, eqo->q.id, true, false, 0);
+		be_eq_notify(adapter, eqo->q.id, true, true, 0);
 	}
 
 	status = be_cmd_link_status_query(adapter, NULL, NULL,



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 11/23] tipc: clear next-pointer of message fragments before reassembly
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 10/23] be2net: set EQ DB clear-intr bit in be_open() Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 12/23] net: sctp: fix information leaks in ulpevent layer Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jon Maloy, David S. Miller

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jon Paul Maloy <jon.maloy@ericsson.com>

[ Upstream commit 999417549c16dd0e3a382aa9f6ae61688db03181 ]

If the 'next' pointer of the last fragment buffer in a message is not
zeroed before reassembly, we risk ending up with a corrupt message,
since the reassembly function itself isn't doing this.

Currently, when a buffer is retrieved from the deferred queue of the
broadcast link, the next pointer is not cleared, with the result as
described above.

This commit corrects this, and thereby fixes a bug that may occur when
long broadcast messages are transmitted across dual interfaces. The bug
has been present since 40ba3cdf542a469aaa9083fa041656e59b109b90 ("tipc:
message reassembly using fragment chain")

This commit should be applied to both net and net-next.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/tipc/bcast.c |    1 +
 1 file changed, 1 insertion(+)

--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -541,6 +541,7 @@ receive:
 
 		buf = node->bclink.deferred_head;
 		node->bclink.deferred_head = buf->next;
+		buf->next = NULL;
 		node->bclink.deferred_size--;
 		goto receive;
 	}



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 12/23] net: sctp: fix information leaks in ulpevent layer
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 11/23] tipc: clear next-pointer of message fragments before reassembly Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 13/23] net: pppoe: use correct channel MTU when using Multilink PPP Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, David S. Miller

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

[ Upstream commit 8f2e5ae40ec193bc0a0ed99e95315c3eebca84ea ]

While working on some other SCTP code, I noticed that some
structures shared with user space are leaking uninitialized
stack or heap buffer. In particular, struct sctp_sndrcvinfo
has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
putting this into cmsg. But also struct sctp_remote_error
contains a 2 bytes hole that we don't fill but place into a skb
through skb_copy_expand() via sctp_ulpevent_make_remote_error().

Both structures are defined by the IETF in RFC6458:

* Section 5.3.2. SCTP Header Information Structure:

  The sctp_sndrcvinfo structure is defined below:

  struct sctp_sndrcvinfo {
    uint16_t sinfo_stream;
    uint16_t sinfo_ssn;
    uint16_t sinfo_flags;
    <-- 2 bytes hole  -->
    uint32_t sinfo_ppid;
    uint32_t sinfo_context;
    uint32_t sinfo_timetolive;
    uint32_t sinfo_tsn;
    uint32_t sinfo_cumtsn;
    sctp_assoc_t sinfo_assoc_id;
  };

* 6.1.3. SCTP_REMOTE_ERROR:

  A remote peer may send an Operation Error message to its peer.
  This message indicates a variety of error conditions on an
  association. The entire ERROR chunk as it appears on the wire
  is included in an SCTP_REMOTE_ERROR event. Please refer to the
  SCTP specification [RFC4960] and any extensions for a list of
  possible error formats. An SCTP error notification has the
  following format:

  struct sctp_remote_error {
    uint16_t sre_type;
    uint16_t sre_flags;
    uint32_t sre_length;
    uint16_t sre_error;
    <-- 2 bytes hole  -->
    sctp_assoc_t sre_assoc_id;
    uint8_t  sre_data[];
  };

Fix this by setting both to 0 before filling them out. We also
have other structures shared between user and kernel space in
SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
copy that buffer over from user space first and thus don't need
to care about it in that cases.

While at it, we can also remove lengthy comments copied from
the draft, instead, we update the comment with the correct RFC
number where one can look it up.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/ulpevent.c |  122 ++++++----------------------------------------------
 1 file changed, 15 insertions(+), 107 deletions(-)

--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -373,9 +373,10 @@ fail:
  * specification [SCTP] and any extensions for a list of possible
  * error formats.
  */
-struct sctp_ulpevent *sctp_ulpevent_make_remote_error(
-	const struct sctp_association *asoc, struct sctp_chunk *chunk,
-	__u16 flags, gfp_t gfp)
+struct sctp_ulpevent *
+sctp_ulpevent_make_remote_error(const struct sctp_association *asoc,
+				struct sctp_chunk *chunk, __u16 flags,
+				gfp_t gfp)
 {
 	struct sctp_ulpevent *event;
 	struct sctp_remote_error *sre;
@@ -394,8 +395,7 @@ struct sctp_ulpevent *sctp_ulpevent_make
 	/* Copy the skb to a new skb with room for us to prepend
 	 * notification with.
 	 */
-	skb = skb_copy_expand(chunk->skb, sizeof(struct sctp_remote_error),
-			      0, gfp);
+	skb = skb_copy_expand(chunk->skb, sizeof(*sre), 0, gfp);
 
 	/* Pull off the rest of the cause TLV from the chunk.  */
 	skb_pull(chunk->skb, elen);
@@ -406,62 +406,21 @@ struct sctp_ulpevent *sctp_ulpevent_make
 	event = sctp_skb2event(skb);
 	sctp_ulpevent_init(event, MSG_NOTIFICATION, skb->truesize);
 
-	sre = (struct sctp_remote_error *)
-		skb_push(skb, sizeof(struct sctp_remote_error));
+	sre = (struct sctp_remote_error *) skb_push(skb, sizeof(*sre));
 
 	/* Trim the buffer to the right length.  */
-	skb_trim(skb, sizeof(struct sctp_remote_error) + elen);
+	skb_trim(skb, sizeof(*sre) + elen);
 
-	/* Socket Extensions for SCTP
-	 * 5.3.1.3 SCTP_REMOTE_ERROR
-	 *
-	 * sre_type:
-	 *   It should be SCTP_REMOTE_ERROR.
-	 */
+	/* RFC6458, Section 6.1.3. SCTP_REMOTE_ERROR */
+	memset(sre, 0, sizeof(*sre));
 	sre->sre_type = SCTP_REMOTE_ERROR;
-
-	/*
-	 * Socket Extensions for SCTP
-	 * 5.3.1.3 SCTP_REMOTE_ERROR
-	 *
-	 * sre_flags: 16 bits (unsigned integer)
-	 *   Currently unused.
-	 */
 	sre->sre_flags = 0;
-
-	/* Socket Extensions for SCTP
-	 * 5.3.1.3 SCTP_REMOTE_ERROR
-	 *
-	 * sre_length: sizeof (__u32)
-	 *
-	 * This field is the total length of the notification data,
-	 * including the notification header.
-	 */
 	sre->sre_length = skb->len;
-
-	/* Socket Extensions for SCTP
-	 * 5.3.1.3 SCTP_REMOTE_ERROR
-	 *
-	 * sre_error: 16 bits (unsigned integer)
-	 * This value represents one of the Operational Error causes defined in
-	 * the SCTP specification, in network byte order.
-	 */
 	sre->sre_error = cause;
-
-	/* Socket Extensions for SCTP
-	 * 5.3.1.3 SCTP_REMOTE_ERROR
-	 *
-	 * sre_assoc_id: sizeof (sctp_assoc_t)
-	 *
-	 * The association id field, holds the identifier for the association.
-	 * All notifications for a given association have the same association
-	 * identifier.  For TCP style socket, this field is ignored.
-	 */
 	sctp_ulpevent_set_owner(event, asoc);
 	sre->sre_assoc_id = sctp_assoc2id(asoc);
 
 	return event;
-
 fail:
 	return NULL;
 }
@@ -904,7 +863,9 @@ __u16 sctp_ulpevent_get_notification_typ
 	return notification->sn_header.sn_type;
 }
 
-/* Copy out the sndrcvinfo into a msghdr.  */
+/* RFC6458, Section 5.3.2. SCTP Header Information Structure
+ * (SCTP_SNDRCV, DEPRECATED)
+ */
 void sctp_ulpevent_read_sndrcvinfo(const struct sctp_ulpevent *event,
 				   struct msghdr *msghdr)
 {
@@ -913,74 +874,21 @@ void sctp_ulpevent_read_sndrcvinfo(const
 	if (sctp_ulpevent_is_notification(event))
 		return;
 
-	/* Sockets API Extensions for SCTP
-	 * Section 5.2.2 SCTP Header Information Structure (SCTP_SNDRCV)
-	 *
-	 * sinfo_stream: 16 bits (unsigned integer)
-	 *
-	 * For recvmsg() the SCTP stack places the message's stream number in
-	 * this value.
-	*/
+	memset(&sinfo, 0, sizeof(sinfo));
 	sinfo.sinfo_stream = event->stream;
-	/* sinfo_ssn: 16 bits (unsigned integer)
-	 *
-	 * For recvmsg() this value contains the stream sequence number that
-	 * the remote endpoint placed in the DATA chunk.  For fragmented
-	 * messages this is the same number for all deliveries of the message
-	 * (if more than one recvmsg() is needed to read the message).
-	 */
 	sinfo.sinfo_ssn = event->ssn;
-	/* sinfo_ppid: 32 bits (unsigned integer)
-	 *
-	 * In recvmsg() this value is
-	 * the same information that was passed by the upper layer in the peer
-	 * application.  Please note that byte order issues are NOT accounted
-	 * for and this information is passed opaquely by the SCTP stack from
-	 * one end to the other.
-	 */
 	sinfo.sinfo_ppid = event->ppid;
-	/* sinfo_flags: 16 bits (unsigned integer)
-	 *
-	 * This field may contain any of the following flags and is composed of
-	 * a bitwise OR of these values.
-	 *
-	 * recvmsg() flags:
-	 *
-	 * SCTP_UNORDERED - This flag is present when the message was sent
-	 *                 non-ordered.
-	 */
 	sinfo.sinfo_flags = event->flags;
-	/* sinfo_tsn: 32 bit (unsigned integer)
-	 *
-	 * For the receiving side, this field holds a TSN that was
-	 * assigned to one of the SCTP Data Chunks.
-	 */
 	sinfo.sinfo_tsn = event->tsn;
-	/* sinfo_cumtsn: 32 bit (unsigned integer)
-	 *
-	 * This field will hold the current cumulative TSN as
-	 * known by the underlying SCTP layer.  Note this field is
-	 * ignored when sending and only valid for a receive
-	 * operation when sinfo_flags are set to SCTP_UNORDERED.
-	 */
 	sinfo.sinfo_cumtsn = event->cumtsn;
-	/* sinfo_assoc_id: sizeof (sctp_assoc_t)
-	 *
-	 * The association handle field, sinfo_assoc_id, holds the identifier
-	 * for the association announced in the COMMUNICATION_UP notification.
-	 * All notifications for a given association have the same identifier.
-	 * Ignored for one-to-one style sockets.
-	 */
 	sinfo.sinfo_assoc_id = sctp_assoc2id(event->asoc);
-
-	/* context value that is set via SCTP_CONTEXT socket option. */
+	/* Context value that is set via SCTP_CONTEXT socket option. */
 	sinfo.sinfo_context = event->asoc->default_rcv_context;
-
 	/* These fields are not used while receiving. */
 	sinfo.sinfo_timetolive = 0;
 
 	put_cmsg(msghdr, IPPROTO_SCTP, SCTP_SNDRCV,
-		 sizeof(struct sctp_sndrcvinfo), (void *)&sinfo);
+		 sizeof(sinfo), &sinfo);
 }
 
 /* Do accounting for bytes received and hold a reference to the association



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 13/23] net: pppoe: use correct channel MTU when using Multilink PPP
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 12/23] net: sctp: fix information leaks in ulpevent layer Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 14/23] sunvnet: clean up objects created in vnet_new() on vnet_exit() Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Schulz, David S. Miller

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Schulz <develop@kristov.de>

[ Upstream commit a8a3e41c67d24eb12f9ab9680cbb85e24fcd9711 ]

The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see
ppp_generic module) tries to determine how big a fragment might be. According
to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the
corresponding comment and code in ppp_mp_explode():

		/*
		 * hdrlen includes the 2-byte PPP protocol field, but the
		 * MTU counts only the payload excluding the protocol field.
		 * (RFC1661 Section 2)
		 */
		mtu = pch->chan->mtu - (hdrlen - 2);

However, the pppoe module *does* include the PPP protocol field in the channel
MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under
certain circumstances (one byte if PPP protocol compression is used, two
otherwise), causing the generated Ethernet packets to be dropped. So the pppoe
module has to subtract two bytes from the channel MTU. This error only
manifests itself when using Multilink PPP, as otherwise the channel MTU is not
used anywhere.

In the following, I will describe how to reproduce this bug. We configure two
pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with
a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is
computed by adding the two link MTUs and subtracting the MP header twice, which
is 4 bytes long.) The necessary pppd statements on both sides are "multilink
mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin
rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server
side, we additionally need to start two pppoe-server instances to be able to
establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU
of the PPP network interface to the MRRU (2976) on both sides of the connection
in order to make use of the higher bandwidth. (If we didn't do that, IP
fragmentation would kick in, which we want to avoid.)

Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to
server over the PPP link. This results in the following network packet:

   2948 (echo payload)
 +    8 (ICMPv4 header)
 +   20 (IPv4 header)
---------------------
   2976 (PPP payload)

These 2976 bytes do not exceed the MTU of the PPP network interface, so the
IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode()
prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger
than the negotiated MRRU. So this packet would have to be divided in three
fragments. But this does not happen as each link MTU is assumed to be two bytes
larger. So this packet is diveded into two fragments only, one of size 1489 and
one of size 1488. Now we have for that bigger fragment:

   1489 (PPP payload)
 +    4 (MP header)
 +    2 (PPP protocol field for the MP payload (0x3d))
 +    6 (PPPoE header)
--------------------------
   1501 (Ethernet payload)

This packet exceeds the link MTU and is discarded.

If one configures the link MTU on the client side to 1501, one can see the
discarded Ethernet frames with tcpdump running on the client. A

ping -s 2948 -c 1 192.168.15.254

leads to the smaller fragment that is correctly received on the server side:

(tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d)
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [end], length 1492

and to the bigger fragment that is not received on the server side:

(tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d)
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
  length 1515: PPPoE  [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000,
  Flags [begin], length 1493

With the patch below, we correctly obtain three fragments:

52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [begin], length 1492
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [none], length 1492
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 27: PPPoE  [ses 0x1] MLPPP (0x003d), length 7: seq 0x000,
  Flags [end], length 5

And the ICMPv4 echo request is successfully received at the server side:

IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1),
  length 2976)
    192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0,
      length 2956

The bug was introduced in commit c9aa6895371b2a257401f59d3393c9f7ac5a8698
("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies
to 3.10 upwards but the fix can be applied (with minor modifications) to
kernels as old as 2.6.32.

Signed-off-by: Christoph Schulz <develop@kristov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ppp/pppoe.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -681,7 +681,7 @@ static int pppoe_connect(struct socket *
 		po->chan.hdrlen = (sizeof(struct pppoe_hdr) +
 				   dev->hard_header_len);
 
-		po->chan.mtu = dev->mtu - sizeof(struct pppoe_hdr);
+		po->chan.mtu = dev->mtu - sizeof(struct pppoe_hdr) - 2;
 		po->chan.private = sk;
 		po->chan.ops = &pppoe_chan_ops;
 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 14/23] sunvnet: clean up objects created in vnet_new() on vnet_exit()
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 13/23] net: pppoe: use correct channel MTU when using Multilink PPP Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 16/23] dns_resolver: Null-terminate the right string Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sowmini Varadhan, Dave Kleikamp,
	Karl Volz, David S. Miller

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>

[ Upstream commit a4b70a07ed12a71131cab7adce2ce91c71b37060 ]

Nothing cleans up the objects created by
vnet_new(), they are completely leaked.

vnet_exit(), after doing the vio_unregister_driver() to clean
up ports, should call a helper function that iterates over vnet_list
and cleans up those objects. This includes unregister_netdevice()
as well as free_netdev().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: Karl Volz <karl.volz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/sun/sunvnet.c |   20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -1086,6 +1086,24 @@ static struct vnet * __devinit vnet_find
 	return vp;
 }
 
+static void vnet_cleanup(void)
+{
+	struct vnet *vp;
+	struct net_device *dev;
+
+	mutex_lock(&vnet_list_mutex);
+	while (!list_empty(&vnet_list)) {
+		vp = list_first_entry(&vnet_list, struct vnet, list);
+		list_del(&vp->list);
+		dev = vp->dev;
+		/* vio_unregister_driver() should have cleaned up port_list */
+		BUG_ON(!list_empty(&vp->port_list));
+		unregister_netdev(dev);
+		free_netdev(dev);
+	}
+	mutex_unlock(&vnet_list_mutex);
+}
+
 static const char *local_mac_prop = "local-mac-address";
 
 static struct vnet * __devinit vnet_find_parent(struct mdesc_handle *hp,
@@ -1244,7 +1262,6 @@ static int vnet_port_remove(struct vio_d
 
 		kfree(port);
 
-		unregister_netdev(vp->dev);
 	}
 	return 0;
 }
@@ -1272,6 +1289,7 @@ static int __init vnet_init(void)
 static void __exit vnet_exit(void)
 {
 	vio_unregister_driver(&vnet_port_driver);
+	vnet_cleanup();
 }
 
 module_init(vnet_init);



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 16/23] dns_resolver: Null-terminate the right string
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 14/23] sunvnet: clean up objects created in vnet_new() on vnet_exit() Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 17/23] ipv4: fix buffer overflow in ip_options_compile() Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ben Hutchings, David S. Miller

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <ben@decadent.org.uk>

[ Upstream commit 640d7efe4c08f06c4ae5d31b79bd8740e7f6790a ]

*_result[len] is parsed as *(_result[len]) which is not at all what we
want to touch here.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 84a7c0b1db1c ("dns_resolver: assure that dns_query() result is null-terminated")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/dns_resolver/dns_query.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/dns_resolver/dns_query.c
+++ b/net/dns_resolver/dns_query.c
@@ -151,7 +151,7 @@ int dns_query(const char *type, const ch
 		goto put;
 
 	memcpy(*_result, upayload->data, len);
-	*_result[len] = '\0';
+	(*_result)[len] = '\0';
 
 	if (_expiry)
 		*_expiry = rkey->expiry;



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 17/23] ipv4: fix buffer overflow in ip_options_compile()
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 16/23] dns_resolver: Null-terminate the right string Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 19/23] mwifiex: fix Tx timeout issue Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 10ec9472f05b45c94db3c854d22581a20b97db41 ]

There is a benign buffer overflow in ip_options_compile spotted by
AddressSanitizer[1] :

Its benign because we always can access one extra byte in skb->head
(because header is followed by struct skb_shared_info), and in this case
this byte is not even used.

[28504.910798] ==================================================================
[28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile
[28504.913170] Read of size 1 by thread T15843:
[28504.914026]  [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0
[28504.915394]  [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120
[28504.916843]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
[28504.918175]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
[28504.919490]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
[28504.920835]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
[28504.922208]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
[28504.923459]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
[28504.924722]
[28504.925106] Allocated by thread T15843:
[28504.925815]  [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120
[28504.926884]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
[28504.927975]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
[28504.929175]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
[28504.930400]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
[28504.931677]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
[28504.932851]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
[28504.934018]
[28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right
[28504.934377]  of 40-byte region [ffff880026382800, ffff880026382828)
[28504.937144]
[28504.937474] Memory state around the buggy address:
[28504.938430]  ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr
[28504.939884]  ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.941294]  ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
[28504.942504]  ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.943483]  ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
[28504.945573]                         ^
[28504.946277]  ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.094949]  ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.096114]  ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.097116]  ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.098472]  ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.099804] Legend:
[28505.100269]  f - 8 freed bytes
[28505.100884]  r - 8 redzone bytes
[28505.101649]  . - 8 allocated bytes
[28505.102406]  x=1..7 - x allocated bytes + (8-x) redzone bytes
[28505.103637] ==================================================================

[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/ip_options.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -279,6 +279,10 @@ int ip_options_compile(struct net *net,
 			optptr++;
 			continue;
 		}
+		if (unlikely(l < 2)) {
+			pp_ptr = optptr;
+			goto error;
+		}
 		optlen = optptr[1];
 		if (optlen<2 || optlen>l) {
 			pp_ptr = optptr;



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 19/23] mwifiex: fix Tx timeout issue
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 17/23] ipv4: fix buffer overflow in ip_options_compile() Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 20/23] drm/radeon: avoid leaking edid data Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andrew Wiley, Linus Gasser,
	Michael Hirsch, Xinming Hu, Amitkumar Karwar, Maithili Hinge,
	Avinash Patil, Bing Zhao, John W. Linville

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Amitkumar Karwar <akarwar@marvell.com>

commit d76744a93246eccdca1106037e8ee29debf48277 upstream.

https://bugzilla.kernel.org/show_bug.cgi?id=70191
https://bugzilla.kernel.org/show_bug.cgi?id=77581

It is observed that sometimes Tx packet is downloaded without
adding driver's txpd header. This results in firmware parsing
garbage data as packet length. Sometimes firmware is unable
to read the packet if length comes out as invalid. This stops
further traffic and timeout occurs.

The root cause is uninitialized fields in tx_info(skb->cb) of
packet used to get garbage values. In this case if
MWIFIEX_BUF_FLAG_REQUEUED_PKT flag is mistakenly set, txpd
header was skipped. This patch makes sure that tx_info is
correctly initialized to fix the problem.

Reported-by: Andrew Wiley <wiley.andrew.j@gmail.com>
Reported-by: Linus Gasser <list@markas-al-nour.org>
Reported-by: Michael Hirsch <hirsch@teufel.de>
Tested-by: Xinming Hu <huxm@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Maithili Hinge <maithili@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/net/wireless/mwifiex/main.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/net/wireless/mwifiex/main.c
+++ b/drivers/net/wireless/mwifiex/main.c
@@ -457,6 +457,7 @@ mwifiex_hard_start_xmit(struct sk_buff *
 	}
 
 	tx_info = MWIFIEX_SKB_TXCB(skb);
+	memset(tx_info, 0, sizeof(*tx_info));
 	tx_info->bss_num = priv->bss_num;
 	tx_info->bss_type = priv->bss_type;
 	mwifiex_fill_buffer(skb);



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 20/23] drm/radeon: avoid leaking edid data
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 19/23] mwifiex: fix Tx timeout issue Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 21/23] alarmtimer: Fix bug where relative alarm timers were treated as absolute Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Alex Deucher

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alex Deucher <alexander.deucher@amd.com>

commit 0ac66effe7fcdee55bda6d5d10d3372c95a41920 upstream.

In some cases we fetch the edid in the detect() callback
in order to determine what sort of monitor is connected.
If that happens, don't fetch the edid again in the get_modes()
callback or we will leak the edid.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/radeon/radeon_display.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -709,6 +709,10 @@ int radeon_ddc_get_modes(struct radeon_c
 	struct radeon_device *rdev = dev->dev_private;
 	int ret = 0;
 
+	/* don't leak the edid if we already fetched it in detect() */
+	if (radeon_connector->edid)
+		goto got_edid;
+
 	/* on hw with routers, select right port */
 	if (radeon_connector->router.ddc_valid)
 		radeon_router_select_ddc_port(radeon_connector);
@@ -748,6 +752,7 @@ int radeon_ddc_get_modes(struct radeon_c
 			radeon_connector->edid = radeon_bios_get_hardcoded_edid(rdev);
 	}
 	if (radeon_connector->edid) {
+got_edid:
 		drm_mode_connector_update_edid_property(&radeon_connector->base, radeon_connector->edid);
 		ret = drm_add_edid_modes(&radeon_connector->base, radeon_connector->edid);
 		drm_edid_to_eld(&radeon_connector->base, radeon_connector->edid);



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 21/23] alarmtimer: Fix bug where relative alarm timers were treated as absolute
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 20/23] drm/radeon: avoid leaking edid data Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 22/23] PM / sleep: Fix request_firmware() error at resume Greg Kroah-Hartman
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sharvil Nanavati, John Stultz,
	Thomas Gleixner, Ingo Molnar, Prarit Bhargava

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <john.stultz@linaro.org>

commit 16927776ae757d0d132bdbfabbfe2c498342bd59 upstream.

Sharvil noticed with the posix timer_settime interface, using the
CLOCK_REALTIME_ALARM or CLOCK_BOOTTIME_ALARM clockid, if the users
tried to specify a relative time timer, it would incorrectly be
treated as absolute regardless of the state of the flags argument.

This patch corrects this, properly checking the absolute/relative flag,
as well as adds further error checking that no invalid flag bits are set.

Reported-by: Sharvil Nanavati <sharvil@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sharvil Nanavati <sharvil@google.com>
Link: http://lkml.kernel.org/r/1404767171-6902-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/time/alarmtimer.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -569,9 +569,14 @@ static int alarm_timer_set(struct k_itim
 				struct itimerspec *new_setting,
 				struct itimerspec *old_setting)
 {
+	ktime_t exp;
+
 	if (!rtcdev)
 		return -ENOTSUPP;
 
+	if (flags & ~TIMER_ABSTIME)
+		return -EINVAL;
+
 	if (old_setting)
 		alarm_timer_get(timr, old_setting);
 
@@ -581,8 +586,16 @@ static int alarm_timer_set(struct k_itim
 
 	/* start the timer */
 	timr->it.alarm.interval = timespec_to_ktime(new_setting->it_interval);
-	alarm_start(&timr->it.alarm.alarmtimer,
-			timespec_to_ktime(new_setting->it_value));
+	exp = timespec_to_ktime(new_setting->it_value);
+	/* Convert (if necessary) to absolute time */
+	if (flags != TIMER_ABSTIME) {
+		ktime_t now;
+
+		now = alarm_bases[timr->it.alarm.alarmtimer.type].gettime();
+		exp = ktime_add(now, exp);
+	}
+
+	alarm_start(&timr->it.alarm.alarmtimer, exp);
 	return 0;
 }
 
@@ -714,6 +727,9 @@ static int alarm_timer_nsleep(const cloc
 	if (!alarmtimer_get_rtcdev())
 		return -ENOTSUPP;
 
+	if (flags & ~TIMER_ABSTIME)
+		return -EINVAL;
+
 	if (!capable(CAP_WAKE_ALARM))
 		return -EPERM;
 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 22/23] PM / sleep: Fix request_firmware() error at resume
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 21/23] alarmtimer: Fix bug where relative alarm timers were treated as absolute Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-26 19:02 ` [PATCH 3.4 23/23] iommu/vt-d: Disable translation if already enabled Greg Kroah-Hartman
  2014-07-27 14:58 ` [PATCH 3.4 00/23] 3.4.100-stable review Guenter Roeck
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Takashi Iwai, Rafael J. Wysocki

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <tiwai@suse.de>

commit 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586 upstream.

The commit [247bc037: PM / Sleep: Mitigate race between the freezer
and request_firmware()] introduced the finer state control, but it
also leads to a new bug; for example, a bug report regarding the
firmware loading of intel BT device at suspend/resume:
  https://bugzilla.novell.com/show_bug.cgi?id=873790

The root cause seems to be a small window between the process resume
and the clear of usermodehelper lock.  The request_firmware() function
checks the UMH lock and gives up when it's in UMH_DISABLE state.  This
is for avoiding the invalid  f/w loading during suspend/resume phase.
The problem is, however, that usermodehelper_enable() is called at the
end of thaw_processes().  Thus, a thawed process in between can kick
off the f/w loader code path (in this case, via btusb_setup_intel())
even before the call of usermodehelper_enable().  Then
usermodehelper_read_trylock() returns an error and request_firmware()
spews WARN_ON() in the end.

This oneliner patch fixes the issue just by setting to UMH_FREEZING
state again before restarting tasks, so that the call of
request_firmware() will be blocked until the end of this function
instead of returning an error.

Fixes: 247bc0374254 (PM / Sleep: Mitigate race between the freezer and request_firmware())
Link: https://bugzilla.novell.com/show_bug.cgi?id=873790
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/power/process.c |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -185,6 +185,7 @@ void thaw_processes(void)
 
 	printk("Restarting tasks ... ");
 
+	__usermodehelper_set_disable_depth(UMH_FREEZING);
 	thaw_workqueues();
 
 	read_lock(&tasklist_lock);



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3.4 23/23] iommu/vt-d: Disable translation if already enabled
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 22/23] PM / sleep: Fix request_firmware() error at resume Greg Kroah-Hartman
@ 2014-07-26 19:02 ` Greg Kroah-Hartman
  2014-07-27 14:58 ` [PATCH 3.4 00/23] 3.4.100-stable review Guenter Roeck
  21 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-26 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Takao Indoh, Joerg Roedel,
	Yijing Wang

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Takao Indoh <indou.takao@jp.fujitsu.com>

commit 3a93c841c2b3b14824f7728dd74bd00a1cedb806 upstream.

This patch disables translation(dma-remapping) before its initialization
if it is already enabled.

This is needed for kexec/kdump boot. If dma-remapping is enabled in the
first kernel, it need to be disabled before initializing its page table
during second kernel boot. Wei Hu also reported that this is needed
when second kernel boots with intel_iommu=off.

Basically iommu->gcmd is used to know whether translation is enabled or
disabled, but it is always zero at boot time even when translation is
enabled since iommu->gcmd is initialized without considering such a
case. Therefor this patch synchronizes iommu->gcmd value with global
command register when iommu structure is allocated.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
[wyj: Backported to 3.4: adjust context]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/iommu/dmar.c        |   11 ++++++++++-
 drivers/iommu/intel-iommu.c |   15 +++++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)

--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -582,7 +582,7 @@ int alloc_iommu(struct dmar_drhd_unit *d
 {
 	struct intel_iommu *iommu;
 	int map_size;
-	u32 ver;
+	u32 ver, sts;
 	static int iommu_allocated = 0;
 	int agaw = 0;
 	int msagaw = 0;
@@ -652,6 +652,15 @@ int alloc_iommu(struct dmar_drhd_unit *d
 		(unsigned long long)iommu->cap,
 		(unsigned long long)iommu->ecap);
 
+	/* Reflect status in gcmd */
+	sts = readl(iommu->reg + DMAR_GSTS_REG);
+	if (sts & DMA_GSTS_IRES)
+		iommu->gcmd |= DMA_GCMD_IRE;
+	if (sts & DMA_GSTS_TES)
+		iommu->gcmd |= DMA_GCMD_TE;
+	if (sts & DMA_GSTS_QIES)
+		iommu->gcmd |= DMA_GCMD_QIE;
+
 	raw_spin_lock_init(&iommu->register_lock);
 
 	drhd->iommu = iommu;
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3659,6 +3659,7 @@ static struct notifier_block device_nb =
 int __init intel_iommu_init(void)
 {
 	int ret = 0;
+	struct dmar_drhd_unit *drhd;
 
 	/* VT-d is required for a TXT/tboot launch, so enforce that */
 	force_on = tboot_force_iommu();
@@ -3669,6 +3670,20 @@ int __init intel_iommu_init(void)
 		return 	-ENODEV;
 	}
 
+	/*
+	 * Disable translation if already enabled prior to OS handover.
+	 */
+	for_each_drhd_unit(drhd) {
+		struct intel_iommu *iommu;
+
+		if (drhd->ignored)
+			continue;
+
+		iommu = drhd->iommu;
+		if (iommu->gcmd & DMA_GCMD_TE)
+			iommu_disable_translation(iommu);
+	}
+
 	if (dmar_dev_scope_init() < 0) {
 		if (force_on)
 			panic("tboot: Failed to initialize DMAR device scope\n");



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3.4 00/23] 3.4.100-stable review
  2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2014-07-26 19:02 ` [PATCH 3.4 23/23] iommu/vt-d: Disable translation if already enabled Greg Kroah-Hartman
@ 2014-07-27 14:58 ` Guenter Roeck
  21 siblings, 0 replies; 23+ messages in thread
From: Guenter Roeck @ 2014-07-27 14:58 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, satoru.takeuchi, shuah.kh, stable

On 07/26/2014 12:02 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.4.100 release.
> There are 23 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Mon Jul 28 19:01:37 UTC 2014.
> Anything received after that time might be too late.
>

Build results:
	total: 117 pass: 111 fail: 6
Failed builds:
	alpha:allmodconfig
	arm:spear6xx_defconfig
	score:defconfig
	sparc64:allmodconfig
	unicore32:defconfig
	xtensa:allmodconfig

Qemu tests all passed.

Results are as expected.

Details are available at http://server.roeck-us.net:8010/builders.

Guenter


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-07-27 14:58 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 01/23] crypto: testmgr - update LZO compression test vectors Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 02/23] shmem: fix faulting into a hole while its punched Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 03/23] shmem: fix faulting into a hole, not taking i_mutex Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 04/23] shmem: fix splicing from a hole while its punched Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 05/23] tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 06/23] 8021q: fix a potential memory leak Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 07/23] igmp: fix the problem when mc leave group Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 08/23] tcp: fix false undo corner cases Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 09/23] appletalk: Fix socket referencing in skb Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 10/23] be2net: set EQ DB clear-intr bit in be_open() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 11/23] tipc: clear next-pointer of message fragments before reassembly Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 12/23] net: sctp: fix information leaks in ulpevent layer Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 13/23] net: pppoe: use correct channel MTU when using Multilink PPP Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 14/23] sunvnet: clean up objects created in vnet_new() on vnet_exit() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 16/23] dns_resolver: Null-terminate the right string Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 17/23] ipv4: fix buffer overflow in ip_options_compile() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 19/23] mwifiex: fix Tx timeout issue Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 20/23] drm/radeon: avoid leaking edid data Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 21/23] alarmtimer: Fix bug where relative alarm timers were treated as absolute Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 22/23] PM / sleep: Fix request_firmware() error at resume Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 23/23] iommu/vt-d: Disable translation if already enabled Greg Kroah-Hartman
2014-07-27 14:58 ` [PATCH 3.4 00/23] 3.4.100-stable review Guenter Roeck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox