public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
@ 2014-07-31 10:16 Ilya Dryomov
  2014-07-31 11:57 ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Ilya Dryomov @ 2014-07-31 10:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, ceph-devel

This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.

This commit can lead to deadlocks by way of what at a high level
appears to look like a missing wakeup on mutex_unlock() when
CONFIG_MUTEX_SPIN_ON_OWNER is set, which is how most distributions ship
their kernels.  In particular, it causes reproducible deadlocks in
libceph/rbd code under higher than moderate loads with the evidence
actually pointing to the bowels of mutex_lock().

kernel/locking/mutex.c, __mutex_lock_common():
476         osq_unlock(&lock->osq);
477 slowpath:
478         /*
479          * If we fell out of the spin path because of need_resched(),
480          * reschedule now, before we try-lock the mutex. This avoids getting
481          * scheduled out right after we obtained the mutex.
482          */
483         if (need_resched())
484                 schedule_preempt_disabled(); <-- never returns
485 #endif
486         spin_lock_mutex(&lock->wait_lock, flags);

We started bumping into deadlocks in QA the day our branch has been
rebased onto 3.15 (the release this commit went in) but then as part of
debugging effort I enabled all locking debug options, which also
disabled CONFIG_MUTEX_SPIN_ON_OWNER and made everything disappear,
which is why it hasn't been looked into until now.  Revert makes the
problem go away, confirmed by our users.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org # 3.15
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
---
 kernel/locking/mutex.c |    7 -------
 1 file changed, 7 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index acca2c1a3c5e..746ff280a2fc 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -475,13 +475,6 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	}
 	osq_unlock(&lock->osq);
 slowpath:
-	/*
-	 * If we fell out of the spin path because of need_resched(),
-	 * reschedule now, before we try-lock the mutex. This avoids getting
-	 * scheduled out right after we obtained the mutex.
-	 */
-	if (need_resched())
-		schedule_preempt_disabled();
 #endif
 	spin_lock_mutex(&lock->wait_lock, flags);
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
  2014-07-31 10:16 [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point" Ilya Dryomov
@ 2014-07-31 11:57 ` Peter Zijlstra
  2014-07-31 12:37   ` Ilya Dryomov
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2014-07-31 11:57 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: linux-kernel, Ingo Molnar, ceph-devel, davidlohr, jason.low2

[-- Attachment #1: Type: text/plain, Size: 1571 bytes --]

On Thu, Jul 31, 2014 at 02:16:37PM +0400, Ilya Dryomov wrote:
> This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.
> 
> This commit can lead to deadlocks by way of what at a high level
> appears to look like a missing wakeup on mutex_unlock() when
> CONFIG_MUTEX_SPIN_ON_OWNER is set, which is how most distributions ship
> their kernels.  In particular, it causes reproducible deadlocks in
> libceph/rbd code under higher than moderate loads with the evidence
> actually pointing to the bowels of mutex_lock().
> 
> kernel/locking/mutex.c, __mutex_lock_common():
> 476         osq_unlock(&lock->osq);
> 477 slowpath:
> 478         /*
> 479          * If we fell out of the spin path because of need_resched(),
> 480          * reschedule now, before we try-lock the mutex. This avoids getting
> 481          * scheduled out right after we obtained the mutex.
> 482          */
> 483         if (need_resched())
> 484                 schedule_preempt_disabled(); <-- never returns
> 485 #endif
> 486         spin_lock_mutex(&lock->wait_lock, flags);
> 
> We started bumping into deadlocks in QA the day our branch has been
> rebased onto 3.15 (the release this commit went in) but then as part of
> debugging effort I enabled all locking debug options, which also
> disabled CONFIG_MUTEX_SPIN_ON_OWNER and made everything disappear,
> which is why it hasn't been looked into until now.  Revert makes the
> problem go away, confirmed by our users.

This doesn't make sense and you fail to explain how this can possibly
deadlock.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
  2014-07-31 11:57 ` Peter Zijlstra
@ 2014-07-31 12:37   ` Ilya Dryomov
  2014-07-31 13:13     ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Ilya Dryomov @ 2014-07-31 12:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux Kernel Mailing List, Ingo Molnar, Ceph Development,
	davidlohr, jason.low2

On Thu, Jul 31, 2014 at 3:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jul 31, 2014 at 02:16:37PM +0400, Ilya Dryomov wrote:
>> This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.
>>
>> This commit can lead to deadlocks by way of what at a high level
>> appears to look like a missing wakeup on mutex_unlock() when
>> CONFIG_MUTEX_SPIN_ON_OWNER is set, which is how most distributions ship
>> their kernels.  In particular, it causes reproducible deadlocks in
>> libceph/rbd code under higher than moderate loads with the evidence
>> actually pointing to the bowels of mutex_lock().
>>
>> kernel/locking/mutex.c, __mutex_lock_common():
>> 476         osq_unlock(&lock->osq);
>> 477 slowpath:
>> 478         /*
>> 479          * If we fell out of the spin path because of need_resched(),
>> 480          * reschedule now, before we try-lock the mutex. This avoids getting
>> 481          * scheduled out right after we obtained the mutex.
>> 482          */
>> 483         if (need_resched())
>> 484                 schedule_preempt_disabled(); <-- never returns
>> 485 #endif
>> 486         spin_lock_mutex(&lock->wait_lock, flags);
>>
>> We started bumping into deadlocks in QA the day our branch has been
>> rebased onto 3.15 (the release this commit went in) but then as part of
>> debugging effort I enabled all locking debug options, which also
>> disabled CONFIG_MUTEX_SPIN_ON_OWNER and made everything disappear,
>> which is why it hasn't been looked into until now.  Revert makes the
>> problem go away, confirmed by our users.
>
> This doesn't make sense and you fail to explain how this can possibly
> deadlock.

This didn't make sense to me at first too, and I'll be happy to be
proven wrong, but we can reproduce this with rbd very reliably under
higher than usual load, and the revert makes it go away.  What we are
seeing in the rbd scenario is the following.

Suppose foo needs mutexes A and B, bar needs mutex B.  foo acquires
A and then wants to acquire B, but B is held by bar.  foo spins
a little and ends up calling schedule_preempt_disabled() on line 484
above, but that call never returns, even though a hundred usecs later
bar releases B.  foo ends up stuck in mutex_lock() indefinitely, but
still holds A and everybody else who needs A gets behind A.  Given that
this A happens to be a central libceph mutex all rbd activity halts.
Deadlock may not be the best term for this, but never returning from
mutex_lock(&B) even though B has been unlocked is *a* problem.

This obviously doesn't happen every time schedule_preempt_disabled() on
line 484 is called, so there must be some sort of race here.  I'll send
along the actual rbd stack traces shortly.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
  2014-07-31 12:37   ` Ilya Dryomov
@ 2014-07-31 13:13     ` Peter Zijlstra
  2014-07-31 13:25       ` Ilya Dryomov
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Peter Zijlstra @ 2014-07-31 13:13 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Linux Kernel Mailing List, Ingo Molnar, Ceph Development,
	davidlohr, jason.low2

[-- Attachment #1: Type: text/plain, Size: 2107 bytes --]

On Thu, Jul 31, 2014 at 04:37:29PM +0400, Ilya Dryomov wrote:

> This didn't make sense to me at first too, and I'll be happy to be
> proven wrong, but we can reproduce this with rbd very reliably under
> higher than usual load, and the revert makes it go away.  What we are
> seeing in the rbd scenario is the following.

This is drivers/block/rbd.c ? I can find but a single mutex_lock() in
there.

> Suppose foo needs mutexes A and B, bar needs mutex B.  foo acquires
> A and then wants to acquire B, but B is held by bar.  foo spins
> a little and ends up calling schedule_preempt_disabled() on line 484
> above, but that call never returns, even though a hundred usecs later
> bar releases B.  foo ends up stuck in mutex_lock() indefinitely, but
> still holds A and everybody else who needs A gets behind A.  Given that
> this A happens to be a central libceph mutex all rbd activity halts.
> Deadlock may not be the best term for this, but never returning from
> mutex_lock(&B) even though B has been unlocked is *a* problem.
> 
> This obviously doesn't happen every time schedule_preempt_disabled() on
> line 484 is called, so there must be some sort of race here.  I'll send
> along the actual rbd stack traces shortly.

Smells like maybe current->state != TASK_RUNNING, does the below
trigger?

If so, you've wrecked something in whatever...

---
 kernel/locking/mutex.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index ae712b25e492..3d726fdaa764 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -473,8 +473,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	 * reschedule now, before we try-lock the mutex. This avoids getting
 	 * scheduled out right after we obtained the mutex.
 	 */
-	if (need_resched())
+	if (need_resched()) {
+		if (WARN_ON_ONCE(current->state != TASK_RUNNING))
+			__set_current_state(TASK_RUNNING);
+
 		schedule_preempt_disabled();
+	}
 #endif
 	spin_lock_mutex(&lock->wait_lock, flags);
 

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
  2014-07-31 13:13     ` Peter Zijlstra
@ 2014-07-31 13:25       ` Ilya Dryomov
  2014-07-31 13:44       ` Ingo Molnar
  2014-07-31 14:30       ` [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point" Mike Galbraith
  2 siblings, 0 replies; 14+ messages in thread
From: Ilya Dryomov @ 2014-07-31 13:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux Kernel Mailing List, Ingo Molnar, Ceph Development,
	davidlohr, jason.low2

[-- Attachment #1: Type: text/plain, Size: 2040 bytes --]

On Thu, Jul 31, 2014 at 5:13 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jul 31, 2014 at 04:37:29PM +0400, Ilya Dryomov wrote:
>
>> This didn't make sense to me at first too, and I'll be happy to be
>> proven wrong, but we can reproduce this with rbd very reliably under
>> higher than usual load, and the revert makes it go away.  What we are
>> seeing in the rbd scenario is the following.
>
> This is drivers/block/rbd.c ? I can find but a single mutex_lock() in
> there.

This is in net/ceph, include/linux/ceph.

Mutex A - struct ceph_osd_client::request_mutex, taken in alloc_msg(),
handle_timeout(), handle_osds_timeout(), ceph_osdc_start_request().

Mutex B - struct ceph_connection::mutex, taken in ceph_con_send().

dmesg with a sample dump of blocked tasks attached.

Basically everybody except kjournald:4398 is waiting for request_mutex,
which kjournald acquired in ceph_osdc_start_request().  kjournald
however itself sits waiting for ceph_connection::mutex, even though it
has been released.

>> Suppose foo needs mutexes A and B, bar needs mutex B.  foo acquires
>> A and then wants to acquire B, but B is held by bar.  foo spins
>> a little and ends up calling schedule_preempt_disabled() on line 484
>> above, but that call never returns, even though a hundred usecs later
>> bar releases B.  foo ends up stuck in mutex_lock() indefinitely, but
>> still holds A and everybody else who needs A gets behind A.  Given that
>> this A happens to be a central libceph mutex all rbd activity halts.
>> Deadlock may not be the best term for this, but never returning from
>> mutex_lock(&B) even though B has been unlocked is *a* problem.
>>
>> This obviously doesn't happen every time schedule_preempt_disabled() on
>> line 484 is called, so there must be some sort of race here.  I'll send
>> along the actual rbd stack traces shortly.
>
> Smells like maybe current->state != TASK_RUNNING, does the below
> trigger?
>
> If so, you've wrecked something in whatever...

Trying it now.

Thanks,

                Ilya

[-- Attachment #2: dmesg-11 --]
[-- Type: application/octet-stream, Size: 80155 bytes --]

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.16.0-rc7-ceph-01781-g721bdd2 (ubuntu@burnupi06) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #17 SMP Wed Jul 30 12:57:20 PDT 2014
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.16.0-rc7-ceph-01781-g721bdd2 root=UUID=688201a9-609d-49f5-8d37-af80d514f826 ro pci=realloc=off console=tty0 console=ttyS2,115200
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ebff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009ec00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bf78ffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf79e000-0x00000000bf79ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bf7a0000-0x00000000bf7adfff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bf7ae000-0x00000000bf7cffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf7d0000-0x00000000bf7dffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bf7ed000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed20000-0x00000000fed3ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000043fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.6 present.
[    0.000000] DMI: Supermicro X8SIL/X8SIL, BIOS 1.1 05/27/2010
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] AGP: No AGP bridge found
[    0.000000] e820: last_pfn = 0x440000 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-CBFFF write-protect
[    0.000000]   CC000-DFFFF uncachable
[    0.000000]   E0000-E3FFF write-protect
[    0.000000]   E4000-EFFFF write-through
[    0.000000]   F0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 000000000 mask C00000000 write-back
[    0.000000]   1 base 400000000 mask FC0000000 write-back
[    0.000000]   2 base 0C0000000 mask FC0000000 uncachable
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] original variable MTRRs
[    0.000000] reg 0, base: 0GB, range: 16GB, type WB
[    0.000000] reg 1, base: 16GB, range: 1GB, type WB
[    0.000000] reg 2, base: 3GB, range: 1GB, type UC
[    0.000000] total RAM covered: 16384M
[    0.000000] Found optimal setting for mtrr clean up
[    0.000000]  gran_size: 64K 	chunk_size: 64K 	num_reg: 5  	lose cover RAM: 0G
[    0.000000] New variable MTRRs
[    0.000000] reg 0, base: 0GB, range: 2GB, type WB
[    0.000000] reg 1, base: 2GB, range: 1GB, type WB
[    0.000000] reg 2, base: 4GB, range: 4GB, type WB
[    0.000000] reg 3, base: 8GB, range: 8GB, type WB
[    0.000000] reg 4, base: 16GB, range: 1GB, type WB
[    0.000000] e820: update [mem 0xc0000000-0xffffffff] usable ==> reserved
[    0.000000] e820: last_pfn = 0xbf790 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [mem 0x000ff780-0x000ff78f] mapped at [ffff8800000ff780]
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] Base memory trampoline at [ffff880000098000] 98000 size 24576
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[    0.000000]  [mem 0x00000000-0x000fffff] page 4k
[    0.000000] BRK [0x01f4a000, 0x01f4afff] PGTABLE
[    0.000000] BRK [0x01f4b000, 0x01f4bfff] PGTABLE
[    0.000000] BRK [0x01f4c000, 0x01f4cfff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x43fe00000-0x43fffffff]
[    0.000000]  [mem 0x43fe00000-0x43fffffff] page 2M
[    0.000000] BRK [0x01f4d000, 0x01f4dfff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x43c000000-0x43fdfffff]
[    0.000000]  [mem 0x43c000000-0x43fdfffff] page 2M
[    0.000000] init_memory_mapping: [mem 0x400000000-0x43bffffff]
[    0.000000]  [mem 0x400000000-0x43bffffff] page 2M
[    0.000000] init_memory_mapping: [mem 0x00100000-0xbf78ffff]
[    0.000000]  [mem 0x00100000-0x001fffff] page 4k
[    0.000000]  [mem 0x00200000-0xbf5fffff] page 2M
[    0.000000]  [mem 0xbf600000-0xbf78ffff] page 4k
[    0.000000] init_memory_mapping: [mem 0x100000000-0x3ffffffff]
[    0.000000]  [mem 0x100000000-0x3ffffffff] page 2M
[    0.000000] BRK [0x01f4e000, 0x01f4efff] PGTABLE
[    0.000000] BRK [0x01f4f000, 0x01f4ffff] PGTABLE
[    0.000000] RAMDISK: [mem 0x376d8000-0x37b63fff]
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000FA280 000024 (v02 ACPIAM)
[    0.000000] ACPI: XSDT 0x00000000BF7A0100 000084 (v01 SMCI            20100527 MSFT 00000097)
[    0.000000] ACPI: FACP 0x00000000BF7A0290 0000F4 (v04 052710 FACP1450 20100527 MSFT 00000097)
[    0.000000] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20140424/tbfadt-618)
[    0.000000] ACPI: DSDT 0x00000000BF7A05F0 006C7B (v02 10605  10605000 00000000 INTL 20051117)
[    0.000000] ACPI: FACS 0x00000000BF7AE000 000040
[    0.000000] ACPI: APIC 0x00000000BF7A0390 000092 (v02 052710 APIC1450 20100527 MSFT 00000097)
[    0.000000] ACPI: MCFG 0x00000000BF7A0430 00003C (v01 052710 OEMMCFG  20100527 MSFT 00000097)
[    0.000000] ACPI: OEMB 0x00000000BF7AE040 000073 (v01 052710 OEMB1450 20100527 MSFT 00000097)
[    0.000000] ACPI: HPET 0x00000000BF7AA5F0 000038 (v01 052710 OEMHPET  20100527 MSFT 00000097)
[    0.000000] ACPI: GSCI 0x00000000BF7AE0C0 002024 (v01 052710 GMCHSCI  20100527 MSFT 00000097)
[    0.000000] ACPI: DMAR 0x00000000BF7B00F0 000090 (v01 AMI    OEMDMAR  00000001 MSFT 00000097)
[    0.000000] ACPI: SSDT 0x00000000BF7B2810 000363 (v01 DpgPmm CpuPm    00000012 INTL 20051117)
[    0.000000] ACPI: EINJ 0x00000000BF7AA630 000130 (v01 AMIER  AMI_EINJ 20100527 MSFT 00000097)
[    0.000000] ACPI: BERT 0x00000000BF7AA7C0 000030 (v01 AMIER  AMI_BERT 20100527 MSFT 00000097)
[    0.000000] ACPI: ERST 0x00000000BF7AA7F0 0001B0 (v01 AMIER  AMI_ERST 20100527 MSFT 00000097)
[    0.000000] ACPI: HEST 0x00000000BF7AA9A0 0000A8 (v01 AMIER  ABC_HEST 20100527 MSFT 00000097)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000043fffffff]
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x43fffffff]
[    0.000000]   NODE_DATA [mem 0x43ffee000-0x43fff2fff]
[    0.000000]  [ffffea0000000000-ffffea0010ffffff] PMD -> [ffff88042f600000-ffff88043f5fffff] on node 0
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   [mem 0x100000000-0x43fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x0009dfff]
[    0.000000]   node   0: [mem 0x00100000-0xbf78ffff]
[    0.000000]   node   0: [mem 0x100000000-0x43fffffff]
[    0.000000] On node 0 totalpages: 4192045
[    0.000000]   DMA zone: 64 pages used for memmap
[    0.000000]   DMA zone: 21 pages reserved
[    0.000000]   DMA zone: 3997 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 12191 pages used for memmap
[    0.000000]   DMA32 zone: 780176 pages, LIFO batch:31
[    0.000000]   Normal zone: 53248 pages used for memmap
[    0.000000]   Normal zone: 3407872 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0x808
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x05] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[    0.000000] smpboot: Allowing 8 CPUs, 0 hotplug CPUs
[    0.000000] nr_irqs_gsi: 40
[    0.000000] PM: Registered nosave memory: [mem 0x0009e000-0x0009efff]
[    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000dffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000e0000-0x000fffff]
[    0.000000] PM: Registered nosave memory: [mem 0xbf790000-0xbf79dfff]
[    0.000000] PM: Registered nosave memory: [mem 0xbf79e000-0xbf79ffff]
[    0.000000] PM: Registered nosave memory: [mem 0xbf7a0000-0xbf7adfff]
[    0.000000] PM: Registered nosave memory: [mem 0xbf7ae000-0xbf7cffff]
[    0.000000] PM: Registered nosave memory: [mem 0xbf7d0000-0xbf7dffff]
[    0.000000] PM: Registered nosave memory: [mem 0xbf7e0000-0xbf7ecfff]
[    0.000000] PM: Registered nosave memory: [mem 0xbf7ed000-0xbfffffff]
[    0.000000] PM: Registered nosave memory: [mem 0xc0000000-0xdfffffff]
[    0.000000] PM: Registered nosave memory: [mem 0xe0000000-0xefffffff]
[    0.000000] PM: Registered nosave memory: [mem 0xf0000000-0xfed1ffff]
[    0.000000] PM: Registered nosave memory: [mem 0xfed20000-0xfed3ffff]
[    0.000000] PM: Registered nosave memory: [mem 0xfed40000-0xfedfffff]
[    0.000000] PM: Registered nosave memory: [mem 0xfee00000-0xfee00fff]
[    0.000000] PM: Registered nosave memory: [mem 0xfee01000-0xffffffff]
[    0.000000] e820: [mem 0xc0000000-0xdfffffff] available for PCI devices
[    0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:8 nr_node_ids:1
[    0.000000] PERCPU: Embedded 27 pages/cpu @ffff88043fc00000 s81088 r8192 d21312 u262144
[    0.000000] pcpu-alloc: s81088 r8192 d21312 u262144 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 1 2 3 4 5 6 7 
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 4126521
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.16.0-rc7-ceph-01781-g721bdd2 root=UUID=688201a9-609d-49f5-8d37-af80d514f826 ro pci=realloc=off console=tty0 console=ttyS2,115200
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] AGP: Checking aperture...
[    0.000000] AGP: No AGP bridge found
[    0.000000] Calgary: detecting Calgary via BIOS EBDA area
[    0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
[    0.000000] Memory: 16418372K/16768180K available (6785K kernel code, 996K rwdata, 3036K rodata, 1204K init, 1128K bss, 349808K reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	RCU dyntick-idle grace-period acceleration is enabled.
[    0.000000] 	RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[    0.000000] NR_IRQS:16640 nr_irqs:744 16
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty0] enabled
[    0.000000] console [ttyS2] enabled
[    0.000000] allocated 67108864 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] hpet clockevent registered
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 2533.192 MHz processor
[    0.000035] Calibrating delay loop (skipped), value calculated using timer frequency.. 5066.38 BogoMIPS (lpj=10132768)
[    0.010860] pid_max: default: 32768 minimum: 301
[    0.015538] ACPI: Core revision 20140424
[    0.024804] ACPI: All ACPI Tables successfully acquired
[    0.055591] Security Framework initialized
[    0.059748] AppArmor: AppArmor initialized
[    0.063899] Yama: becoming mindful.
[    0.068706] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
[    0.080416] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.089733] Mount-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.096696] Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.104454] Initializing cgroup subsys memory
[    0.108892] Initializing cgroup subsys devices
[    0.113401] Initializing cgroup subsys freezer
[    0.117902] Initializing cgroup subsys net_cls
[    0.122408] Initializing cgroup subsys blkio
[    0.126737] Initializing cgroup subsys perf_event
[    0.131500] Initializing cgroup subsys hugetlb
[    0.136026] CPU: Physical Processor ID: 0
[    0.140091] CPU: Processor Core ID: 0
[    0.143810] mce: CPU supports 9 MCE banks
[    0.147890] CPU0: Thermal monitoring enabled (TM1)
[    0.152749] Last level iTLB entries: 4KB 512, 2MB 7, 4MB 7
[    0.152749] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
[    0.152749] tlb_flushall_shift: 6
[    0.167973] Freeing SMP alternatives memory: 24K (ffffffff81e28000 - ffffffff81e2e000)
[    0.175957] ftrace: allocating 26150 entries in 103 pages
[    0.194886] dmar: Host address width 36
[    0.198779] dmar: DRHD base: 0x000000fed90000 flags: 0x1
[    0.204156] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c90780106f0462 ecap f020e3
[    0.212310] dmar: RMRR base: 0x000000000ed000 end: 0x000000000effff
[    0.218634] dmar: RMRR base: 0x000000bf7ed000 end: 0x000000bf7fffff
[    0.225443] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.271158] smpboot: CPU0: Intel(R) Xeon(R) CPU           X3440  @ 2.53GHz (fam: 06, model: 1e, stepping: 05)
[    0.387997] Performance Events: PEBS fmt1+, 16-deep LBR, Nehalem events, Intel PMU driver.
[    0.396545] perf_event_intel: CPU erratum AAJ80 worked around
[    0.402342] perf_event_intel: CPUID marked event: 'bus cycles' unavailable
[    0.409265] ... version:                3
[    0.413321] ... bit width:              48
[    0.418564] ... generic registers:      4
[    0.422626] ... value mask:             0000ffffffffffff
[    0.427988] ... max period:             000000007fffffff
[    0.433350] ... fixed-purpose events:   3
[    0.437411] ... event mask:             000000070000000f
[    0.444157] x86: Booting SMP configuration:
[    0.448391] .... node  #0, CPUs:      #1
[    0.465525] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[    0.473932]  #2 #3 #4 #5 #6 #7
[    0.556443] x86: Booted up 1 node, 8 CPUs
[    0.560743] smpboot: Total of 8 processors activated (40531.07 BogoMIPS)
[    0.573041] devtmpfs: initialized
[    0.580982] evm: security.selinux
[    0.584346] evm: security.SMACK64
[    0.587706] evm: security.capability
[    0.591378] PM: Registering ACPI NVS region [mem 0xbf7ae000-0xbf7cffff] (139264 bytes)
[    0.600457] regulator-dummy: no parameters
[    0.604657] NET: Registered protocol family 16
[    0.609271] cpuidle: using governor ladder
[    0.613419] cpuidle: using governor menu
[    0.617440] ACPI: bus type PCI registered
[    0.621556] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xe0000000-0xefffffff] (base 0xe0000000)
[    0.630922] PCI: MMCONFIG at [mem 0xe0000000-0xefffffff] reserved in E820
[    0.637970] PCI: Using configuration type 1 for base access
[    0.656297] ACPI: Added _OSI(Module Device)
[    0.660534] ACPI: Added _OSI(Processor Device)
[    0.665028] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.669783] ACPI: Added _OSI(Processor Aggregator Device)
[    0.677012] ACPI: Executed 1 blocks of module-level executable AML code
[    0.697317] ACPI: Dynamic OEM Table Load:
[    0.701495] ACPI: SSDT 0xFFFF8804293AC000 00202C (v01 DpgPmm P001Ist  00000011 INTL 20051117)
[    0.710718] ACPI: Dynamic OEM Table Load:
[    0.714890] ACPI: SSDT 0xFFFF8804293B8000 000659 (v01 PmRef  P001Cst  00003001 INTL 20051117)
[    0.724150] ACPI: Interpreter enabled
[    0.727867] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20140424/hwxface-580)
[    0.737256] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S3_] (20140424/hwxface-580)
[    0.746644] ACPI: (supports S0 S1 S4 S5)
[    0.750615] ACPI: Using IOAPIC for interrupt routing
[    0.755672] HEST: Table parsing has been initialized.
[    0.760773] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.775944] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.782182] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    0.790582] acpi PNP0A08:00: _OSC: platform does not support [PCIeHotplug]
[    0.797657] acpi PNP0A08:00: _OSC: OS now controls [PME AER PCIeCapability]
[    0.804806] PCI host bridge to bus 0000:00
[    0.808957] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.814494] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
[    0.820725] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
[    0.826952] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[    0.833875] pci_bus 0000:00: root bus resource [mem 0x000d0000-0x000dffff]
[    0.840795] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff]
[    0.847715] pci_bus 0000:00: root bus resource [mem 0xf0000000-0xfed8ffff]
[    0.854645] pci 0000:00:00.0: [8086:d130] type 00 class 0x060000
[    0.854750] pci 0000:00:03.0: [8086:d138] type 01 class 0x060400
[    0.854798] pci 0000:00:03.0: PME# supported from D0 D3hot D3cold
[    0.854838] pci 0000:00:03.0: System wakeup disabled by ACPI
[    0.860588] pci 0000:00:05.0: [8086:d13a] type 01 class 0x060400
[    0.860634] pci 0000:00:05.0: PME# supported from D0 D3hot D3cold
[    0.860678] pci 0000:00:05.0: System wakeup disabled by ACPI
[    0.866424] pci 0000:00:08.0: [8086:d155] type 00 class 0x088000
[    0.866522] pci 0000:00:08.1: [8086:d156] type 00 class 0x088000
[    0.866614] pci 0000:00:08.2: [8086:d157] type 00 class 0x088000
[    0.866704] pci 0000:00:08.3: [8086:d158] type 00 class 0x088000
[    0.866791] pci 0000:00:10.0: [8086:d150] type 00 class 0x088000
[    0.866869] pci 0000:00:10.1: [8086:d151] type 00 class 0x088000
[    0.866968] pci 0000:00:1a.0: [8086:3b3c] type 00 class 0x0c0320
[    0.866989] pci 0000:00:1a.0: reg 0x10: [mem 0xfb3fc000-0xfb3fc3ff]
[    0.867072] pci 0000:00:1a.0: PME# supported from D0 D3hot D3cold
[    0.867114] pci 0000:00:1a.0: System wakeup disabled by ACPI
[    0.872866] pci 0000:00:1c.0: [8086:3b42] type 01 class 0x060400
[    0.872933] pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
[    0.872979] pci 0000:00:1c.0: System wakeup disabled by ACPI
[    0.878726] pci 0000:00:1c.4: [8086:3b4a] type 01 class 0x060400
[    0.878793] pci 0000:00:1c.4: PME# supported from D0 D3hot D3cold
[    0.878836] pci 0000:00:1c.4: System wakeup disabled by ACPI
[    0.884579] pci 0000:00:1c.5: [8086:3b4c] type 01 class 0x060400
[    0.884645] pci 0000:00:1c.5: PME# supported from D0 D3hot D3cold
[    0.884688] pci 0000:00:1c.5: System wakeup disabled by ACPI
[    0.890434] pci 0000:00:1d.0: [8086:3b34] type 00 class 0x0c0320
[    0.890454] pci 0000:00:1d.0: reg 0x10: [mem 0xfb3fa000-0xfb3fa3ff]
[    0.890538] pci 0000:00:1d.0: PME# supported from D0 D3hot D3cold
[    0.890581] pci 0000:00:1d.0: System wakeup disabled by ACPI
[    0.896325] pci 0000:00:1e.0: [8086:244e] type 01 class 0x060401
[    0.896403] pci 0000:00:1e.0: System wakeup disabled by ACPI
[    0.902150] pci 0000:00:1f.0: [8086:3b14] type 00 class 0x060100
[    0.902317] pci 0000:00:1f.2: [8086:3b22] type 00 class 0x010601
[    0.902334] pci 0000:00:1f.2: reg 0x10: [io  0xc400-0xc407]
[    0.902341] pci 0000:00:1f.2: reg 0x14: [io  0xcc00-0xcc03]
[    0.902348] pci 0000:00:1f.2: reg 0x18: [io  0xc880-0xc887]
[    0.902355] pci 0000:00:1f.2: reg 0x1c: [io  0xc800-0xc803]
[    0.902362] pci 0000:00:1f.2: reg 0x20: [io  0xc480-0xc49f]
[    0.902370] pci 0000:00:1f.2: reg 0x24: [mem 0xfb3f8000-0xfb3f87ff]
[    0.902412] pci 0000:00:1f.2: PME# supported from D3hot
[    0.902481] pci 0000:00:1f.3: [8086:3b30] type 00 class 0x0c0500
[    0.902494] pci 0000:00:1f.3: reg 0x10: [mem 0xfb3f6000-0xfb3f60ff 64bit]
[    0.902513] pci 0000:00:1f.3: reg 0x20: [io  0x0400-0x041f]
[    0.902619] pci 0000:00:03.0: PCI bridge to [bus 01]
[    0.907677] pci 0000:00:05.0: PCI bridge to [bus 02]
[    0.912751] pci 0000:03:00.0: [17d3:1680] type 00 class 0x010400
[    0.912770] pci 0000:03:00.0: reg 0x10: [mem 0xfb4ee000-0xfb4effff]
[    0.912827] pci 0000:03:00.0: reg 0x30: [mem 0xfb4f0000-0xfb4fffff pref]
[    0.912881] pci 0000:03:00.0: supports D1
[    0.922021] pci 0000:00:1c.0: PCI bridge to [bus 03]
[    0.927044] pci 0000:00:1c.0:   bridge window [mem 0xfb400000-0xfb4fffff]
[    0.927119] pci 0000:04:00.0: [8086:10d3] type 00 class 0x020000
[    0.927147] pci 0000:04:00.0: reg 0x10: [mem 0xfb5e0000-0xfb5fffff]
[    0.927178] pci 0000:04:00.0: reg 0x18: [io  0xdc00-0xdc1f]
[    0.927195] pci 0000:04:00.0: reg 0x1c: [mem 0xfb5dc000-0xfb5dffff]
[    0.927342] pci 0000:04:00.0: PME# supported from D0 D3hot D3cold
[    0.934019] pci 0000:00:1c.4: PCI bridge to [bus 04]
[    0.939040] pci 0000:00:1c.4:   bridge window [io  0xd000-0xdfff]
[    0.939043] pci 0000:00:1c.4:   bridge window [mem 0xfb500000-0xfb5fffff]
[    0.939120] pci 0000:05:00.0: [8086:10d3] type 00 class 0x020000
[    0.939148] pci 0000:05:00.0: reg 0x10: [mem 0xfb6e0000-0xfb6fffff]
[    0.939179] pci 0000:05:00.0: reg 0x18: [io  0xec00-0xec1f]
[    0.939196] pci 0000:05:00.0: reg 0x1c: [mem 0xfb6dc000-0xfb6dffff]
[    0.939343] pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
[    0.946016] pci 0000:00:1c.5: PCI bridge to [bus 05]
[    0.951036] pci 0000:00:1c.5:   bridge window [io  0xe000-0xefff]
[    0.951040] pci 0000:00:1c.5:   bridge window [mem 0xfb600000-0xfb6fffff]
[    0.951084] pci 0000:06:03.0: [102b:0532] type 00 class 0x030000
[    0.951099] pci 0000:06:03.0: reg 0x10: [mem 0xfa000000-0xfaffffff pref]
[    0.951108] pci 0000:06:03.0: reg 0x14: [mem 0xfb7fc000-0xfb7fffff]
[    0.951117] pci 0000:06:03.0: reg 0x18: [mem 0xfb800000-0xfbffffff]
[    0.951230] pci 0000:00:1e.0: PCI bridge to [bus 06] (subtractive decode)
[    0.958066] pci 0000:00:1e.0:   bridge window [mem 0xfb700000-0xfbffffff]
[    0.958071] pci 0000:00:1e.0:   bridge window [mem 0xfa000000-0xfaffffff 64bit pref]
[    0.958073] pci 0000:00:1e.0:   bridge window [io  0x0000-0x0cf7] (subtractive decode)
[    0.958075] pci 0000:00:1e.0:   bridge window [io  0x0d00-0xffff] (subtractive decode)
[    0.958077] pci 0000:00:1e.0:   bridge window [mem 0x000a0000-0x000bffff] (subtractive decode)
[    0.958078] pci 0000:00:1e.0:   bridge window [mem 0x000d0000-0x000dffff] (subtractive decode)
[    0.958080] pci 0000:00:1e.0:   bridge window [mem 0xc0000000-0xdfffffff] (subtractive decode)
[    0.958082] pci 0000:00:1e.0:   bridge window [mem 0xf0000000-0xfed8ffff] (subtractive decode)
[    0.958684] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 6 7 *10 11 12 14 15)
[    0.966218] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 6 7 10 *11 12 14 15)
[    0.973752] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *6 7 10 11 12 14 15)
[    0.981279] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 6 7 10 11 12 *14 15)
[    0.988806] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 6 7 10 11 12 14 15) *0, disabled.
[    0.997580] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 6 *7 10 11 12 14 15)
[    1.005108] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 6 7 10 11 12 14 15) *0, disabled.
[    1.013883] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 6 7 10 11 12 14 *15)
[    1.021413] ACPI: Enabled 2 GPEs in block 00 to 3F
[    1.026456] vgaarb: device added: PCI:0000:06:03.0,decodes=io+mem,owns=io+mem,locks=none
[    1.034610] vgaarb: loaded
[    1.037364] vgaarb: bridge control possible 0000:06:03.0
[    1.042921] SCSI subsystem initialized
[    1.046762] libata version 3.00 loaded.
[    1.046785] ACPI: bus type USB registered
[    1.050865] usbcore: registered new interface driver usbfs
[    1.056410] usbcore: registered new interface driver hub
[    1.061794] usbcore: registered new device driver usb
[    1.066990] PCI: Using ACPI for IRQ routing
[    1.076914] PCI: Discovered peer bus ff
[    1.080798] PCI: root bus ff: using default resources
[    1.080799] PCI: Probing PCI hardware (bus ff)
[    1.080821] PCI host bridge to bus 0000:ff
[    1.084967] pci_bus 0000:ff: root bus resource [io  0x0000-0xffff]
[    1.091199] pci_bus 0000:ff: root bus resource [mem 0x00000000-0xfffffffff]
[    1.098208] pci_bus 0000:ff: No busn resource found for root bus, will use [bus ff-ff]
[    1.106183] pci_bus 0000:ff: busn_res: can not insert [bus ff] under domain [bus 00-ff] (conflicts with (null) [bus 00-ff])
[    1.106188] pci 0000:ff:00.0: [8086:2c50] type 00 class 0x060000
[    1.106230] pci 0000:ff:00.1: [8086:2c81] type 00 class 0x060000
[    1.106271] pci 0000:ff:02.0: [8086:2c90] type 00 class 0x060000
[    1.106308] pci 0000:ff:02.1: [8086:2c91] type 00 class 0x060000
[    1.106346] pci 0000:ff:03.0: [8086:2c98] type 00 class 0x060000
[    1.106383] pci 0000:ff:03.1: [8086:2c99] type 00 class 0x060000
[    1.106420] pci 0000:ff:03.2: [8086:2c9a] type 00 class 0x060000
[    1.106457] pci 0000:ff:03.4: [8086:2c9c] type 00 class 0x060000
[    1.106496] pci 0000:ff:04.0: [8086:2ca0] type 00 class 0x060000
[    1.106533] pci 0000:ff:04.1: [8086:2ca1] type 00 class 0x060000
[    1.106569] pci 0000:ff:04.2: [8086:2ca2] type 00 class 0x060000
[    1.106606] pci 0000:ff:04.3: [8086:2ca3] type 00 class 0x060000
[    1.106645] pci 0000:ff:05.0: [8086:2ca8] type 00 class 0x060000
[    1.106682] pci 0000:ff:05.1: [8086:2ca9] type 00 class 0x060000
[    1.106719] pci 0000:ff:05.2: [8086:2caa] type 00 class 0x060000
[    1.106756] pci 0000:ff:05.3: [8086:2cab] type 00 class 0x060000
[    1.106803] pci_bus 0000:ff: busn_res: [bus ff] end is updated to ff
[    1.106806] pci_bus 0000:ff: busn_res: can not insert [bus ff] under domain [bus 00-ff] (conflicts with (null) [bus 00-ff])
[    1.106809] PCI: pci_cache_line_size set to 64 bytes
[    1.106868] e820: reserve RAM buffer [mem 0x0009ec00-0x0009ffff]
[    1.106870] e820: reserve RAM buffer [mem 0xbf790000-0xbfffffff]
[    1.106967] NetLabel: Initializing
[    1.110424] NetLabel:  domain hash size = 128
[    1.114833] NetLabel:  protocols = UNLABELED CIPSOv4
[    1.119862] NetLabel:  unlabeled traffic allowed by default
[    1.125525] HPET: 8 timers in total, 5 timers will be used for per-cpu timer
[    1.132626] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 40, 41, 42, 43, 44, 0
[    1.139755] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
[    1.147682] hpet: hpet2 irq 40 for MSI
[    1.147723] hpet: hpet3 irq 41 for MSI
[    1.147767] hpet: hpet4 irq 42 for MSI
[    1.147805] hpet: hpet5 irq 43 for MSI
[    1.147848] hpet: hpet6 irq 44 for MSI
[    1.147946] Switched to clocksource hpet
[    1.157035] AppArmor: AppArmor Filesystem Enabled
[    1.161824] pnp: PnP ACPI init
[    1.164950] ACPI: bus type PNP registered
[    1.169068] system 00:00: [mem 0xfc000000-0xfcffffff] has been reserved
[    1.175737] system 00:00: [mem 0xfd000000-0xfdffffff] has been reserved
[    1.182406] system 00:00: [mem 0xfe000000-0xfebfffff] has been reserved
[    1.189075] system 00:00: [mem 0xfed14000-0xfed19fff] has been reserved
[    1.195738] system 00:00: Plug and Play ACPI device, IDs PNP0c01 (active)
[    1.195805] pnp 00:01: Plug and Play ACPI device, IDs PNP0b00 (active)
[    1.195985] pnp 00:02: [dma 0 disabled]
[    1.196037] pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)
[    1.196198] pnp 00:03: [dma 0 disabled]
[    1.196265] pnp 00:03: Plug and Play ACPI device, IDs PNP0501 (active)
[    1.196470] pnp 00:04: [dma 0 disabled]
[    1.196519] pnp 00:04: Plug and Play ACPI device, IDs PNP0501 (active)
[    1.196592] system 00:05: [io  0x164e-0x164f] has been reserved
[    1.202565] system 00:05: Plug and Play ACPI device, IDs PNP0c02 (active)
[    1.202690] system 00:06: [io  0x0a00-0x0a0f] has been reserved
[    1.208662] system 00:06: [io  0x0a10-0x0a1f] has been reserved
[    1.214638] system 00:06: Plug and Play ACPI device, IDs PNP0c02 (active)
[    1.214773] system 00:07: [io  0x0ca2-0x0ca3] has been reserved
[    1.220747] system 00:07: [io  0x0cf8-0x0cff] could not be reserved
[    1.227067] system 00:07: [io  0x04d0-0x04d1] has been reserved
[    1.233035] system 00:07: [io  0x0800-0x087f] could not be reserved
[    1.239349] system 00:07: [io  0x0500-0x057f] has been reserved
[    1.245318] system 00:07: [mem 0xfed1c000-0xfed1ffff] has been reserved
[    1.251980] system 00:07: [mem 0xfed20000-0xfed3ffff] has been reserved
[    1.258651] system 00:07: [mem 0xfed40000-0xfed8ffff] has been reserved
[    1.265322] system 00:07: Plug and Play ACPI device, IDs PNP0c02 (active)
[    1.265435] system 00:08: [mem 0xfec00000-0xfec00fff] could not be reserved
[    1.272455] system 00:08: [mem 0xfee00000-0xfee00fff] has been reserved
[    1.279120] system 00:08: Plug and Play ACPI device, IDs PNP0c02 (active)
[    1.279188] system 00:09: [mem 0xe0000000-0xefffffff] has been reserved
[    1.285861] system 00:09: Plug and Play ACPI device, IDs PNP0c02 (active)
[    1.286032] system 00:0a: [mem 0x000c0000-0x000cffff] could not be reserved
[    1.293051] system 00:0a: [mem 0x000e0000-0x000fffff] could not be reserved
[    1.300065] system 00:0a: [mem 0xfed90000-0xffffffff] could not be reserved
[    1.307081] system 00:0a: Plug and Play ACPI device, IDs PNP0c01 (active)
[    1.307178] pnp: PnP ACPI: found 11 devices
[    1.311417] ACPI: bus type PNP unregistered
[    1.321906] pci 0000:00:1c.0: bridge window [io  0x1000-0x0fff] to [bus 03] add_size 1000
[    1.321909] pci 0000:00:1c.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 03] add_size 200000
[    1.321917] pci 0000:00:1c.4: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 04] add_size 200000
[    1.321924] pci 0000:00:1c.5: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 05] add_size 200000
[    1.321935] pci 0000:00:1c.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[    1.321936] pci 0000:00:1c.4: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[    1.321938] pci 0000:00:1c.5: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[    1.321940] pci 0000:00:1c.0: res[13]=[io  0x1000-0x0fff] get_res_add_size add_size 1000
[    1.321947] pci 0000:00:1c.0: BAR 15: assigned [mem 0xc0000000-0xc01fffff 64bit pref]
[    1.329845] pci 0000:00:1c.4: BAR 15: assigned [mem 0xc0200000-0xc03fffff 64bit pref]
[    1.337743] pci 0000:00:1c.5: BAR 15: assigned [mem 0xc0400000-0xc05fffff 64bit pref]
[    1.345644] pci 0000:00:1c.0: BAR 13: assigned [io  0x2000-0x2fff]
[    1.351879] pci 0000:00:03.0: PCI bridge to [bus 01]
[    1.356899] pci 0000:00:05.0: PCI bridge to [bus 02]
[    1.361923] pci 0000:00:1c.0: PCI bridge to [bus 03]
[    1.366941] pci 0000:00:1c.0:   bridge window [io  0x2000-0x2fff]
[    1.373087] pci 0000:00:1c.0:   bridge window [mem 0xfb400000-0xfb4fffff]
[    1.379931] pci 0000:00:1c.0:   bridge window [mem 0xc0000000-0xc01fffff 64bit pref]
[    1.387743] pci 0000:00:1c.4: PCI bridge to [bus 04]
[    1.392763] pci 0000:00:1c.4:   bridge window [io  0xd000-0xdfff]
[    1.398910] pci 0000:00:1c.4:   bridge window [mem 0xfb500000-0xfb5fffff]
[    1.405753] pci 0000:00:1c.4:   bridge window [mem 0xc0200000-0xc03fffff 64bit pref]
[    1.413565] pci 0000:00:1c.5: PCI bridge to [bus 05]
[    1.418585] pci 0000:00:1c.5:   bridge window [io  0xe000-0xefff]
[    1.424731] pci 0000:00:1c.5:   bridge window [mem 0xfb600000-0xfb6fffff]
[    1.431575] pci 0000:00:1c.5:   bridge window [mem 0xc0400000-0xc05fffff 64bit pref]
[    1.439386] pci 0000:00:1e.0: PCI bridge to [bus 06]
[    1.444408] pci 0000:00:1e.0:   bridge window [mem 0xfb700000-0xfbffffff]
[    1.451247] pci 0000:00:1e.0:   bridge window [mem 0xfa000000-0xfaffffff 64bit pref]
[    1.459059] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7]
[    1.459061] pci_bus 0000:00: resource 5 [io  0x0d00-0xffff]
[    1.459062] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff]
[    1.459064] pci_bus 0000:00: resource 7 [mem 0x000d0000-0x000dffff]
[    1.459066] pci_bus 0000:00: resource 8 [mem 0xc0000000-0xdfffffff]
[    1.459067] pci_bus 0000:00: resource 9 [mem 0xf0000000-0xfed8ffff]
[    1.459069] pci_bus 0000:03: resource 0 [io  0x2000-0x2fff]
[    1.459071] pci_bus 0000:03: resource 1 [mem 0xfb400000-0xfb4fffff]
[    1.459073] pci_bus 0000:03: resource 2 [mem 0xc0000000-0xc01fffff 64bit pref]
[    1.459074] pci_bus 0000:04: resource 0 [io  0xd000-0xdfff]
[    1.459076] pci_bus 0000:04: resource 1 [mem 0xfb500000-0xfb5fffff]
[    1.459078] pci_bus 0000:04: resource 2 [mem 0xc0200000-0xc03fffff 64bit pref]
[    1.459079] pci_bus 0000:05: resource 0 [io  0xe000-0xefff]
[    1.459081] pci_bus 0000:05: resource 1 [mem 0xfb600000-0xfb6fffff]
[    1.459083] pci_bus 0000:05: resource 2 [mem 0xc0400000-0xc05fffff 64bit pref]
[    1.459084] pci_bus 0000:06: resource 1 [mem 0xfb700000-0xfbffffff]
[    1.459086] pci_bus 0000:06: resource 2 [mem 0xfa000000-0xfaffffff 64bit pref]
[    1.459088] pci_bus 0000:06: resource 4 [io  0x0000-0x0cf7]
[    1.459089] pci_bus 0000:06: resource 5 [io  0x0d00-0xffff]
[    1.459091] pci_bus 0000:06: resource 6 [mem 0x000a0000-0x000bffff]
[    1.459092] pci_bus 0000:06: resource 7 [mem 0x000d0000-0x000dffff]
[    1.459094] pci_bus 0000:06: resource 8 [mem 0xc0000000-0xdfffffff]
[    1.459096] pci_bus 0000:06: resource 9 [mem 0xf0000000-0xfed8ffff]
[    1.459098] pci_bus 0000:ff: resource 4 [io  0x0000-0xffff]
[    1.459100] pci_bus 0000:ff: resource 5 [mem 0x00000000-0xfffffffff]
[    1.459124] NET: Registered protocol family 2
[    1.463722] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[    1.471561] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    1.478606] TCP: Hash tables configured (established 131072 bind 65536)
[    1.485291] TCP: reno registered
[    1.488589] UDP hash table entries: 8192 (order: 6, 262144 bytes)
[    1.494855] UDP-Lite hash table entries: 8192 (order: 6, 262144 bytes)
[    1.501581] NET: Registered protocol family 1
[    1.506317] PCI: CLS mismatch (32 != 64), using 64 bytes
[    1.506323] pci 0000:06:03.0: Boot video device
[    1.506378] Trying to unpack rootfs image as initramfs...
[    1.593072] Freeing initrd memory: 4656K (ffff8800376d8000 - ffff880037b64000)
[    1.600408] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    1.606905] software IO TLB [mem 0xbb790000-0xbf790000] (64MB) mapped at [ffff8800bb790000-ffff8800bf78ffff]
[    1.617150] Scanning for low memory corruption every 60 seconds
[    1.624584] futex hash table entries: 2048 (order: 5, 131072 bytes)
[    1.630936] Initialise system trusted keyring
[    1.635367] audit: initializing netlink subsys (disabled)
[    1.640832] audit: type=2000 audit(1406750567.452:1): initialized
[    1.647224] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    1.655159] VFS: Disk quotas dquot_6.5.2
[    1.659174] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    1.666124] fuse init (API version 7.23)
[    1.670188] msgmni has been set to 32076
[    1.674670] Key type asymmetric registered
[    1.678822] Asymmetric key parser 'x509' registered
[    1.683767] bounce: pool size: 64 pages
[    1.687685] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    1.695217] io scheduler noop registered
[    1.699194] io scheduler deadline registered (default)
[    1.704431] io scheduler cfq registered
[    1.708510] pcieport 0000:00:03.0: irq 45 for MSI/MSI-X
[    1.708642] pcieport 0000:00:05.0: irq 46 for MSI/MSI-X
[    1.708697] pcieport 0000:00:1c.0: enabling device (0106 -> 0107)
[    1.714920] pcieport 0000:00:1c.0: irq 47 for MSI/MSI-X
[    1.715046] pcieport 0000:00:1c.4: irq 48 for MSI/MSI-X
[    1.715174] pcieport 0000:00:1c.5: irq 49 for MSI/MSI-X
[    1.715256] aer 0000:00:03.0:pcie02: service driver aer loaded
[    1.715274] aer 0000:00:05.0:pcie02: service driver aer loaded
[    1.715289] pcieport 0000:00:03.0: Signaling PME through PCIe PME interrupt
[    1.722303] pcie_pme 0000:00:03.0:pcie01: service driver pcie_pme loaded
[    1.722310] pcieport 0000:00:05.0: Signaling PME through PCIe PME interrupt
[    1.729324] pcie_pme 0000:00:05.0:pcie01: service driver pcie_pme loaded
[    1.729338] pcieport 0000:00:1c.0: Signaling PME through PCIe PME interrupt
[    1.736350] pci 0000:03:00.0: Signaling PME through PCIe PME interrupt
[    1.742933] pcie_pme 0000:00:1c.0:pcie01: service driver pcie_pme loaded
[    1.742949] pcieport 0000:00:1c.4: Signaling PME through PCIe PME interrupt
[    1.749968] pci 0000:04:00.0: Signaling PME through PCIe PME interrupt
[    1.756551] pcie_pme 0000:00:1c.4:pcie01: service driver pcie_pme loaded
[    1.756565] pcieport 0000:00:1c.5: Signaling PME through PCIe PME interrupt
[    1.763573] pci 0000:05:00.0: Signaling PME through PCIe PME interrupt
[    1.770148] pcie_pme 0000:00:1c.5:pcie01: service driver pcie_pme loaded
[    1.770165] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    1.775808] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[    1.782509] intel_idle: MWAIT substates: 0x1120
[    1.782517] intel_idle: v0.4 model 0x1E
[    1.782518] intel_idle: lapic_timer_reliable_states 0x2
[    1.782853] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[    1.791270] ACPI: Power Button [PWRB]
[    1.795021] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[    1.802483] ACPI: Power Button [PWRF]
[    1.806440] ERST: Failed to get Error Log Address Range.
[    1.839623] [Firmware Warn]: GHES: Poll interval is 0 for generic hardware error source: 1, disabled.
[    1.848958] GHES: APEI firmware first mode is enabled by WHEA _OSC.
[    1.855346] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    1.882133] 00:02: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    1.909912] 00:03: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
[    1.937692] 00:04: ttyS2 at I/O 0x3e8 (irq = 5, base_baud = 115200) is a 16550A
[    1.947540] brd: module loaded
[    1.951240] loop: module loaded
[    1.954838] libphy: Fixed MDIO Bus: probed
[    1.959092] tun: Universal TUN/TAP device driver, 1.6
[    1.964194] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[    1.970461] PPP generic driver version 2.4.2
[    1.974832] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    1.981411] ehci-pci: EHCI PCI platform driver
[    1.986006] ehci-pci 0000:00:1a.0: EHCI Host Controller
[    1.991287] ehci-pci 0000:00:1a.0: new USB bus registered, assigned bus number 1
[    1.998758] ehci-pci 0000:00:1a.0: debug port 2
[    2.007288] ehci-pci 0000:00:1a.0: cache line size of 64 is not supported
[    2.007300] ehci-pci 0000:00:1a.0: irq 21, io mem 0xfb3fc000
[    2.023509] ehci-pci 0000:00:1a.0: USB 2.0 started, EHCI 1.00
[    2.029346] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[    2.036189] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    2.043478] usb usb1: Product: EHCI Host Controller
[    2.048406] usb usb1: Manufacturer: Linux 3.16.0-rc7-ceph-01781-g721bdd2 ehci_hcd
[    2.055952] usb usb1: SerialNumber: 0000:00:1a.0
[    2.060781] hub 1-0:1.0: USB hub found
[    2.064592] hub 1-0:1.0: 2 ports detected
[    2.068841] ehci-pci 0000:00:1d.0: EHCI Host Controller
[    2.074123] ehci-pci 0000:00:1d.0: new USB bus registered, assigned bus number 2
[    2.081593] ehci-pci 0000:00:1d.0: debug port 2
[    2.090133] ehci-pci 0000:00:1d.0: cache line size of 64 is not supported
[    2.090147] ehci-pci 0000:00:1d.0: irq 23, io mem 0xfb3fa000
[    2.107512] ehci-pci 0000:00:1d.0: USB 2.0 started, EHCI 1.00
[    2.113335] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
[    2.120193] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    2.127484] usb usb2: Product: EHCI Host Controller
[    2.132412] usb usb2: Manufacturer: Linux 3.16.0-rc7-ceph-01781-g721bdd2 ehci_hcd
[    2.139958] usb usb2: SerialNumber: 0000:00:1d.0
[    2.144948] hub 2-0:1.0: USB hub found
[    2.148757] hub 2-0:1.0: 2 ports detected
[    2.152920] ehci-platform: EHCI generic platform driver
[    2.158209] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    2.164441] ohci-pci: OHCI PCI platform driver
[    2.168967] ohci-platform: OHCI generic platform driver
[    2.174250] uhci_hcd: USB Universal Host Controller Interface driver
[    2.180707] i8042: PNP: No PS/2 controller found. Probing ports directly.
[    2.190156] serio: i8042 KBD port at 0x60,0x64 irq 1
[    2.195191] serio: i8042 AUX port at 0x60,0x64 irq 12
[    2.200682] mousedev: PS/2 mouse device common for all mice
[    2.206802] rtc_cmos 00:01: RTC can wake from S4
[    2.211797] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[    2.218023] rtc_cmos 00:01: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
[    2.225808] device-mapper: uevent: version 1.0.3
[    2.230697] device-mapper: ioctl: 4.27.0-ioctl (2013-10-30) initialised: dm-devel@redhat.com
[    2.239219] ledtrig-cpu: registered to indicate activity on CPUs
[    2.245342] TCP: cubic registered
[    2.248817] NET: Registered protocol family 10
[    2.253777] NET: Registered protocol family 17
[    2.258291] Key type dns_resolver registered
[    2.263693] Loading compiled-in X.509 certificates
[    2.269436] Loaded X.509 cert 'Magrathea: Glacier signing key: a57ee7b81b6e017087b14a925e3a16ef4b0b0ce9'
[    2.278992] registered taskstats version 1
[    2.286408] Key type trusted registered
[    2.293007] Key type encrypted registered
[    2.300115] AppArmor: AppArmor sha1 policy hashing enabled
[    2.305664] evm: HMAC attrs: 0x1
[    2.309774] rtc_cmos 00:01: setting system clock to 2014-07-30 20:02:49 UTC (1406750569)
[    2.321424] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[    2.327502] EDD information not available.
[    2.333159] Freeing unused kernel memory: 1204K (ffffffff81cfb000 - ffffffff81e28000)
[    2.341161] Write protecting the kernel read-only data: 12288k
[    2.350786] Freeing unused kernel memory: 1396K (ffff8800016a3000 - ffff880001800000)
[    2.361519] Freeing unused kernel memory: 1060K (ffff880001af7000 - ffff880001c00000)
[    2.383450] usb 1-1: new high-speed USB device number 2 using ehci-pci
[    2.390310] udevd[132]: starting version 175
[    2.396160] libcrc32c: module verification failed: signature and/or  required key missing - tainting kernel
[    2.408249] pps_core: LinuxPPS API ver. 1 registered
[    2.413278] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    2.422698] xor: measuring software checksum speed
[    2.428002] PTP clock support registered
[    2.432107] ahci 0000:00:1f.2: version 3.0
[    2.432257] ahci 0000:00:1f.2: irq 50 for MSI/MSI-X
[    2.432293] ahci 0000:00:1f.2: SSS flag set, parallel bus scan disabled
[    2.439057] e1000e: Intel(R) PRO/1000 Network Driver - 2.3.2-k
[    2.444947] e1000e: Copyright(c) 1999 - 2014 Intel Corporation.
[    2.445034] Areca RAID Controller0: F/W V1.49 2011-08-10 & Model ARC-1222
[    2.457919] e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    2.459336] scsi0 : Areca SAS Host Adapter RAID Controller( RAID6 capable)
[    2.459336]  Driver Version 1.20.00.15 2010/08/05
[    2.459361] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 3 Gbps 0x3f impl SATA mode
[    2.459363] ahci 0000:00:1f.2: flags: 64bit ncq sntf stag pm led clo pio slum part ems sxs apst 
[    2.460637] scsi 0:0:0:0: Direct-Access     Seagate  ST31000528AS     R001 PQ: 0 ANSI: 5
[    2.461550] scsi 0:0:0:1: Direct-Access     Seagate  ST31000528AS     R001 PQ: 0 ANSI: 5
[    2.461740] scsi 0:0:0:2: Direct-Access     Seagate  ST31000528AS     R001 PQ: 0 ANSI: 5
[    2.461919] scsi 0:0:0:3: Direct-Access     Seagate  ST31000528AS     R001 PQ: 0 ANSI: 5
[    2.462094] scsi 0:0:0:4: Direct-Access     Hitachi  HDS721010CLA330  R001 PQ: 0 ANSI: 5
[    2.462268] scsi 0:0:0:5: Direct-Access     Seagate  ST31000528AS     R001 PQ: 0 ANSI: 5
[    2.462443] scsi 0:0:0:6: Direct-Access     Seagate  ST31000528AS     R001 PQ: 0 ANSI: 5
[    2.462620] scsi 0:0:0:7: Direct-Access     Seagate  ST31000524AS     R001 PQ: 0 ANSI: 5
[    2.463399] scsi 0:0:16:0: Processor         Areca    RAID controller  R001 PQ: 0 ANSI: 0
[    2.463713] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    2.463716] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    2.463834] sd 0:0:0:0: [sda] Write Protect is off
[    2.463835] sd 0:0:0:0: [sda] Mode Sense: cb 00 00 08
[    2.463900] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.463994] sd 0:0:0:1: Attached scsi generic sg1 type 0
[    2.464005] sd 0:0:0:1: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    2.464135] sd 0:0:0:1: [sdb] Write Protect is off
[    2.464137] sd 0:0:0:1: [sdb] Mode Sense: cb 00 00 08
[    2.464257] sd 0:0:0:2: Attached scsi generic sg2 type 0
[    2.464260] sd 0:0:0:2: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    2.464267] sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.464397] sd 0:0:0:2: [sdc] Write Protect is off
[    2.464399] sd 0:0:0:2: [sdc] Mode Sense: cb 00 00 08
[    2.464477] sd 0:0:0:2: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.464504] sd 0:0:0:3: Attached scsi generic sg3 type 0
[    2.464568] sd 0:0:0:3: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    2.464595]  sda: sda1 sda2 < sda5 >
[    2.464696] sd 0:0:0:4: Attached scsi generic sg4 type 0
[    2.464729] sd 0:0:0:3: [sdd] Write Protect is off
[    2.464730] sd 0:0:0:3: [sdd] Mode Sense: cb 00 00 08
[    2.464832] sd 0:0:0:3: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.464871] sd 0:0:0:5: Attached scsi generic sg5 type 0
[    2.464959] sd 0:0:0:4: [sde] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    2.464959]  sdb: sdb1
[    2.465016] sd 0:0:0:5: [sdf] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    2.465128] sd 0:0:0:6: Attached scsi generic sg6 type 0
[    2.465132] sd 0:0:0:4: [sde] Write Protect is off
[    2.465135] sd 0:0:0:4: [sde] Mode Sense: cb 00 00 08
[    2.465247] sd 0:0:0:4: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.465264] sd 0:0:0:6: [sdg] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    2.465273] sd 0:0:0:5: [sdf] Write Protect is off
[    2.465276] sd 0:0:0:5: [sdf] Mode Sense: cb 00 00 08
[    2.465390] sd 0:0:0:7: Attached scsi generic sg7 type 0
[    2.465426] sd 0:0:0:5: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.465446] sd 0:0:0:6: [sdg] Write Protect is off
[    2.465448] sd 0:0:0:6: [sdg] Mode Sense: cb 00 00 08
[    2.465503] sd 0:0:0:1: [sdb] Attached SCSI disk
[    2.465540] sd 0:0:0:0: [sda] Attached SCSI disk
[    2.465568] scsi 0:0:16:0: Attached scsi generic sg8 type 3
[    2.465602] sd 0:0:0:6: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.465673] sd 0:0:0:7: [sdh] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    2.465782]  sdc: unknown partition table
[    2.465859] sd 0:0:0:7: [sdh] Write Protect is off
[    2.465861] sd 0:0:0:7: [sdh] Mode Sense: cb 00 00 08
[    2.465947] sd 0:0:0:7: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.466194] sd 0:0:0:2: [sdc] Attached SCSI disk
[    2.466449]  sdd: unknown partition table
[    2.466714] sd 0:0:0:3: [sdd] Attached SCSI disk
[    2.466763]  sdf: unknown partition table
[    2.466908]  sde: unknown partition table
[    2.467010]  sdg: unknown partition table
[    2.467108] sd 0:0:0:5: [sdf] Attached SCSI disk
[    2.467177]  sdh: unknown partition table
[    2.467204] sd 0:0:0:4: [sde] Attached SCSI disk
[    2.467328] sd 0:0:0:6: [sdg] Attached SCSI disk
[    2.467487] sd 0:0:0:7: [sdh] Attached SCSI disk
[    2.475273]    prefetch64-sse: 27185.000 MB/sec
[    2.500135] scsi1 : ahci
[    2.500455] scsi2 : ahci
[    2.500732] scsi3 : ahci
[    2.500992] scsi4 : ahci
[    2.501231] scsi5 : ahci
[    2.501463] scsi6 : ahci
[    2.501531] ata1: SATA max UDMA/133 abar m2048@0xfb3f8000 port 0xfb3f8100 irq 50
[    2.501532] ata2: SATA max UDMA/133 abar m2048@0xfb3f8000 port 0xfb3f8180 irq 50
[    2.501535] ata3: SATA max UDMA/133 abar m2048@0xfb3f8000 port 0xfb3f8200 irq 50
[    2.501537] ata4: SATA max UDMA/133 abar m2048@0xfb3f8000 port 0xfb3f8280 irq 50
[    2.501540] ata5: SATA max UDMA/133 abar m2048@0xfb3f8000 port 0xfb3f8300 irq 50
[    2.501542] ata6: SATA max UDMA/133 abar m2048@0xfb3f8000 port 0xfb3f8380 irq 50
[    2.535664] usb 1-1: New USB device found, idVendor=8087, idProduct=0020
[    2.535666] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    2.536004] hub 1-1:1.0: USB hub found
[    2.536214] hub 1-1:1.0: 6 ports detected
[    2.567251]    generic_sse: 36767.000 MB/sec
[    2.567271] xor: using function: generic_sse (36767.000 MB/sec)
[    2.615299] tsc: Refined TSC clocksource calibration: 2533.333 MHz
[    2.647260] usb 2-1: new high-speed USB device number 2 using ehci-pci
[    2.779772] usb 2-1: New USB device found, idVendor=8087, idProduct=0020
[    2.779773] usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    2.780085] hub 2-1:1.0: USB hub found
[    2.780215] hub 2-1:1.0: 8 ports detected
[    2.819217] ata1: SATA link down (SStatus 0 SControl 300)
[    2.851251] usb 1-1.2: new full-speed USB device number 3 using ehci-pci
[    2.944578] usb 1-1.2: New USB device found, idVendor=0557, idProduct=2221
[    2.944580] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    2.944581] usb 1-1.2: Product: Hermon USB hidmouse Device
[    2.944582] usb 1-1.2: Manufacturer: Winbond Electronics Corp
[    3.032115] e1000e 0000:04:00.0: irq 51 for MSI/MSI-X
[    3.032122] e1000e 0000:04:00.0: irq 52 for MSI/MSI-X
[    3.032128] e1000e 0000:04:00.0: irq 53 for MSI/MSI-X
[    3.032627] hidraw: raw HID events driver (C) Jiri Kosina
[    3.039192] random: nonblocking pool is initialized
[    3.039980] usbcore: registered new interface driver usbhid
[    3.039980] usbhid: USB HID core driver
[    3.055208] input: Winbond Electronics Corp Hermon USB hidmouse Device as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/0003:0557:2221.0001/input/input4
[    3.070243] hid-generic 0003:0557:2221.0001: input,hidraw0: USB HID v1.00 Mouse [Winbond Electronics Corp Hermon USB hidmouse Device] on usb-0000:00:1a.0-1.2/input0
[    3.085212] input: Winbond Electronics Corp Hermon USB hidmouse Device as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.1/0003:0557:2221.0002/input/input5
[    3.099056] raid6: sse2x1    5664 MB/s
[    3.104183] hid-generic 0003:0557:2221.0002: input,hidraw1: USB HID v1.00 Keyboard [Winbond Electronics Corp Hermon USB hidmouse Device] on usb-0000:00:1a.0-1.2/input1
[    3.138967] ata2: SATA link down (SStatus 0 SControl 300)
[    3.143099] e1000e 0000:04:00.0 eth0: registered PHC clock
[    3.143101] e1000e 0000:04:00.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:25:90:09:e0:78
[    3.143102] e1000e 0000:04:00.0 eth0: Intel(R) PRO/1000 Network Connection
[    3.143179] e1000e 0000:04:00.0 eth0: MAC: 3, PHY: 8, PBA No: 0101FF-0FF
[    3.143382] e1000e 0000:05:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    3.143404] e1000e 0000:05:00.0: irq 54 for MSI/MSI-X
[    3.143408] e1000e 0000:05:00.0: irq 55 for MSI/MSI-X
[    3.143411] e1000e 0000:05:00.0: irq 56 for MSI/MSI-X
[    3.166982] raid6: sse2x2    7441 MB/s
[    3.234964] raid6: sse2x4    8947 MB/s
[    3.238766] raid6: using algorithm sse2x4 (8947 MB/s)
[    3.243981] raid6: using ssse3x2 recovery algorithm
[    3.252559] Btrfs loaded
[    3.262221] e1000e 0000:05:00.0 eth1: registered PHC clock
[    3.269151] e1000e 0000:05:00.0 eth1: (PCI Express:2.5GT/s:Width x1) 00:25:90:09:e0:79
[    3.278637] e1000e 0000:05:00.0 eth1: Intel(R) PRO/1000 Network Connection
[    3.287098] e1000e 0000:05:00.0 eth1: MAC: 3, PHY: 8, PBA No: 0101FF-0FF
[    3.502834] ata3: SATA link down (SStatus 0 SControl 300)
[    3.615154] Switched to clocksource tsc
[    3.826635] ata4: SATA link down (SStatus 0 SControl 300)
[    3.947945] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[    4.150529] ata5: SATA link down (SStatus 0 SControl 300)
[    4.474357] ata6: SATA link down (SStatus 0 SControl 300)
[    5.686235] init: ureadahead main process (401) terminated with status 5
[    6.168154] Adding 16766972k swap on /dev/sda5.  Priority:-1 extents:1 across:16766972k FS
[    6.358058] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr
[    6.696974] udevd[493]: starting version 175
[    7.006483] RPC: Registered named UNIX socket transport module.
[    7.006486] RPC: Registered udp transport module.
[    7.006487] RPC: Registered tcp transport module.
[    7.006488] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    7.081536] lp: driver loaded but no devices found
[    7.129800] FS-Cache: Loaded
[    7.134406] FS-Cache: Netfs 'nfs' registered for caching
[    7.146420] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[    7.158010] Loading iSCSI transport class v2.0-870.
[    7.510723] ACPI Warning: SystemIO range 0x0000000000000828-0x000000000000082f conflicts with OpRegion 0x0000000000000800-0x000000000000084f (\PMRG) (20140424/utaddress-258)
[    7.510731] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[    7.510765] lpc_ich: Resource conflict(s) found affecting gpio_ich
[    7.548813] gpio_ich: GPIO from 180 to 255 on gpio_ich
[    7.640621] EDAC MC: Ver: 3.0.0
[    7.641688] EDAC MC0: Giving out device to module i7core_edac.c controller i7 core #0: DEV 0000:ff:03.0 (POLLED)
[    7.641727] EDAC PCI0: Giving out device to module i7core_edac controller EDAC PCI controller: DEV 0000:ff:03.0 (POLLED)
[    7.641880] EDAC i7core: Driver loaded, 1 memory controller(s) found.
[    8.052088] init: udev-fallback-graphics main process (865) terminated with status 1
[    8.254116] audit: type=1400 audit(1406750575.442:2): apparmor="STATUS" operation="profile_load" name="/sbin/dhclient" pid=862 comm="apparmor_parser"
[    8.254121] audit: type=1400 audit(1406750575.442:3): apparmor="STATUS" operation="profile_load" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=862 comm="apparmor_parser"
[    8.254124] audit: type=1400 audit(1406750575.442:4): apparmor="STATUS" operation="profile_load" name="/usr/lib/connman/scripts/dhclient-script" pid=862 comm="apparmor_parser"
[    8.254139] audit: type=1400 audit(1406750575.442:5): apparmor="STATUS" operation="profile_replace" name="/sbin/dhclient" pid=868 comm="apparmor_parser"
[    8.254144] audit: type=1400 audit(1406750575.442:6): apparmor="STATUS" operation="profile_replace" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=868 comm="apparmor_parser"
[    8.254147] audit: type=1400 audit(1406750575.442:7): apparmor="STATUS" operation="profile_replace" name="/usr/lib/connman/scripts/dhclient-script" pid=868 comm="apparmor_parser"
[    8.254186] audit: type=1400 audit(1406750575.442:8): apparmor="STATUS" operation="profile_replace" name="/sbin/dhclient" pid=864 comm="apparmor_parser"
[    8.254191] audit: type=1400 audit(1406750575.442:9): apparmor="STATUS" operation="profile_replace" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=864 comm="apparmor_parser"
[    8.254195] audit: type=1400 audit(1406750575.442:10): apparmor="STATUS" operation="profile_replace" name="/usr/lib/connman/scripts/dhclient-script" pid=864 comm="apparmor_parser"
[    8.254206] audit: type=1400 audit(1406750575.442:11): apparmor="STATUS" operation="profile_replace" name="/sbin/dhclient" pid=861 comm="apparmor_parser"
[    8.516888] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[    8.737090] iscsi: registered transport (tcp)
[    8.915394] iscsi: registered transport (iser)
[    8.936872] iscsid (1020): /proc/1020/oom_adj is deprecated, please use /proc/1020/oom_score_adj instead.
[    9.015886] init: failsafe main process (904) killed by TERM signal
[    9.903708] init: isc-dhcp-server main process (1187) terminated with status 1
[    9.903726] init: isc-dhcp-server main process ended, respawning
[    9.914392] init: isc-dhcp-server main process (1213) terminated with status 1
[    9.914418] init: isc-dhcp-server main process ended, respawning
[    9.923956] init: isc-dhcp-server main process (1230) terminated with status 1
[    9.923979] init: isc-dhcp-server main process ended, respawning
[    9.933893] init: isc-dhcp-server main process (1239) terminated with status 1
[    9.933912] init: isc-dhcp-server main process ended, respawning
[    9.943884] init: isc-dhcp-server main process (1246) terminated with status 1
[    9.943905] init: isc-dhcp-server main process ended, respawning
[    9.953331] init: isc-dhcp-server main process (1254) terminated with status 1
[    9.953351] init: isc-dhcp-server main process ended, respawning
[    9.963208] init: isc-dhcp-server main process (1261) terminated with status 1
[    9.963229] init: isc-dhcp-server main process ended, respawning
[    9.973295] init: isc-dhcp-server main process (1268) terminated with status 1
[    9.973315] init: isc-dhcp-server main process ended, respawning
[    9.979520] kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
[    9.983232] init: isc-dhcp-server main process (1275) terminated with status 1
[    9.983251] init: isc-dhcp-server main process ended, respawning
[    9.993355] init: isc-dhcp-server main process (1283) terminated with status 1
[    9.993377] init: isc-dhcp-server main process ended, respawning
[   10.003678] init: isc-dhcp-server main process (1290) terminated with status 1
[   10.003701] init: isc-dhcp-server respawning too fast, stopped
[   10.238760] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[   10.265217] NFSD: starting 90-second grace period (net ffffffff81ccd740)
[   11.631644] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   11.632039] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  765.112922] Key type ceph registered
[  765.116602] libceph: loaded (mon/osd proto 15/24)
[  765.147194] rbd: loaded
[  765.178519] libceph: client4109 fsid ad299151-b84e-430b-ab51-bf4ccfa8c16d
[  765.187965] libceph: mon2 10.214.135.140:6790 session established
[  765.204495]  rbd0: unknown partition table
[  765.208858] rbd: rbd0: added with size 0x500000000
[  777.606833] kjournald starting.  Commit interval 5 seconds
[  777.609848] EXT3-fs (rbd0): using internal journal
[  777.609851] EXT3-fs (rbd0): mounted filesystem with ordered data mode
[  828.008212] libceph: osd1 10.214.135.140:6810 socket closed (con state OPEN)
[  828.217325] libceph: osd1 10.214.135.140:6810 socket closed (con state OPEN)
[  830.086171] libceph: osd4 10.214.133.112:6810 socket closed (con state OPEN)
[  843.309861] libceph: osd0 10.214.135.140:6800 socket closed (con state OPEN)
[  850.397682] libceph: osd2 10.214.135.140:6805 socket closed (con state OPEN)
[  912.340271] libceph: osd4 10.214.133.112:6810 socket closed (con state OPEN)
[  945.097347] libceph: osd0 10.214.135.140:6800 socket closed (con state OPEN)
[  945.135629] libceph: osd1 10.214.135.140:6810 socket closed (con state OPEN)
[  989.348457] libceph: osd5 10.214.133.112:6805 socket closed (con state OPEN)
[  995.199518] libceph: osd2 10.214.135.140:6805 socket closed (con state OPEN)
[ 1004.081646] libceph: osd0 10.214.135.140:6800 socket closed (con state OPEN)
[ 1023.640729] libceph: osd1 10.214.135.140:6810 socket closed (con state OPEN)
[ 1036.553828] libceph: osd4 10.214.133.112:6810 socket closed (con state OPEN)
[ 1200.016610] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
[ 1200.023275]       Tainted: G            E 3.16.0-rc7-ceph-01781-g721bdd2 #17
[ 1200.030412] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.038345] kworker/0:0     D 0000000000000000     0     4      2 0x00000000
[ 1200.045587] Workqueue: ceph-msgr con_work [libceph]
[ 1200.050576]  ffff8804295bfc18 0000000000000046 ffff8804295bffd8 0000000000000000
[ 1200.058173]  ffff88042953cb60 0000000000012f80 ffff8804295bffd8 0000000000012f80
[ 1200.065865]  ffff880427ee0000 ffff88042953cb60 0000000000000000 ffff8804280107c8
[ 1200.073513] Call Trace:
[ 1200.075985]  [<ffffffff816981c9>] schedule+0x29/0x70
[ 1200.081059]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1200.087505]  [<ffffffff81699f03>] __mutex_lock_slowpath+0xd3/0x1c0
[ 1200.093836]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1200.099069]  [<ffffffffa046d88f>] alloc_msg+0xcf/0x230 [libceph]
[ 1200.105229]  [<ffffffffa0468fd6>] con_work+0x19d6/0x2020 [libceph]
[ 1200.111596]  [<ffffffff8108532c>] ? ttwu_do_wakeup+0x2c/0x100
[ 1200.117473]  [<ffffffff81070f34>] process_one_work+0x174/0x4a0
[ 1200.123393]  [<ffffffff810888f7>] ? wake_up_process+0x27/0x50
[ 1200.129297]  [<ffffffff81071c5b>] worker_thread+0x11b/0x530
[ 1200.134948]  [<ffffffff81071b40>] ? create_and_start_worker+0x50/0x50
[ 1200.141608]  [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1200.146577]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1200.152848]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1200.158332]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1200.164600] INFO: task kworker/0:1:61 blocked for more than 120 seconds.
[ 1200.171436]       Tainted: G            E 3.16.0-rc7-ceph-01781-g721bdd2 #17
[ 1200.178594] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.186576] kworker/0:1     D 0000000000000000     0    61      2 0x00000000
[ 1200.193947] Workqueue: ceph-msgr con_work [libceph]
[ 1200.198924]  ffff880428cc3c18 0000000000000046 ffff880428cc3bd8 0000000000000000
[ 1200.206579]  ffff880428cb8000 0000000000012f80 ffff880428cc3fd8 0000000000012f80
[ 1200.214254]  ffffffff81c164a0 ffff880428cb8000 0000000000000000 ffff8804280107c8
[ 1200.222017] Call Trace:
[ 1200.224543]  [<ffffffff816981c9>] schedule+0x29/0x70
[ 1200.229592]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1200.236036]  [<ffffffff81699f03>] __mutex_lock_slowpath+0xd3/0x1c0
[ 1200.242356]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1200.247581]  [<ffffffffa046d88f>] alloc_msg+0xcf/0x230 [libceph]
[ 1200.253810]  [<ffffffffa0468fd6>] con_work+0x19d6/0x2020 [libceph]
[ 1200.260088]  [<ffffffff81092c4c>] ? put_prev_entity+0x3c/0x320
[ 1200.266007]  [<ffffffff8109598f>] ? pick_next_task_fair+0xff/0x4e0
[ 1200.272320]  [<ffffffff81070f34>] process_one_work+0x174/0x4a0
[ 1200.278251]  [<ffffffff81071c5b>] worker_thread+0x11b/0x530
[ 1200.283908]  [<ffffffff81071b40>] ? create_and_start_worker+0x50/0x50
[ 1200.290510]  [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1200.295541]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1200.301866]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1200.307355]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1200.313616] INFO: task kworker/2:1:63 blocked for more than 120 seconds.
[ 1200.320403]       Tainted: G            E 3.16.0-rc7-ceph-01781-g721bdd2 #17
[ 1200.327483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.335465] kworker/2:1     D 0000000000000000     0    63      2 0x00000000
[ 1200.342797] Workqueue: events handle_timeout [libceph]
[ 1200.348032]  ffff880428ce3cd8 0000000000000046 ffff880428ce3cb8 0000000000000002
[ 1200.355654]  ffff88042937cb60 0000000000012f80 ffff880428ce3fd8 0000000000012f80
[ 1200.363390]  ffff8804295f1920 ffff88042937cb60 ffff880428ce3cd8 ffff8804280107c8
[ 1200.371065] Call Trace:
[ 1200.373625]  [<ffffffff816981c9>] schedule+0x29/0x70
[ 1200.378720]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1200.385159]  [<ffffffff81699f03>] __mutex_lock_slowpath+0xd3/0x1c0
[ 1200.391421]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1200.396644]  [<ffffffffa046c5d5>] handle_timeout+0x65/0x180 [libceph]
[ 1200.403165]  [<ffffffff81070f34>] process_one_work+0x174/0x4a0
[ 1200.409148]  [<ffffffff81697cec>] ? __schedule+0x3ac/0x7b0
[ 1200.414726]  [<ffffffff81071c5b>] worker_thread+0x11b/0x530
[ 1200.420443]  [<ffffffff81071b40>] ? create_and_start_worker+0x50/0x50
[ 1200.426969]  [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1200.431936]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1200.438256]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1200.443745]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1200.450080] INFO: task kworker/4:2:823 blocked for more than 120 seconds.
[ 1200.456946]       Tainted: G            E 3.16.0-rc7-ceph-01781-g721bdd2 #17
[ 1200.464078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.472002] kworker/4:2     D 0000000000000000     0   823      2 0x00000000
[ 1200.479346] Workqueue: events handle_osds_timeout [libceph]
[ 1200.485119]  ffff8800ba9b7ce8 0000000000000046 ffff88042f00ac00 ffff88043fd0f930
[ 1200.492785]  ffff880428219920 0000000000012f80 ffff8800ba9b7fd8 0000000000012f80
[ 1200.500409]  ffff8804295f4b60 ffff880428219920 ffff880426761180 ffff8804280107c8
[ 1200.508030] Call Trace:
[ 1200.510564]  [<ffffffff816981c9>] schedule+0x29/0x70
[ 1200.515614]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1200.522054]  [<ffffffff81699f03>] __mutex_lock_slowpath+0xd3/0x1c0
[ 1200.528388]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1200.533624]  [<ffffffffa046d5d5>] handle_osds_timeout+0x65/0x120 [libceph]
[ 1200.540647]  [<ffffffff81070f34>] process_one_work+0x174/0x4a0
[ 1200.546568]  [<ffffffff81071c5b>] worker_thread+0x11b/0x530
[ 1200.552331]  [<ffffffff81071b40>] ? create_and_start_worker+0x50/0x50
[ 1200.558894]  [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1200.563857]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1200.570126]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1200.575631]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1200.581930] INFO: task kworker/u16:0:3481 blocked for more than 120 seconds.
[ 1200.589068]       Tainted: G            E 3.16.0-rc7-ceph-01781-g721bdd2 #17
[ 1200.596252] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.605372] kworker/u16:0   D 0000000000000000     0  3481      2 0x00000000
[ 1200.612598] Workqueue: writeback bdi_writeback_workfn (flush-251:0)
[ 1200.618992]  ffff880427adb7c8 0000000000000046 ffff880427adb778 ffffffffa046db08
[ 1200.626640]  ffff880423323240 0000000000012f80 ffff880427adbfd8 0000000000012f80
[ 1200.634288]  ffff8804295f6480 ffff880423323240 ffff880427adb7f8 ffff8804280107c8
[ 1200.641886] Call Trace:
[ 1200.644423]  [<ffffffffa046db08>] ? ceph_osdc_msg_data_add+0x98/0xa0 [libceph]
[ 1200.651742]  [<ffffffff816981c9>] schedule+0x29/0x70
[ 1200.656797]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1200.663267]  [<ffffffff81699f03>] __mutex_lock_slowpath+0xd3/0x1c0
[ 1200.669537]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1200.674760]  [<ffffffffa046ed12>] ceph_osdc_start_request+0x42/0x80 [libceph]
[ 1200.682037]  [<ffffffffa042bf60>] rbd_obj_request_submit.isra.25+0x10/0x20 [rbd]
[ 1200.689543]  [<ffffffffa042e8ee>] rbd_img_obj_request_submit+0x1ce/0x460 [rbd]
[ 1200.696926]  [<ffffffffa042ebcc>] rbd_img_request_submit+0x4c/0x60 [rbd]
[ 1200.703762]  [<ffffffffa042f2a8>] rbd_request_fn+0x238/0x290 [rbd]
[ 1200.710033]  [<ffffffff8133a397>] __blk_run_queue+0x37/0x50
[ 1200.715693]  [<ffffffff8133affd>] queue_unplugged+0x3d/0xc0
[ 1200.721412]  [<ffffffff8133fddb>] blk_flush_plug_list+0x1db/0x210
[ 1200.727586]  [<ffffffff8133fe28>] blk_finish_plug+0x18/0x50
[ 1200.733248]  [<ffffffff8114c146>] generic_writepages+0x56/0x70
[ 1200.739222]  [<ffffffff8136da80>] ? plist_requeue+0xd0/0xd0
[ 1200.744877]  [<ffffffff8114db45>] do_writepages+0x35/0x40
[ 1200.750440]  [<ffffffff811d4f25>] __writeback_single_inode+0x45/0x290
[ 1200.756963]  [<ffffffff811d5b66>] writeback_sb_inodes+0x216/0x450
[ 1200.763135]  [<ffffffff811d5e3e>] __writeback_inodes_wb+0x9e/0xd0
[ 1200.769316]  [<ffffffff811d661b>] wb_writeback+0x27b/0x320
[ 1200.774883]  [<ffffffff811d670e>] ? wb_check_old_data_flush+0x4e/0xb0
[ 1200.781470]  [<ffffffff811d8e29>] bdi_writeback_workfn+0x209/0x460
[ 1200.787741]  [<ffffffff81070f34>] process_one_work+0x174/0x4a0
[ 1200.793716]  [<ffffffff81071c5b>] worker_thread+0x11b/0x530
[ 1200.799372]  [<ffffffff81071b40>] ? create_and_start_worker+0x50/0x50
[ 1200.805975]  [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1200.810940]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1200.817279]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1200.822769]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1200.829103] INFO: task kjournald:4398 blocked for more than 120 seconds.
[ 1200.835890]       Tainted: G            E 3.16.0-rc7-ceph-01781-g721bdd2 #17
[ 1200.843080] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.851004] kjournald       D 0000000000000002     0  4398      2 0x00000000
[ 1200.858283]  ffff8803537bb788 0000000000000046 ffff8803537bb7a8 0000000000000000
[ 1200.865914]  ffff880423324b60 0000000000012f80 ffff8803537bbfd8 0000000000012f80
[ 1200.873590]  ffff88042937cb60 ffff880423324b60 ffff880428ceb240 ffff8804231e59b8
[ 1200.881256] Call Trace:
[ 1200.883724]  [<ffffffff816981c9>] schedule+0x29/0x70
[ 1200.888798]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1200.895239]  [<ffffffff81699fe5>] __mutex_lock_slowpath+0x1b5/0x1c0
[ 1200.901673]  [<ffffffffa0479826>] ? ceph_str_hash+0x26/0x80 [libceph]
[ 1200.908198]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1200.913430]  [<ffffffffa046751d>] ceph_con_send+0x4d/0x130 [libceph]
[ 1200.919912]  [<ffffffffa046c540>] __send_queued+0x120/0x150 [libceph]
[ 1200.926444]  [<ffffffffa046ec5b>] __ceph_osdc_start_request+0x5b/0xd0 [libceph]
[ 1200.933855]  [<ffffffffa046ed21>] ceph_osdc_start_request+0x51/0x80 [libceph]
[ 1200.941126]  [<ffffffffa042bf60>] rbd_obj_request_submit.isra.25+0x10/0x20 [rbd]
[ 1200.948622]  [<ffffffffa042e8ee>] rbd_img_obj_request_submit+0x1ce/0x460 [rbd]
[ 1200.956040]  [<ffffffffa042ebcc>] rbd_img_request_submit+0x4c/0x60 [rbd]
[ 1200.962845]  [<ffffffffa042f2a8>] rbd_request_fn+0x238/0x290 [rbd]
[ 1200.969108]  [<ffffffff8133a397>] __blk_run_queue+0x37/0x50
[ 1200.974764]  [<ffffffff8133affd>] queue_unplugged+0x3d/0xc0
[ 1200.980424]  [<ffffffff8133fddb>] blk_flush_plug_list+0x1db/0x210
[ 1200.986635]  [<ffffffff81698288>] io_schedule+0x78/0xd0
[ 1200.991954]  [<ffffffff8133b864>] get_request+0x414/0x800
[ 1200.997440]  [<ffffffff8133f477>] ? bio_attempt_back_merge+0x37/0x100
[ 1201.004013]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
[ 1201.009782]  [<ffffffff8133ff2c>] blk_queue_bio+0xcc/0x360
[ 1201.015353]  [<ffffffff8133c2d0>] generic_make_request+0xc0/0x100
[ 1201.021605]  [<ffffffff8133c385>] submit_bio+0x75/0x140
[ 1201.026921]  [<ffffffff811de4e6>] _submit_bh+0x136/0x1f0
[ 1201.032390]  [<ffffffff81290081>] journal_do_submit_data+0x41/0x50
[ 1201.038662]  [<ffffffff81291380>] journal_commit_transaction+0x1150/0x1350
[ 1201.045683]  [<ffffffff81063aff>] ? try_to_del_timer_sync+0x4f/0x70
[ 1201.052043]  [<ffffffff81293e01>] kjournald+0xe1/0x260
[ 1201.057324]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
[ 1201.063072]  [<ffffffff81293d20>] ? commit_timeout+0x10/0x10
[ 1201.068855]  [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1201.073819]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1201.080084]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1201.085573]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1201.091877] INFO: task kworker/0:2:6108 blocked for more than 120 seconds.
[ 1201.098841]       Tainted: G            E 3.16.0-rc7-ceph-01781-g721bdd2 #17
[ 1201.105978] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1201.113977] kworker/0:2     D 0000000000000000     0  6108      2 0x00000000
[ 1201.121218] Workqueue: ceph-msgr con_work [libceph]
[ 1201.126256]  ffff88034bdebc18 0000000000000046 ffff88034bdebfd8 0000000000000000
[ 1201.133972]  ffff880427ee0000 0000000000012f80 ffff88034bdebfd8 0000000000012f80
[ 1201.141647]  ffff880427ee6480 ffff880427ee0000 0000000000000000 ffff8804280107c8
[ 1201.149426] Call Trace:
[ 1201.151968]  [<ffffffff816981c9>] schedule+0x29/0x70
[ 1201.157020]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1201.163459]  [<ffffffff81699f03>] __mutex_lock_slowpath+0xd3/0x1c0
[ 1201.169758]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1201.174995]  [<ffffffffa046d88f>] alloc_msg+0xcf/0x230 [libceph]
[ 1201.181093]  [<ffffffffa0468fd6>] con_work+0x19d6/0x2020 [libceph]
[ 1201.187358]  [<ffffffff8108532c>] ? ttwu_do_wakeup+0x2c/0x100
[ 1201.193250]  [<ffffffff810854c6>] ? ttwu_do_activate.constprop.121+0x66/0x70
[ 1201.200395]  [<ffffffff810887ba>] ? try_to_wake_up+0x26a/0x340
[ 1201.206375]  [<ffffffff81070f34>] process_one_work+0x174/0x4a0
[ 1201.212323]  [<ffffffff810888f7>] ? wake_up_process+0x27/0x50
[ 1201.218184]  [<ffffffff81071c5b>] worker_thread+0x11b/0x530
[ 1201.223886]  [<ffffffff81071b40>] ? create_and_start_worker+0x50/0x50
[ 1201.230421]  [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1201.235389]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1201.241652]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1201.247138]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1201.253473] INFO: task kworker/0:3:6109 blocked for more than 120 seconds.
[ 1201.260459]       Tainted: G            E 3.16.0-rc7-ceph-01781-g721bdd2 #17
[ 1201.267636] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1201.275646] kworker/0:3     D 0000000000000000     0  6109      2 0x00000000
[ 1201.282980] Workqueue: ceph-msgr con_work [libceph]
[ 1201.287993]  ffff88034bdefc18 0000000000000046 ffff88034bdeffd8 0000000000000000
[ 1201.295624]  ffff880427ee6480 0000000000012f80 ffff88034bdeffd8 0000000000012f80
[ 1201.303265]  ffff880428310000 ffff880427ee6480 0000000000000000 ffff8804280107c8
[ 1201.310853] Call Trace:
[ 1201.313417]  [<ffffffff816981c9>] schedule+0x29/0x70
[ 1201.318438]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1201.324892]  [<ffffffff81699f03>] __mutex_lock_slowpath+0xd3/0x1c0
[ 1201.331157]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1201.336407]  [<ffffffffa046d88f>] alloc_msg+0xcf/0x230 [libceph]
[ 1201.342505]  [<ffffffffa0468fd6>] con_work+0x19d6/0x2020 [libceph]
[ 1201.348853]  [<ffffffff8108532c>] ? ttwu_do_wakeup+0x2c/0x100
[ 1201.354706]  [<ffffffff810854c6>] ? ttwu_do_activate.constprop.121+0x66/0x70
[ 1201.361886]  [<ffffffff810887ba>] ? try_to_wake_up+0x26a/0x340
[ 1201.367947]  [<ffffffff81070f34>] process_one_work+0x174/0x4a0
[ 1201.373868]  [<ffffffff810888f7>] ? wake_up_process+0x27/0x50
[ 1201.379761]  [<ffffffff81071c5b>] worker_thread+0x11b/0x530
[ 1201.385420]  [<ffffffff81071b40>] ? create_and_start_worker+0x50/0x50
[ 1201.391990]  [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1201.396955]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1201.403221]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1201.408708]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1201.414973] INFO: task kworker/0:4:6110 blocked for more than 120 seconds.
[ 1201.421993]       Tainted: G            E 3.16.0-rc7-ceph-01781-g721bdd2 #17
[ 1201.429126] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1201.437054] kworker/0:4     D 0000000000000000     0  6110      2 0x00000000
[ 1201.444304] Workqueue: ceph-msgr con_work [libceph]
[ 1201.449273]  ffff88034b877c18 0000000000000046 00000000000000de 0000000000000000
[ 1201.456966]  ffff880427ee1920 0000000000012f80 ffff88034b877fd8 0000000000012f80
[ 1201.464518]  ffffffff81c164a0 ffff880427ee1920 0000000000000000 ffff8804280107c8
[ 1201.472099] Call Trace:
[ 1201.474566]  [<ffffffff816981c9>] schedule+0x29/0x70
[ 1201.479558]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1201.486035]  [<ffffffff81699f03>] __mutex_lock_slowpath+0xd3/0x1c0
[ 1201.492252]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1201.497469]  [<ffffffffa046d88f>] alloc_msg+0xcf/0x230 [libceph]
[ 1201.503564]  [<ffffffffa0468fd6>] con_work+0x19d6/0x2020 [libceph]
[ 1201.509800]  [<ffffffff81092c4c>] ? put_prev_entity+0x3c/0x320
[ 1201.515771]  [<ffffffff8109598f>] ? pick_next_task_fair+0xff/0x4e0
[ 1201.521977]  [<ffffffff81070f34>] process_one_work+0x174/0x4a0
[ 1201.527885]  [<ffffffff81071c5b>] worker_thread+0x11b/0x530
[ 1201.533589]  [<ffffffff81071b40>] ? create_and_start_worker+0x50/0x50
[ 1201.540056]  [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1201.544967]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1201.551235]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1201.556661]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1201.562872] INFO: task kworker/0:5:6111 blocked for more than 120 seconds.
[ 1201.570899]       Tainted: G            E 3.16.0-rc7-ceph-01781-g721bdd2 #17
[ 1201.578030] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1201.586041] kworker/0:5     D 0000000000000000     0  6111      2 0x00000000
[ 1201.593197] Workqueue: ceph-msgr con_work [libceph]
[ 1201.598175]  ffff88034b857c18 0000000000000046 0000000000000000 0000000000000000
[ 1201.605711]  ffff880428314b60 0000000000012f80 ffff88034b857fd8 0000000000012f80
[ 1201.613326]  ffffffff81c164a0 ffff880428314b60 0000000000000000 ffff8804280107c8
[ 1201.620904] Call Trace:
[ 1201.623374]  [<ffffffff816981c9>] schedule+0x29/0x70
[ 1201.628419]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1201.634804]  [<ffffffff81699f03>] __mutex_lock_slowpath+0xd3/0x1c0
[ 1201.641039]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1201.646322]  [<ffffffffa046d88f>] alloc_msg+0xcf/0x230 [libceph]
[ 1201.652435]  [<ffffffffa0468fd6>] con_work+0x19d6/0x2020 [libceph]
[ 1201.658741]  [<ffffffff81092c4c>] ? put_prev_entity+0x3c/0x320
[ 1201.664678]  [<ffffffff81095c72>] ? pick_next_task_fair+0x3e2/0x4e0
[ 1201.671041]  [<ffffffff81070f34>] process_one_work+0x174/0x4a0
[ 1201.676987]  [<ffffffff81071c5b>] worker_thread+0x11b/0x530
[ 1201.682645]  [<ffffffff81071b40>] ? create_and_start_worker+0x50/0x50
[ 1201.689170]  [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1201.694133]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1201.700466]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1201.705950]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
  2014-07-31 13:13     ` Peter Zijlstra
  2014-07-31 13:25       ` Ilya Dryomov
@ 2014-07-31 13:44       ` Ingo Molnar
  2014-07-31 13:56         ` Peter Zijlstra
  2014-07-31 14:30       ` [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point" Mike Galbraith
  2 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2014-07-31 13:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ilya Dryomov, Linux Kernel Mailing List, Ceph Development,
	davidlohr, jason.low2


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Jul 31, 2014 at 04:37:29PM +0400, Ilya Dryomov wrote:
> 
> > This didn't make sense to me at first too, and I'll be happy to be
> > proven wrong, but we can reproduce this with rbd very reliably under
> > higher than usual load, and the revert makes it go away.  What we are
> > seeing in the rbd scenario is the following.
> 
> This is drivers/block/rbd.c ? I can find but a single mutex_lock() in
> there.
> 
> > Suppose foo needs mutexes A and B, bar needs mutex B.  foo acquires
> > A and then wants to acquire B, but B is held by bar.  foo spins
> > a little and ends up calling schedule_preempt_disabled() on line 484
> > above, but that call never returns, even though a hundred usecs later
> > bar releases B.  foo ends up stuck in mutex_lock() indefinitely, but
> > still holds A and everybody else who needs A gets behind A.  Given that
> > this A happens to be a central libceph mutex all rbd activity halts.
> > Deadlock may not be the best term for this, but never returning from
> > mutex_lock(&B) even though B has been unlocked is *a* problem.
> > 
> > This obviously doesn't happen every time schedule_preempt_disabled() on
> > line 484 is called, so there must be some sort of race here.  I'll send
> > along the actual rbd stack traces shortly.
> 
> Smells like maybe current->state != TASK_RUNNING, does the below
> trigger?
> 
> If so, you've wrecked something in whatever...
> 
> ---
>  kernel/locking/mutex.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index ae712b25e492..3d726fdaa764 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -473,8 +473,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	 * reschedule now, before we try-lock the mutex. This avoids getting
>  	 * scheduled out right after we obtained the mutex.
>  	 */
> -	if (need_resched())
> +	if (need_resched()) {
> +		if (WARN_ON_ONCE(current->state != TASK_RUNNING))
> +			__set_current_state(TASK_RUNNING);
> +
>  		schedule_preempt_disabled();
> +	}

Might make sense to add that debug check under mutex debugging or so, 
with a sensible kernel message printed.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
  2014-07-31 13:44       ` Ingo Molnar
@ 2014-07-31 13:56         ` Peter Zijlstra
  2014-08-02 20:04           ` [RFC][PATCH] locking: Debug nested wait/locking primitives Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2014-07-31 13:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ilya Dryomov, Linux Kernel Mailing List, Ceph Development,
	davidlohr, jason.low2

[-- Attachment #1: Type: text/plain, Size: 2010 bytes --]

On Thu, Jul 31, 2014 at 03:44:11PM +0200, Ingo Molnar wrote:
> 
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index ae712b25e492..3d726fdaa764 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -473,8 +473,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >  	 * reschedule now, before we try-lock the mutex. This avoids getting
> >  	 * scheduled out right after we obtained the mutex.
> >  	 */
> > -	if (need_resched())
> > +	if (need_resched()) {
> > +		if (WARN_ON_ONCE(current->state != TASK_RUNNING))
> > +			__set_current_state(TASK_RUNNING);
> > +
> >  		schedule_preempt_disabled();
> > +	}
> 
> Might make sense to add that debug check under mutex debugging or so, 
> with a sensible kernel message printed.

Something like so? I suppose we should do a similar one for rwsem,
semaphores and possibly wait_event*() too.

---
Subject: locking/mutex: Add debug check for task state

Calling blocking locks with current->state != TASK_RUNNING is a bug.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/locking/mutex.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index ae712b25e492..d5daf8c38899 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -375,6 +375,17 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	unsigned long flags;
 	int ret;
 
+#ifdef CONFIG_DEBUG_MUTEXES
+	/*
+	 * Blocking primitives (including this one) will set (and therefore
+	 * destroy) current->state, since we will exit with TASK_RUNNING
+	 * make sure we enter with it, otherwise we will destroy state.
+	 */
+	if (WARN_ONCE(current->state != TASK_RUNNING,
+			"do not call blocking locks when !TASK_RUNNING\n"))
+		__set_current_state(TASK_RUNNING);
+#endif
+
 	preempt_disable();
 	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
 

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
  2014-07-31 13:13     ` Peter Zijlstra
  2014-07-31 13:25       ` Ilya Dryomov
  2014-07-31 13:44       ` Ingo Molnar
@ 2014-07-31 14:30       ` Mike Galbraith
  2014-07-31 14:37         ` Ilya Dryomov
  2014-07-31 14:39         ` Peter Zijlstra
  2 siblings, 2 replies; 14+ messages in thread
From: Mike Galbraith @ 2014-07-31 14:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ilya Dryomov, Linux Kernel Mailing List, Ingo Molnar,
	Ceph Development, davidlohr, jason.low2

On Thu, 2014-07-31 at 15:13 +0200, Peter Zijlstra wrote:

> Smells like maybe current->state != TASK_RUNNING

Bingo

[ 1200.851004] kjournald       D 0000000000000002     0  4398      2 0x00000000
[ 1200.858283]  ffff8803537bb788 0000000000000046 ffff8803537bb7a8 0000000000000000
[ 1200.865914]  ffff880423324b60 0000000000012f80 ffff8803537bbfd8 0000000000012f80
[ 1200.873590]  ffff88042937cb60 ffff880423324b60 ffff880428ceb240 ffff8804231e59b8
[ 1200.881256] Call Trace:
[ 1200.883724]  [<ffffffff816981c9>] schedule+0x29/0x70
[ 1200.888798]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1200.895239]  [<ffffffff81699fe5>] __mutex_lock_slowpath+0x1b5/0x1c0
[ 1200.901673]  [<ffffffffa0479826>] ? ceph_str_hash+0x26/0x80 [libceph]
[ 1200.908198]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1200.913430]  [<ffffffffa046751d>] ceph_con_send+0x4d/0x130 [libceph]
[ 1200.919912]  [<ffffffffa046c540>] __send_queued+0x120/0x150 [libceph]
[ 1200.926444]  [<ffffffffa046ec5b>] __ceph_osdc_start_request+0x5b/0xd0 [libceph]
[ 1200.933855]  [<ffffffffa046ed21>] ceph_osdc_start_request+0x51/0x80 [libceph]
[ 1200.941126]  [<ffffffffa042bf60>] rbd_obj_request_submit.isra.25+0x10/0x20 [rbd]
[ 1200.948622]  [<ffffffffa042e8ee>] rbd_img_obj_request_submit+0x1ce/0x460 [rbd]
[ 1200.956040]  [<ffffffffa042ebcc>] rbd_img_request_submit+0x4c/0x60 [rbd]
[ 1200.962845]  [<ffffffffa042f2a8>] rbd_request_fn+0x238/0x290 [rbd]
[ 1200.969108]  [<ffffffff8133a397>] __blk_run_queue+0x37/0x50
[ 1200.974764]  [<ffffffff8133affd>] queue_unplugged+0x3d/0xc0
[ 1200.980424]  [<ffffffff8133fddb>] blk_flush_plug_list+0x1db/0x210
[ 1200.986635]  [<ffffffff81698288>] io_schedule+0x78/0xd0
[ 1200.991954]  [<ffffffff8133b864>] get_request+0x414/0x800
[ 1200.997440]  [<ffffffff8133f477>] ? bio_attempt_back_merge+0x37/0x100
[ 1201.004013]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
[ 1201.009782]  [<ffffffff8133ff2c>] blk_queue_bio+0xcc/0x360
[ 1201.015353]  [<ffffffff8133c2d0>] generic_make_request+0xc0/0x100
[ 1201.021605]  [<ffffffff8133c385>] submit_bio+0x75/0x140
[ 1201.026921]  [<ffffffff811de4e6>] _submit_bh+0x136/0x1f0
[ 1201.032390]  [<ffffffff81290081>] journal_do_submit_data+0x41/0x50
[ 1201.038662]  [<ffffffff81291380>] journal_commit_transaction+0x1150/0x1350
[ 1201.045683]  [<ffffffff81063aff>] ? try_to_del_timer_sync+0x4f/0x70
[ 1201.052043]  [<ffffffff81293e01>] kjournald+0xe1/0x260
[ 1201.057324]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
[ 1201.063072]  [<ffffffff81293d20>] ? commit_timeout+0x10/0x10
[ 1201.068855]  [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1201.073819]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1201.080084]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1201.085573]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
  2014-07-31 14:30       ` [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point" Mike Galbraith
@ 2014-07-31 14:37         ` Ilya Dryomov
  2014-07-31 14:39         ` Peter Zijlstra
  1 sibling, 0 replies; 14+ messages in thread
From: Ilya Dryomov @ 2014-07-31 14:37 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Ingo Molnar,
	Ceph Development, davidlohr, jason.low2

On Thu, Jul 31, 2014 at 6:30 PM, Mike Galbraith
<umgwanakikbuti@gmail.com> wrote:
> On Thu, 2014-07-31 at 15:13 +0200, Peter Zijlstra wrote:
>
>> Smells like maybe current->state != TASK_RUNNING

It just triggered for me too, took longer than usual.  Sorry for the
churn Peter, this was really confusing.  Onto finding the real bug..

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
  2014-07-31 14:30       ` [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point" Mike Galbraith
  2014-07-31 14:37         ` Ilya Dryomov
@ 2014-07-31 14:39         ` Peter Zijlstra
  2014-08-01 12:56           ` Ilya Dryomov
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2014-07-31 14:39 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ilya Dryomov, Linux Kernel Mailing List, Ingo Molnar,
	Ceph Development, davidlohr, jason.low2, axboe

[-- Attachment #1: Type: text/plain, Size: 3373 bytes --]

On Thu, Jul 31, 2014 at 04:30:52PM +0200, Mike Galbraith wrote:
> On Thu, 2014-07-31 at 15:13 +0200, Peter Zijlstra wrote:
> 
> > Smells like maybe current->state != TASK_RUNNING
> 
> Bingo
> 
> [ 1200.851004] kjournald       D 0000000000000002     0  4398      2 0x00000000
> [ 1200.858283]  ffff8803537bb788 0000000000000046 ffff8803537bb7a8 0000000000000000
> [ 1200.865914]  ffff880423324b60 0000000000012f80 ffff8803537bbfd8 0000000000012f80
> [ 1200.873590]  ffff88042937cb60 ffff880423324b60 ffff880428ceb240 ffff8804231e59b8
> [ 1200.881256] Call Trace:
> [ 1200.883724]  [<ffffffff816981c9>] schedule+0x29/0x70
> [ 1200.888798]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
> [ 1200.895239]  [<ffffffff81699fe5>] __mutex_lock_slowpath+0x1b5/0x1c0
> [ 1200.901673]  [<ffffffffa0479826>] ? ceph_str_hash+0x26/0x80 [libceph]
> [ 1200.908198]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
> [ 1200.913430]  [<ffffffffa046751d>] ceph_con_send+0x4d/0x130 [libceph]
> [ 1200.919912]  [<ffffffffa046c540>] __send_queued+0x120/0x150 [libceph]
> [ 1200.926444]  [<ffffffffa046ec5b>] __ceph_osdc_start_request+0x5b/0xd0 [libceph]
> [ 1200.933855]  [<ffffffffa046ed21>] ceph_osdc_start_request+0x51/0x80 [libceph]
> [ 1200.941126]  [<ffffffffa042bf60>] rbd_obj_request_submit.isra.25+0x10/0x20 [rbd]
> [ 1200.948622]  [<ffffffffa042e8ee>] rbd_img_obj_request_submit+0x1ce/0x460 [rbd]
> [ 1200.956040]  [<ffffffffa042ebcc>] rbd_img_request_submit+0x4c/0x60 [rbd]
> [ 1200.962845]  [<ffffffffa042f2a8>] rbd_request_fn+0x238/0x290 [rbd]
> [ 1200.969108]  [<ffffffff8133a397>] __blk_run_queue+0x37/0x50
> [ 1200.974764]  [<ffffffff8133affd>] queue_unplugged+0x3d/0xc0
> [ 1200.980424]  [<ffffffff8133fddb>] blk_flush_plug_list+0x1db/0x210
> [ 1200.986635]  [<ffffffff81698288>] io_schedule+0x78/0xd0
> [ 1200.991954]  [<ffffffff8133b864>] get_request+0x414/0x800
> [ 1200.997440]  [<ffffffff8133f477>] ? bio_attempt_back_merge+0x37/0x100
> [ 1201.004013]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
> [ 1201.009782]  [<ffffffff8133ff2c>] blk_queue_bio+0xcc/0x360
> [ 1201.015353]  [<ffffffff8133c2d0>] generic_make_request+0xc0/0x100
> [ 1201.021605]  [<ffffffff8133c385>] submit_bio+0x75/0x140
> [ 1201.026921]  [<ffffffff811de4e6>] _submit_bh+0x136/0x1f0
> [ 1201.032390]  [<ffffffff81290081>] journal_do_submit_data+0x41/0x50
> [ 1201.038662]  [<ffffffff81291380>] journal_commit_transaction+0x1150/0x1350
> [ 1201.045683]  [<ffffffff81063aff>] ? try_to_del_timer_sync+0x4f/0x70
> [ 1201.052043]  [<ffffffff81293e01>] kjournald+0xe1/0x260
> [ 1201.057324]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
> [ 1201.063072]  [<ffffffff81293d20>] ? commit_timeout+0x10/0x10
> [ 1201.068855]  [<ffffffff81078829>] kthread+0xc9/0xe0
> [ 1201.073819]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
> [ 1201.080084]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
> [ 1201.085573]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0

Ohh. that's properly broken indeed.

You can't just call blocking primitives on the way to schedule(), that's
fail.

Also, if I look at blk_flush_plug_list(), it calls queue_unplugged()
with IRQs disabled, so _who_ is enabling them again and calling blocking
stuff?

/me stares more..

rbd_request_fn() does.. *argh*

Someone needs to go fix, this cannot work right.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
  2014-07-31 14:39         ` Peter Zijlstra
@ 2014-08-01 12:56           ` Ilya Dryomov
  2014-08-01 13:27             ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Ilya Dryomov @ 2014-08-01 12:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Linux Kernel Mailing List, Ingo Molnar,
	Ceph Development, davidlohr, jason.low2, axboe

On Thu, Jul 31, 2014 at 6:39 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jul 31, 2014 at 04:30:52PM +0200, Mike Galbraith wrote:
>> On Thu, 2014-07-31 at 15:13 +0200, Peter Zijlstra wrote:
>>
>> > Smells like maybe current->state != TASK_RUNNING
>>
>> Bingo
>>
>> [ 1200.851004] kjournald       D 0000000000000002     0  4398      2 0x00000000
>> [ 1200.858283]  ffff8803537bb788 0000000000000046 ffff8803537bb7a8 0000000000000000
>> [ 1200.865914]  ffff880423324b60 0000000000012f80 ffff8803537bbfd8 0000000000012f80
>> [ 1200.873590]  ffff88042937cb60 ffff880423324b60 ffff880428ceb240 ffff8804231e59b8
>> [ 1200.881256] Call Trace:
>> [ 1200.883724]  [<ffffffff816981c9>] schedule+0x29/0x70
>> [ 1200.888798]  [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
>> [ 1200.895239]  [<ffffffff81699fe5>] __mutex_lock_slowpath+0x1b5/0x1c0
>> [ 1200.901673]  [<ffffffffa0479826>] ? ceph_str_hash+0x26/0x80 [libceph]
>> [ 1200.908198]  [<ffffffff8169a013>] mutex_lock+0x23/0x37
>> [ 1200.913430]  [<ffffffffa046751d>] ceph_con_send+0x4d/0x130 [libceph]
>> [ 1200.919912]  [<ffffffffa046c540>] __send_queued+0x120/0x150 [libceph]
>> [ 1200.926444]  [<ffffffffa046ec5b>] __ceph_osdc_start_request+0x5b/0xd0 [libceph]
>> [ 1200.933855]  [<ffffffffa046ed21>] ceph_osdc_start_request+0x51/0x80 [libceph]
>> [ 1200.941126]  [<ffffffffa042bf60>] rbd_obj_request_submit.isra.25+0x10/0x20 [rbd]
>> [ 1200.948622]  [<ffffffffa042e8ee>] rbd_img_obj_request_submit+0x1ce/0x460 [rbd]
>> [ 1200.956040]  [<ffffffffa042ebcc>] rbd_img_request_submit+0x4c/0x60 [rbd]
>> [ 1200.962845]  [<ffffffffa042f2a8>] rbd_request_fn+0x238/0x290 [rbd]
>> [ 1200.969108]  [<ffffffff8133a397>] __blk_run_queue+0x37/0x50
>> [ 1200.974764]  [<ffffffff8133affd>] queue_unplugged+0x3d/0xc0
>> [ 1200.980424]  [<ffffffff8133fddb>] blk_flush_plug_list+0x1db/0x210
>> [ 1200.986635]  [<ffffffff81698288>] io_schedule+0x78/0xd0
>> [ 1200.991954]  [<ffffffff8133b864>] get_request+0x414/0x800
>> [ 1200.997440]  [<ffffffff8133f477>] ? bio_attempt_back_merge+0x37/0x100
>> [ 1201.004013]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
>> [ 1201.009782]  [<ffffffff8133ff2c>] blk_queue_bio+0xcc/0x360
>> [ 1201.015353]  [<ffffffff8133c2d0>] generic_make_request+0xc0/0x100
>> [ 1201.021605]  [<ffffffff8133c385>] submit_bio+0x75/0x140
>> [ 1201.026921]  [<ffffffff811de4e6>] _submit_bh+0x136/0x1f0
>> [ 1201.032390]  [<ffffffff81290081>] journal_do_submit_data+0x41/0x50
>> [ 1201.038662]  [<ffffffff81291380>] journal_commit_transaction+0x1150/0x1350
>> [ 1201.045683]  [<ffffffff81063aff>] ? try_to_del_timer_sync+0x4f/0x70
>> [ 1201.052043]  [<ffffffff81293e01>] kjournald+0xe1/0x260
>> [ 1201.057324]  [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
>> [ 1201.063072]  [<ffffffff81293d20>] ? commit_timeout+0x10/0x10
>> [ 1201.068855]  [<ffffffff81078829>] kthread+0xc9/0xe0
>> [ 1201.073819]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
>> [ 1201.080084]  [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
>> [ 1201.085573]  [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
>
> Ohh. that's properly broken indeed.
>
> You can't just call blocking primitives on the way to schedule(), that's
> fail.
>
> Also, if I look at blk_flush_plug_list(), it calls queue_unplugged()
> with IRQs disabled, so _who_ is enabling them again and calling blocking
> stuff?
>
> /me stares more..
>
> rbd_request_fn() does.. *argh*
>
> Someone needs to go fix, this cannot work right.

I'm going to fix up rbd_request_fn(), but I want to make sure
I understand this in full.

- Previously the danger of calling blocking primitives on the way to
  schedule(), i.e. with task->state != TASK_RUNNING, was that if the
  blocking primitive was indeed to block the task state would be set
  back to TASK_RUNNING and the schedule() that that task was on the way
  to wouldn't have any effect.  Your "Add extra reschedule point" patch
  essentially made calling mutex_lock() and probably others much more
  wrong that it used to be, because mutex_lock() may now reschedule
  when the task is not on the mutex wait queue.

- There is nothing wrong with releasing queue_lock and reenabling IRQs
  in rbd_request_fn() as long as it doesn't block and I remember to
  disable IRQs and take queue_lock back on return.

I'm asking because rbd_request_fn() is probably not the only broken in
this way code path.  I poked around and found read_events() in aio.c,
it seems to have been written with the "danger" assumption that
I outlined above and there is even a comment to it.

Does that above make sense or am I missing something?

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
  2014-08-01 12:56           ` Ilya Dryomov
@ 2014-08-01 13:27             ` Peter Zijlstra
  2014-08-01 13:50               ` Ilya Dryomov
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2014-08-01 13:27 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Mike Galbraith, Linux Kernel Mailing List, Ingo Molnar,
	Ceph Development, davidlohr, jason.low2, axboe

[-- Attachment #1: Type: text/plain, Size: 2211 bytes --]

On Fri, Aug 01, 2014 at 04:56:27PM +0400, Ilya Dryomov wrote:
> I'm going to fix up rbd_request_fn(), but I want to make sure
> I understand this in full.
> 
> - Previously the danger of calling blocking primitives on the way to
>   schedule(), i.e. with task->state != TASK_RUNNING, was that if the
>   blocking primitive was indeed to block the task state would be set
>   back to TASK_RUNNING and the schedule() that that task was on the way
>   to wouldn't have any effect.  Your "Add extra reschedule point" patch
>   essentially made calling mutex_lock() and probably others much more
>   wrong that it used to be, because mutex_lock() may now reschedule
>   when the task is not on the mutex wait queue.

Right and in general we cannot allow spurious wakeups (although we try
very hard to deal with them in generic code, which is why things more or
less worked).

But if you do a patch that 'randomly' ignores ->state on schedule (I did
one) stuff comes apart _real_ quick.

Therefore you should very much not destroy ->state on the way to
schedule.

> - There is nothing wrong with releasing queue_lock and reenabling IRQs
>   in rbd_request_fn() as long as it doesn't block and I remember to
>   disable IRQs and take queue_lock back on return.

Releasing queue_lock might be ok, dunno about the blk locking, however
reenabling IRQs it is actually wrong as per blk_flush_plug_list() since
that uses local_irq_save()/restore() which means it can be called from
contexts which cannot deal with enabling IRQs, and then your
->request_fn() goes and does that.

Now maybe blk_flush_plug_list() is overly paranoid and it could use
local_irq_disable()/enable() instead, I don't know. But until it does, a
request_fn() should never reenable IRQs.

> I'm asking because rbd_request_fn() is probably not the only broken in
> this way code path.  I poked around and found read_events() in aio.c,
> it seems to have been written with the "danger" assumption that
> I outlined above and there is even a comment to it.

I'm fairly sure there's more broken stuff, I didn't dare looking.

> Does that above make sense or am I missing something?

I think that's about it.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
  2014-08-01 13:27             ` Peter Zijlstra
@ 2014-08-01 13:50               ` Ilya Dryomov
  0 siblings, 0 replies; 14+ messages in thread
From: Ilya Dryomov @ 2014-08-01 13:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Linux Kernel Mailing List, Ingo Molnar,
	Ceph Development, davidlohr, jason.low2, axboe, Kent Overstreet

On Fri, Aug 1, 2014 at 5:27 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Aug 01, 2014 at 04:56:27PM +0400, Ilya Dryomov wrote:
>> I'm going to fix up rbd_request_fn(), but I want to make sure
>> I understand this in full.
>>
>> - Previously the danger of calling blocking primitives on the way to
>>   schedule(), i.e. with task->state != TASK_RUNNING, was that if the
>>   blocking primitive was indeed to block the task state would be set
>>   back to TASK_RUNNING and the schedule() that that task was on the way
>>   to wouldn't have any effect.  Your "Add extra reschedule point" patch
>>   essentially made calling mutex_lock() and probably others much more
>>   wrong that it used to be, because mutex_lock() may now reschedule
>>   when the task is not on the mutex wait queue.
>
> Right and in general we cannot allow spurious wakeups (although we try
> very hard to deal with them in generic code, which is why things more or
> less worked).
>
> But if you do a patch that 'randomly' ignores ->state on schedule (I did
> one) stuff comes apart _real_ quick.
>
> Therefore you should very much not destroy ->state on the way to
> schedule.
>
>> - There is nothing wrong with releasing queue_lock and reenabling IRQs
>>   in rbd_request_fn() as long as it doesn't block and I remember to
>>   disable IRQs and take queue_lock back on return.
>
> Releasing queue_lock might be ok, dunno about the blk locking, however
> reenabling IRQs it is actually wrong as per blk_flush_plug_list() since
> that uses local_irq_save()/restore() which means it can be called from
> contexts which cannot deal with enabling IRQs, and then your
> ->request_fn() goes and does that.
>
> Now maybe blk_flush_plug_list() is overly paranoid and it could use
> local_irq_disable()/enable() instead, I don't know. But until it does, a
> request_fn() should never reenable IRQs.
>
>> I'm asking because rbd_request_fn() is probably not the only broken in
>> this way code path.  I poked around and found read_events() in aio.c,
>> it seems to have been written with the "danger" assumption that
>> I outlined above and there is even a comment to it.
>
> I'm fairly sure there's more broken stuff, I didn't dare looking.
>
>> Does that above make sense or am I missing something?
>
> I think that's about it.

Thanks for clarifying things.  CC'ing Kent to draw attention to
read_events().

                Ilya

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC][PATCH] locking: Debug nested wait/locking primitives
  2014-07-31 13:56         ` Peter Zijlstra
@ 2014-08-02 20:04           ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2014-08-02 20:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ilya Dryomov, Linux Kernel Mailing List, Ceph Development,
	davidlohr, jason.low2


This should cover most cases I think.

I'll have to break this out into different patches, and maybe clean up
things a bit (there's certainly comments missing and some repetition).

But this boots on my test box and builds a kernel without generating a
single WARN -- big improvement over not booting (partly due to excessive
warn output) with just the sched/core.c bit.

---
 drivers/tty/n_tty.c              | 71 +++++++++++++++++++++++++++++++++-------
 fs/notify/inotify/inotify_user.c | 34 +++++++++++++++++--
 fs/notify/notification.c         |  2 +-
 include/linux/kernel.h           |  4 +--
 include/linux/sched.h            | 46 ++++++++++++++++++++++++--
 include/linux/wait.h             |  2 ++
 kernel/exit.c                    |  6 ++++
 kernel/sched/core.c              | 14 ++++++++
 kernel/smpboot.c                 | 15 +++++----
 9 files changed, 168 insertions(+), 26 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index f44f1ba762c3..5e4830979937 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -2098,6 +2098,36 @@ static int job_control(struct tty_struct *tty, struct file *file)
 	return 0;
 }
 
+/*
+ * n_tty_{read,write} use blocking primitives (mutex_lock, down_read, etc.)
+ * inside an open-coded wait loop.
+ *
+ * The wait loop relies on current->state to record wakeups; these will change
+ * it back to TASK_RUNNING. However blocking primitives themselves also change
+ * current->state. Therefore we must implement another means of recording
+ * wakeups.
+ *
+ * We do this by setting an alternative waitqueue wake function which changes
+ * an additional variable.
+ */
+
+struct n_tty_wakeup_state {
+	struct task_struct *task;
+	bool woken;
+};
+
+static int n_tty_wake(wait_queue_t *wait, unsigned mode,
+			    int wake_flags, void *key)
+{
+	struct n_tty_wakeup_state *s = wait->private;
+	DECLARE_WAITQUEUE(dummy_wait, s->task);
+
+	smp_wmb();
+	s->woken = true;
+
+	return default_wake_function(&dummy_wait, mode, wake_flags, key);
+}
+
 
 /**
  *	n_tty_read		-	read function for tty
@@ -2123,7 +2153,11 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 {
 	struct n_tty_data *ldata = tty->disc_data;
 	unsigned char __user *b = buf;
-	DECLARE_WAITQUEUE(wait, current);
+	struct n_tty_wakeup_state s = {
+		.task = current,
+		.woken = false,
+	};
+	wait_queue_t wait;
 	int c;
 	int minimum, time;
 	ssize_t retval = 0;
@@ -2167,6 +2201,9 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 
 	packet = tty->packet;
 
+	init_waitqueue_func_entry(&wait, n_tty_wake);
+	wait.private = &s;
+
 	add_wait_queue(&tty->read_wait, &wait);
 	while (nr) {
 		/* First test for status change. */
@@ -2186,10 +2223,11 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 			nr--;
 			break;
 		}
-		/* This statement must be first before checking for input
-		   so that any interrupt will set the state back to
-		   TASK_RUNNING. */
-		set_current_state(TASK_INTERRUPTIBLE);
+		/*
+		 * This statement must be first before checking for input so
+		 * that any interrupt will set it to true.
+		 */
+		s.woken = false;
 
 		if (((minimum - (b - buf)) < ldata->minimum_to_wake) &&
 		    ((minimum - (b - buf)) >= 1))
@@ -2220,13 +2258,15 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 				n_tty_set_room(tty);
 				up_read(&tty->termios_rwsem);
 
-				timeout = schedule_timeout(timeout);
+				set_current_state(TASK_INTERRUPTIBLE);
+				if (!s.woken)
+					timeout = schedule_timeout(timeout);
+				__set_current_state(TASK_RUNNING);
 
 				down_read(&tty->termios_rwsem);
 				continue;
 			}
 		}
-		__set_current_state(TASK_RUNNING);
 
 		/* Deal with packet mode. */
 		if (packet && b == buf) {
@@ -2273,7 +2313,6 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 
 	mutex_unlock(&ldata->atomic_read_lock);
 
-	__set_current_state(TASK_RUNNING);
 	if (b - buf)
 		retval = b - buf;
 
@@ -2306,7 +2345,11 @@ static ssize_t n_tty_write(struct tty_struct *tty, struct file *file,
 			   const unsigned char *buf, size_t nr)
 {
 	const unsigned char *b = buf;
-	DECLARE_WAITQUEUE(wait, current);
+	struct n_tty_wakeup_state s = {
+		.task = current,
+		.woken = false,
+	};
+	wait_queue_t wait;
 	int c;
 	ssize_t retval = 0;
 
@@ -2322,9 +2365,12 @@ static ssize_t n_tty_write(struct tty_struct *tty, struct file *file,
 	/* Write out any echoed characters that are still pending */
 	process_echoes(tty);
 
+	init_waitqueue_func_entry(&wait, n_tty_wake);
+	wait.private = &s;
+
 	add_wait_queue(&tty->write_wait, &wait);
 	while (1) {
-		set_current_state(TASK_INTERRUPTIBLE);
+		s.woken = false;
 		if (signal_pending(current)) {
 			retval = -ERESTARTSYS;
 			break;
@@ -2378,7 +2424,10 @@ static ssize_t n_tty_write(struct tty_struct *tty, struct file *file,
 		}
 		up_read(&tty->termios_rwsem);
 
-		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (!s.woken)
+			schedule();
+		__set_current_state(TASK_RUNNING);
 
 		down_read(&tty->termios_rwsem);
 	}
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index cc423a30a0c8..e3b65ce6b312 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -220,6 +220,23 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 	return event_size;
 }
 
+struct inotify_wakeup_state {
+	struct task_struct *task;
+	bool woken;
+};
+
+static int inotify_wake(wait_queue_t *wait, unsigned mode,
+			int wake_flags, void *key)
+{
+	struct inotify_wakeup_state *s = wait->private;
+	DECLARE_WAITQUEUE(dummy_wait, s->task);
+
+	smp_wmb();
+	s->woken = true;
+
+	return autoremove_wake_function(&dummy_wait, mode, wake_flags, key);
+}
+
 static ssize_t inotify_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *pos)
 {
@@ -227,13 +244,21 @@ static ssize_t inotify_read(struct file *file, char __user *buf,
 	struct fsnotify_event *kevent;
 	char __user *start;
 	int ret;
-	DEFINE_WAIT(wait);
+	struct inotify_wakeup_state s = {
+		.task = current,
+		.woken = false,
+	};
+	wait_queue_t wait;
+
+	init_waitqueue_func_entry(&wait, inotify_wake);
+	wait.private = &s;
 
 	start = buf;
 	group = file->private_data;
 
+	add_wait_queue(&group->notification_waitq, &wait);
 	while (1) {
-		prepare_to_wait(&group->notification_waitq, &wait, TASK_INTERRUPTIBLE);
+		s.woken = false;
 
 		mutex_lock(&group->notification_mutex);
 		kevent = get_one_event(group, count);
@@ -264,7 +289,10 @@ static ssize_t inotify_read(struct file *file, char __user *buf,
 		if (start != buf)
 			break;
 
-		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (!s.woken)
+			schedule();
+		__set_current_state(TASK_RUNNING);
 	}
 
 	finish_wait(&group->notification_waitq, &wait);
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
index 1e58402171a5..dcfcdd69d1de 100644
--- a/fs/notify/notification.c
+++ b/fs/notify/notification.c
@@ -63,7 +63,7 @@ EXPORT_SYMBOL_GPL(fsnotify_get_cookie);
 /* return true if the notify queue is empty, false otherwise */
 bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group)
 {
-	BUG_ON(!mutex_is_locked(&group->notification_mutex));
+	lockdep_assert_held(&group->notification_mutex);
 	return list_empty(&group->notification_list) ? true : false;
 }
 
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 4c52907a6d8b..aac1dc9da2d0 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -162,7 +162,7 @@ extern int _cond_resched(void);
 #endif
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
-  void __might_sleep(const char *file, int line, int preempt_offset);
+extern  void __might_sleep(const char *file, int line, int preempt_offset);
 /**
  * might_sleep - annotation for functions that can sleep
  *
@@ -174,7 +174,7 @@ extern int _cond_resched(void);
  * supposed to.
  */
 # define might_sleep() \
-	do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
+	  do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
 #else
   static inline void __might_sleep(const char *file, int line,
 				   int preempt_offset) { }
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 66124d63371a..62dab5738e66 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -241,6 +241,43 @@ extern char ___assert_task_state[1 - 2*!!(
 				((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
 				 (task->flags & PF_FROZEN) == 0)
 
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+
+#define __set_task_state(tsk, state_value)			\
+	do {							\
+		(tsk)->task_state_change = _THIS_IP_;		\
+		(tsk)->state = (state_value);			\
+	} while (0)
+#define set_task_state(tsk, state_value)			\
+	do {							\
+		(tsk)->task_state_change = _THIS_IP_;		\
+		set_mb((tsk)->state, (state_value));		\
+	} while (0)
+
+/*
+ * set_current_state() includes a barrier so that the write of current->state
+ * is correctly serialised wrt the caller's subsequent test of whether to
+ * actually sleep:
+ *
+ *	set_current_state(TASK_UNINTERRUPTIBLE);
+ *	if (do_i_need_to_sleep())
+ *		schedule();
+ *
+ * If the caller does not need such serialisation then use __set_current_state()
+ */
+#define __set_current_state(state_value)			\
+	do {							\
+		current->task_state_change = _THIS_IP_;		\
+		current->state = (state_value);			\
+	} while (0)
+#define set_current_state(state_value)				\
+	do {							\
+		current->task_state_change = _THIS_IP_;		\
+		set_mb(current->state, (state_value));		\
+	} while (0)
+
+#else
+
 #define __set_task_state(tsk, state_value)		\
 	do { (tsk)->state = (state_value); } while (0)
 #define set_task_state(tsk, state_value)		\
@@ -257,11 +294,13 @@ extern char ___assert_task_state[1 - 2*!!(
  *
  * If the caller does not need such serialisation then use __set_current_state()
  */
-#define __set_current_state(state_value)			\
+#define __set_current_state(state_value)		\
 	do { current->state = (state_value); } while (0)
-#define set_current_state(state_value)		\
+#define set_current_state(state_value)			\
 	set_mb(current->state, (state_value))
 
+#endif
+
 /* Task command name length */
 #define TASK_COMM_LEN 16
 
@@ -1650,6 +1689,9 @@ struct task_struct {
 	unsigned int	sequential_io;
 	unsigned int	sequential_io_avg;
 #endif
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+	unsigned long	task_state_change;
+#endif
 };
 
 /* Future-safe accessor for struct task_struct's cpus_allowed. */
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 6fb1ba5f9b2f..041b744e99b4 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -211,6 +211,8 @@ wait_queue_head_t *bit_waitqueue(void *, int);
 	wait_queue_t __wait;						\
 	long __ret = ret;	/* explicit shadow */			\
 									\
+	might_sleep();							\
+									\
 	INIT_LIST_HEAD(&__wait.task_list);				\
 	if (exclusive)							\
 		__wait.flags = WQ_FLAG_EXCLUSIVE;			\
diff --git a/kernel/exit.c b/kernel/exit.c
index e5c4668f1799..44cbc8791fca 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1067,6 +1067,12 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
 	}
 
 	/*
+	 * Since we're going to terminate the wait loop from do_wait(),
+	 * reset task state.
+	 */
+	__set_current_state(TASK_RUNNING);
+
+	/*
 	 * Now we are sure this task is interesting, and no other
 	 * thread can reap it because we its state == DEAD/TRACE.
 	 */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2676866b4394..0577c12c9cf8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7077,6 +7077,19 @@ void __might_sleep(const char *file, int line, int preempt_offset)
 {
 	static unsigned long prev_jiffy;	/* ratelimiting */
 
+	/*
+	 * Blocking primitives will set (and therefore destroy) current->state,
+	 * since we will exit with TASK_RUNNING make sure we enter with it,
+	 * otherwise we will destroy state.
+	 */
+	if (WARN(current->state != TASK_RUNNING,
+			"do not call blocking ops when !TASK_RUNNING; "
+			"state=%lx set at [<%p>] %pS\n",
+			current->state,
+			current->task_state_change,
+			current->task_state_change))
+		__set_current_state(TASK_RUNNING);
+
 	rcu_sleep_check(); /* WARN_ON_ONCE() by default, no rate limit reqd. */
 	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
 	     !is_idle_task(current)) ||
@@ -7107,6 +7120,7 @@ void __might_sleep(const char *file, int line, int preempt_offset)
 	dump_stack();
 }
 EXPORT_SYMBOL(__might_sleep);
+
 #endif
 
 #ifdef CONFIG_MAGIC_SYSRQ
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index eb89e1807408..f032fb5284e3 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -110,7 +110,7 @@ static int smpboot_thread_fn(void *data)
 		set_current_state(TASK_INTERRUPTIBLE);
 		preempt_disable();
 		if (kthread_should_stop()) {
-			set_current_state(TASK_RUNNING);
+			__set_current_state(TASK_RUNNING);
 			preempt_enable();
 			if (ht->cleanup)
 				ht->cleanup(td->cpu, cpu_online(td->cpu));
@@ -136,26 +136,27 @@ static int smpboot_thread_fn(void *data)
 		/* Check for state change setup */
 		switch (td->status) {
 		case HP_THREAD_NONE:
+			__set_current_state(TASK_RUNNING);
 			preempt_enable();
 			if (ht->setup)
 				ht->setup(td->cpu);
 			td->status = HP_THREAD_ACTIVE;
-			preempt_disable();
-			break;
+			continue;
+
 		case HP_THREAD_PARKED:
+			__set_current_state(TASK_RUNNING);
 			preempt_enable();
 			if (ht->unpark)
 				ht->unpark(td->cpu);
 			td->status = HP_THREAD_ACTIVE;
-			preempt_disable();
-			break;
+			continue;
 		}
 
 		if (!ht->thread_should_run(td->cpu)) {
-			preempt_enable();
+			preempt_enable_no_resched();
 			schedule();
 		} else {
-			set_current_state(TASK_RUNNING);
+			__set_current_state(TASK_RUNNING);
 			preempt_enable();
 			ht->thread_fn(td->cpu);
 		}

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-08-02 20:04 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-31 10:16 [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point" Ilya Dryomov
2014-07-31 11:57 ` Peter Zijlstra
2014-07-31 12:37   ` Ilya Dryomov
2014-07-31 13:13     ` Peter Zijlstra
2014-07-31 13:25       ` Ilya Dryomov
2014-07-31 13:44       ` Ingo Molnar
2014-07-31 13:56         ` Peter Zijlstra
2014-08-02 20:04           ` [RFC][PATCH] locking: Debug nested wait/locking primitives Peter Zijlstra
2014-07-31 14:30       ` [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point" Mike Galbraith
2014-07-31 14:37         ` Ilya Dryomov
2014-07-31 14:39         ` Peter Zijlstra
2014-08-01 12:56           ` Ilya Dryomov
2014-08-01 13:27             ` Peter Zijlstra
2014-08-01 13:50               ` Ilya Dryomov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox