* Re: Oops in VMA code
From: Alexander Graf @ 2011-06-16 7:06 UTC (permalink / raw)
To: Linus Torvalds
Cc: Benjamin Herrenschmidt, linux-mm,
linux-kernel@vger.kernel.org List
In-Reply-To: <BANLkTimB5gEZ2S=b9EiiWR-_u+o+wEPyjw@mail.gmail.com>
On 16.06.2011, at 08:54, Linus Torvalds wrote:
> On Wed, Jun 15, 2011 at 11:20 PM, Alexander Graf <agraf@suse.de> wrote:
>>
>> On 16.06.2011, at 07:59, Linus Torvalds wrote:
>>>
>>> r26 has the value 0xc00090026236bbb0, and that "90" byte in the middle
>>> there looks bogus. It's not a valid pointer any more, but if that "9"
>>> had been a zero, it would have been.
>>
>> Please see my reply to Ben here.
>
> Your reply to Ben seems to say that 0xc00000026236bbb0 wouldn't have
> been a valid address, because you don't have that much memory.
>
> But that's clearly not true. All the other registers have valid
> pointers in them, and the stack pointer (r1) is c000000262987cd0, for
> example. And that stack is clearly valid - if the kernel stack pointer
> was corrupted, you'd never have gotten as far as reporting the oops.
>
> So you may have only 8GB of RAM in that machine, but if so, there's
> some empty unmapped physical space. Because clearly your RAM is _not_
> limited to being mapped to below 0xc000000200000000.
Ah, yes. The PowerMacs have this nice memory hole, so RAM is actually mapped non-linearly:
Top of RAM: 0x280000000, Total RAM: 0x200000000
So you're right. The address does look valid.
> To recap: I'm pretty sure the memory corruption is just the "90" byte.
> The rest of the pointer looks too much like a pointer to be otherwise.
> Whether that's due to a two-bit error (unlikely) or a wild byte write
> (or 16-bit write with zeroes) is hard to say. USUALLY when we have
> wild pointer errors, the corruption is more than just a few bits, but
> it could have been something that sets a few bits in software, and
> just sets them using a stale pointer.
That could very well be - the unaligned location is very odd indeed. So some ORing function sounds likely.
>> Yup, so let's keep this documented for now. Actually, the more I think about it the more it looks like simple random memory corruption by someone else in the kernel - and that's basically impossible to track and will give completely different bugs next time around :(.
>
> We've had several bugs found by the pattern of the corruption, so I
> wouldn't say "impossible to track". Even if the next time ends up
> being a completely different oops (because the corruption happened in
> a totally different kind of data structure), it might be possible that
> there's that same "90" byte pattern, for example.
>
> But it needs more than one bug report to see what the pattern is.
> Usually it takes a _lot_ more..
Yeah, let's wait for that moment then :). For now everything's pure speculation.
Alex
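The "single bad byte" reading of r26 discussed in this thread can be checked with a little bit arithmetic. What follows is only a userspace sketch, not kernel code: the helper names are made up for illustration, and the 0xc000000000000000 base of the linear mapping is the usual ppc64 PAGE_OFFSET, assumed here rather than taken from the oops.

```c
#include <stdint.h>

/* Assumed base of the ppc64 kernel linear mapping (PAGE_OFFSET). */
#define SKETCH_PAGE_OFFSET 0xc000000000000000ULL

/* Count the bits that differ between the observed and the expected pointer. */
static int differing_bits(uint64_t observed, uint64_t expected)
{
    uint64_t diff = observed ^ expected;
    int count = 0;

    while (diff) {
        diff &= diff - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Check whether a linear-map address falls below the reported top of RAM. */
static int below_top_of_ram(uint64_t vaddr, uint64_t top_of_ram)
{
    return vaddr >= SKETCH_PAGE_OFFSET &&
           (vaddr - SKETCH_PAGE_OFFSET) < top_of_ram;
}
```

With r26 = 0xc00090026236bbb0 and the "repaired" value 0xc00000026236bbb0, the XOR is 0x0000900000000000, so exactly two bits differ, and only the repaired address sits below the 0x280000000 top of RAM quoted above.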
* [54/91] UBIFS: fix memory leak on error path
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Artem Bityutskiy
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
commit 812eb258311f89bcd664a34a620f249d54a2cd83 upstream.
UBIFS leaks memory on error path in 'ubifs_jnl_update()' in case of write
failure because it forgets to free the 'struct ubifs_dent_node *dent' object.
Although the object is small, the alignment can make it large - e.g., 2KiB
if the min. I/O unit is 2KiB.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/ubifs/journal.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/ubifs/journal.c
+++ b/fs/ubifs/journal.c
@@ -665,6 +665,7 @@ out_free:
out_release:
release_head(c, BASEHD);
+ kfree(dent);
out_ro:
ubifs_ro_mode(c, err);
if (last_reference)
* [55/91] nbd: limit module parameters to a sane value
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Namhyung Kim, Laurent Vivier,
Paul Clements, Jens Axboe
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Namhyung Kim <namhyung@gmail.com>
commit 3b2710824e00d238554c13b5add347e6c701ab1a upstream.
The 'max_part' parameter controls the maximum number of partitions
an nbd device can have. However, if a user specifies a very large
value, it exceeds the limit of the device minor number and
can cause a kernel oops (or, at least, produce invalid device
nodes in some cases).
In addition, specifying a large 'nbds_max' value causes the same
problem for the same reason.
On my desktop, the following command triggers the kernel bug:
$ sudo modprobe nbd max_part=100000
kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/nbd4/range
CPU 1
Modules linked in: nbd(+) bridge stp llc kvm_intel kvm asus_atk0110 sg sr_mod cdrom
Pid: 2522, comm: modprobe Tainted: G W 2.6.39-leonard+ #159 System manufacturer System Product Name/P5G41TD-M PRO
RIP: 0010:[<ffffffff8115aa08>] [<ffffffff8115aa08>] internal_create_group+0x2f/0x166
RSP: 0018:ffff8801009f1de8 EFLAGS: 00010246
RAX: 00000000ffffffef RBX: ffff880103920478 RCX: 00000000000a7bd3
RDX: ffffffff81a2dbe0 RSI: 0000000000000000 RDI: ffff880103920478
RBP: ffff8801009f1e38 R08: ffff880103920468 R09: ffff880103920478
R10: ffff8801009f1de8 R11: ffff88011eccbb68 R12: ffffffff81a2dbe0
R13: ffff880103920468 R14: 0000000000000000 R15: ffff880103920400
FS: 00007f3c49de9700(0000) GS:ffff88011f800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f3b7fe7c000 CR3: 00000000cd58d000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 2522, threadinfo ffff8801009f0000, task ffff8801009a93a0)
Stack:
ffff8801009f1e58 ffffffff812e8f6e ffff8801009f1e58 ffffffff812e7a80
ffff880000000010 ffff880103920400 ffff8801002fd0c0 ffff880103920468
0000000000000011 ffff880103920400 ffff8801009f1e48 ffffffff8115ab6a
Call Trace:
[<ffffffff812e8f6e>] ? device_add+0x4f1/0x5e4
[<ffffffff812e7a80>] ? dev_set_name+0x41/0x43
[<ffffffff8115ab6a>] sysfs_create_group+0x13/0x15
[<ffffffff810b857e>] blk_trace_init_sysfs+0x14/0x16
[<ffffffff811ee58b>] blk_register_queue+0x4c/0xfd
[<ffffffff811f3bdf>] add_disk+0xe4/0x29c
[<ffffffffa007e2ab>] nbd_init+0x2ab/0x30d [nbd]
[<ffffffffa007e000>] ? 0xffffffffa007dfff
[<ffffffff8100020f>] do_one_initcall+0x7f/0x13e
[<ffffffff8107ab0a>] sys_init_module+0xa1/0x1e3
[<ffffffff814f3542>] system_call_fastpath+0x16/0x1b
Code: 41 57 41 56 41 55 41 54 53 48 83 ec 28 0f 1f 44 00 00 48 89 fb 41 89 f6 49 89 d4 48 85 ff 74 0b 85 f6 75 0b 48 83
7f 30 00 75 14 <0f> 0b eb fe b9 ea ff ff ff 48 83 7f 30 00 0f 84 09 01 00 00 49
RIP [<ffffffff8115aa08>] internal_create_group+0x2f/0x166
RSP <ffff8801009f1de8>
---[ end trace 753285ffbf72c57c ]---
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/block/nbd.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -754,6 +754,12 @@ static int __init nbd_init(void)
if (max_part > 0)
part_shift = fls(max_part);
+ if ((1UL << part_shift) > DISK_MAX_PARTS)
+ return -EINVAL;
+
+ if (nbds_max > 1UL << (MINORBITS - part_shift))
+ return -EINVAL;
+
for (i = 0; i < nbds_max; i++) {
struct gendisk *disk = alloc_disk(1 << part_shift);
if (!disk)
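The two checks added to nbd_init() can be tried out in userspace. This is a rough sketch under stated assumptions: sketch_fls() is a stand-in for the kernel's fls(), the constants mirror what MINORBITS (20) and DISK_MAX_PARTS (256) are believed to be in kernels of this era, and check_nbd_params() is a hypothetical name, not a function in the driver.

```c
#include <errno.h>

/* Assumed values of the kernel's MINORBITS and DISK_MAX_PARTS. */
#define SKETCH_MINORBITS      20
#define SKETCH_DISK_MAX_PARTS 256

/* Stand-in for fls(): 1-based index of the highest set bit, 0 for x == 0. */
static int sketch_fls(unsigned int x)
{
    int bit = 0;

    while (x) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Hypothetical helper mirroring the parameter checks in the patch. */
static int check_nbd_params(int max_part, unsigned int nbds_max)
{
    int part_shift = 0;

    if (max_part > 0)
        part_shift = sketch_fls((unsigned int)max_part);

    /* Too many partitions per device. */
    if ((1UL << part_shift) > SKETCH_DISK_MAX_PARTS)
        return -EINVAL;

    /* Too many devices for the minor bits left over. */
    if (nbds_max > 1UL << (SKETCH_MINORBITS - part_shift))
        return -EINVAL;

    return 0;
}
```

For max_part=100000 the shift is 17, so 1UL << 17 is far above 256 and the first check rejects the load instead of reaching add_disk().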
* Linux kernel thread model
From: manish honap @ 2011-06-16 7:06 UTC (permalink / raw)
To: kernelnewbies
In-Reply-To: <BANLkTi=RNpDBZ8_kC5y5gk33cLyXHB=hAg@mail.gmail.com>
----- Original Message ----
From: Mulyadi Santosa <mulyadi.santosa@gmail.com>
To: manish honap <manish_honap_vit@yahoo.co.in>
Cc: kernelnewbies at kernelnewbies.org
Sent: Thu, 16 June, 2011 10:45:36 AM
Subject: Re: Linux kernel thread model
On Thu, Jun 16, 2011 at 11:39, manish honap
<manish_honap_vit@yahoo.co.in> wrote:
> Hi all
>
> Can someone please tell me what is the threading model of linux kernel ?
> user space thread:kernel thread process - n:1 or m:n or 1:1
1:1, that is, one kernel process represents one user space thread. For more
about it, search for the NPTL paper written by Ulrich Drepper and Ingo
Molnar.
How does the kernel know whether the kernel part or the user part is scheduled?
* [53/91] UBIFS: fix shrinker object count reports
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Artem Bityutskiy
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
commit cf610bf4199770420629d3bc273494bd27ad6c1d upstream.
Sometimes the VM asks the shrinker to return the number of objects it can shrink,
and we return the ubifs_clean_zn_cnt in that case. However, it is possible
that this counter is negative for a short period of time, due to the way
the UBIFS TNC code updates it. I sometimes observe the following warning:
shrink_slab: ubifs_shrinker+0x0/0x2b7 [ubifs] negative objects to delete nr=-8541616642706119788
This patch makes sure UBIFS never returns a negative count of objects.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/ubifs/shrinker.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/fs/ubifs/shrinker.c
+++ b/fs/ubifs/shrinker.c
@@ -283,7 +283,11 @@ int ubifs_shrinker(int nr, gfp_t gfp_mas
long clean_zn_cnt = atomic_long_read(&ubifs_clean_zn_cnt);
if (nr == 0)
- return clean_zn_cnt;
+ /*
+ * Due to the way UBIFS updates the clean znode counter it may
+ * temporarily be negative.
+ */
+ return clean_zn_cnt >= 0 ? clean_zn_cnt : 1;
if (!clean_zn_cnt) {
/*
* [40/91] i8k: Avoid lahf in 64-bit code
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Luca Tettamanti,
Massimo Dal Zotto, Jean Delvare
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Luca Tettamanti <kronos.it@gmail.com>
commit bc1f419c76a2d6450413ce4349f4e4a07be011d5 upstream.
i8k uses lahf to read the flag register in 64-bit code; early x86-64
CPUs, however, lack this instruction and we get an invalid opcode
exception at runtime.
Use pushf to load the flag register into the stack instead.
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Reported-by: Jeff Rickman <jrickman@myamigos.us>
Tested-by: Jeff Rickman <jrickman@myamigos.us>
Tested-by: Harry G McGavran Jr <w5pny@arrl.net>
Cc: Massimo Dal Zotto <dz@debian.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/char/i8k.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/char/i8k.c
+++ b/drivers/char/i8k.c
@@ -138,8 +138,8 @@ static int i8k_smm(struct smm_regs *regs
"movl %%edi,20(%%rax)\n\t"
"popq %%rdx\n\t"
"movl %%edx,0(%%rax)\n\t"
- "lahf\n\t"
- "shrl $8,%%eax\n\t"
+ "pushfq\n\t"
+ "popq %%rax\n\t"
"andl $1,%%eax\n"
:"=a"(rc)
: "a"(regs)
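Both instruction sequences in this hunk end with the carry flag in bit 0 of eax. The sketch below only models the register arithmetic in plain C, on the assumption that lahf places the low byte of the flags register in AH and that pushfq/popq leaves all of RFLAGS in RAX; the function names are illustrative and nothing here executes the instructions themselves.

```c
#include <stdint.h>

/*
 * Old sequence: lahf overwrites AH (bits 8..15 of EAX) with the low flags
 * byte, then "shrl $8" and "andl $1" isolate CF.
 */
static uint32_t cf_via_lahf(uint32_t eax_before, uint8_t low_flags)
{
    uint32_t eax = (eax_before & 0xffff00ffu) | ((uint32_t)low_flags << 8);

    eax >>= 8;          /* shrl $8,%eax */
    return eax & 1;     /* andl $1,%eax */
}

/*
 * New sequence: pushfq/popq loads RFLAGS into RAX, where CF is already
 * bit 0, so only "andl $1" is needed.
 */
static uint32_t cf_via_pushf(uint64_t rflags)
{
    uint32_t eax = (uint32_t)rflags;

    return eax & 1;     /* andl $1,%eax */
}
```

Under those assumptions the two helpers agree for any flags value, which is why the patch can drop the shift.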
* [37/91] p54usb: add zoom 4410 usbid
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Christian Lamparter,
John W. Linville
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Christian Lamparter <chunkeey@googlemail.com>
commit 9368a9a2378ab721f82f59430a135b4ce4ff5109 upstream.
Reported-by: Mark Davis <marked86@gmail.com>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/net/wireless/p54/p54usb.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/net/wireless/p54/p54usb.c
+++ b/drivers/net/wireless/p54/p54usb.c
@@ -80,6 +80,7 @@ static struct usb_device_id p54u_table[]
{USB_DEVICE(0x06b9, 0x0121)}, /* Thomson SpeedTouch 121g */
{USB_DEVICE(0x0707, 0xee13)}, /* SMC 2862W-G version 2 */
{USB_DEVICE(0x083a, 0x4521)}, /* Siemens Gigaset USB Adapter 54 version 2 */
+ {USB_DEVICE(0x083a, 0xc501)}, /* Zoom Wireless-G 4410 */
{USB_DEVICE(0x083a, 0xf503)}, /* Accton FD7050E ver 1010ec */
{USB_DEVICE(0x0846, 0x4240)}, /* Netgear WG111 (v2) */
{USB_DEVICE(0x0915, 0x2000)}, /* Cohiba Proto board */
* [35/91] xhci: Fix full speed bInterval encoding.
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Sarah Sharp, Dmitry Torokhov
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Sarah Sharp <sarah.a.sharp@linux.intel.com>
commit b513d44751bfb609a3c20463f764c8ce822d63e9 upstream.
Dmitry's patch
dfa49c4ad120a784ef1ff0717168aa79f55a483a USB: xhci - fix math in xhci_get_endpoint_interval()
introduced a bug. The USB 2.0 spec says that full speed isochronous endpoints'
bInterval must be decoded as an exponent to a power of two (e.g. interval =
2^(bInterval - 1)). Full speed interrupt endpoints, on the other hand, don't
use exponents, and the interval in frames is encoded straight into bInterval.
Dmitry's patch was supposed to fix up the full speed isochronous to parse
bInterval as an exponent, but instead it changed the *interrupt* endpoint
bInterval decoding. The isochronous endpoint encoding was the same.
This caused full speed devices with interrupt endpoints (including mice, hubs,
and USB to ethernet devices) to fail under NEC 0.96 xHCI host controllers:
[ 100.909818] xhci_hcd 0000:06:00.0: add ep 0x83, slot id 1, new drop flags = 0x0, new add flags = 0x99, new slot info = 0x38100000
[ 100.909821] xhci_hcd 0000:06:00.0: xhci_check_bandwidth called for udev ffff88011f0ea000
...
[ 100.910187] xhci_hcd 0000:06:00.0: ERROR: unexpected command completion code 0x11.
[ 100.910190] xhci_hcd 0000:06:00.0: xhci_reset_bandwidth called for udev ffff88011f0ea000
When the interrupt endpoint was added and a Configure Endpoint command was
issued to the host, the host controller would return a very odd error message
(0x11 means "Slot Not Enabled", which isn't true because the slot was enabled).
Probably the host controller was getting very confused with the bad encoding.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Tested-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/usb/host/xhci-mem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -511,12 +511,12 @@ static inline unsigned int xhci_get_endp
break;
case USB_SPEED_FULL:
- if (usb_endpoint_xfer_int(&ep->desc)) {
+ if (usb_endpoint_xfer_isoc(&ep->desc)) {
interval = xhci_parse_exponent_interval(udev, ep);
break;
}
/*
- * Fall through for isochronous endpoint interval decoding
+ * Fall through for interrupt endpoint interval decoding
* since it uses the same rules as low speed interrupt
* endpoints.
*/
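The two full speed decoding rules described in the changelog can be written down as a small sketch. It assumes the USB 2.0 ranges for bInterval (1 to 16 for isochronous, 1 to 255 for interrupt) and returns the interval in 1 ms frames; the enum and function names are invented for illustration, and unlike the real driver this version returns 0 for out-of-range values rather than clamping or converting to microframes.

```c
#include <stdint.h>

enum sketch_ep_type { SKETCH_EP_ISOC, SKETCH_EP_INT };

/* Interval in 1 ms frames for a full speed endpoint, or 0 if out of range. */
static unsigned int fs_interval_frames(enum sketch_ep_type type,
                                       uint8_t bInterval)
{
    if (type == SKETCH_EP_ISOC) {
        /* Isochronous: bInterval is an exponent, 2^(bInterval - 1). */
        if (bInterval < 1 || bInterval > 16)
            return 0;
        return 1u << (bInterval - 1);
    }

    /* Interrupt: bInterval is the interval in frames, used as is. */
    if (bInterval < 1)
        return 0;
    return bInterval;
}
```

A bInterval of 4 therefore means 8 frames on an isochronous endpoint but 4 frames on an interrupt endpoint, which is the distinction the earlier patch got backwards.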
* [29/91] USB: CP210x Add 4 Device IDs for AC-Services Devices
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: stable-review, torvalds, akpm, alan, Craig Shelley
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Craig Shelley <craig@microtron.org.uk>
commit 4eff0b40a7174896b860312910e0db51f2dcc567 upstream.
This patch adds 4 device IDs for CP2102 based devices manufactured by
AC-Services. See http://www.ac-services.eu for further info.
Signed-off-by: Craig Shelley <craig@microtron.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/usb/serial/cp210x.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -114,6 +114,10 @@ static struct usb_device_id id_table []
{ USB_DEVICE(0x10C4, 0x8418) }, /* IRZ Automation Teleport SG-10 GSM/GPRS Modem */
{ USB_DEVICE(0x10C4, 0x846E) }, /* BEI USB Sensor Interface (VCP) */
{ USB_DEVICE(0x10C4, 0x8477) }, /* Balluff RFID */
+ { USB_DEVICE(0x10C4, 0x85EA) }, /* AC-Services IBUS-IF */
+ { USB_DEVICE(0x10C4, 0x85EB) }, /* AC-Services CIS-IBUS */
+ { USB_DEVICE(0x10C4, 0x8664) }, /* AC-Services CAN-IF */
+ { USB_DEVICE(0x10C4, 0x8665) }, /* AC-Services OBD-IF */
{ USB_DEVICE(0x10C4, 0xEA60) }, /* Silicon Labs factory default */
{ USB_DEVICE(0x10C4, 0xEA61) }, /* Silicon Labs factory default */
{ USB_DEVICE(0x10C4, 0xEA71) }, /* Infinity GPS-MIC-1 Radio Monophone */
* [31/91] USB: serial: ftdi_sio: adding support for TavIR STK500
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan,
Benedek László
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Benedek László <benedekl@gmail.com>
commit 37909fe588c9e09ab57cd267e98678a17ceda64a upstream.
Adding support for the TavIR STK500 (id 0403:FA33)
Atmel AVR programmer device based on FTDI FT232RL.
Signed-off-by: Benedek László <benedekl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 5 +++++
2 files changed, 6 insertions(+)
--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -570,6 +570,7 @@ static struct usb_device_id id_table_com
{ USB_DEVICE(FTDI_VID, FTDI_IBS_APP70_PID) },
{ USB_DEVICE(FTDI_VID, FTDI_IBS_PEDO_PID) },
{ USB_DEVICE(FTDI_VID, FTDI_IBS_PROD_PID) },
+ { USB_DEVICE(FTDI_VID, FTDI_TAVIR_STK500_PID) },
/*
* ELV devices:
*/
--- a/drivers/usb/serial/ftdi_sio_ids.h
+++ b/drivers/usb/serial/ftdi_sio_ids.h
@@ -491,6 +491,11 @@
/* www.canusb.com Lawicel CANUSB device (FTDI_VID) */
#define FTDI_CANUSB_PID 0xFFA8 /* Product Id */
+/*
+ * TavIR AVR product ids (FTDI_VID)
+ */
+#define FTDI_TAVIR_STK500_PID 0xFA33 /* STK500 AVR programmer */
+
/********************************/
* [30/91] USB: moto_modem: Add USB identifier for the Motorola VE240.
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Elizabeth Jennifer Myers
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Elizabeth Jennifer Myers <elizabeth@sporksirc.net>
commit 3938a0b32dc12229e76735679b37095bc2bc1578 upstream.
Tested on my phone, the ttyUSB device is created and is fully
functional.
Signed-off-by: Elizabeth Jennifer Myers <elizabeth@sporksirc.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/usb/serial/moto_modem.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/usb/serial/moto_modem.c
+++ b/drivers/usb/serial/moto_modem.c
@@ -25,6 +25,7 @@ static struct usb_device_id id_table []
{ USB_DEVICE(0x05c6, 0x3197) }, /* unknown Motorola phone */
{ USB_DEVICE(0x0c44, 0x0022) }, /* unknown Mororola phone */
{ USB_DEVICE(0x22b8, 0x2a64) }, /* Motorola KRZR K1m */
+ { USB_DEVICE(0x22b8, 0x2c84) }, /* Motorola VE240 phone */
{ USB_DEVICE(0x22b8, 0x2c64) }, /* Motorola V950 phone */
{ },
};
* [27/91] loop: limit max_part module param to DISK_MAX_PARTS
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Namhyung Kim, Laurent Vivier,
Jens Axboe
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Namhyung Kim <namhyung@gmail.com>
commit 78f4bb367fd147a0e7e3998ba6e47109999d8814 upstream.
The 'max_part' parameter controls the maximum number of partitions
a loop block device can have. However, if a user specifies a very
large value, it exceeds the limit of the device minor number
and can cause a kernel panic (or, at least, produce invalid
device nodes in some cases).
On my desktop system, the following command kills the kernel. On qemu,
it triggers a similar oops but the kernel stays alive:
$ sudo modprobe loop max_part=100000
------------[ cut here ]------------
kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
invalid opcode: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in: loop(+)
Pid: 43, comm: insmod Tainted: G W 2.6.39-qemu+ #155 Bochs Bochs
RIP: 0010:[<ffffffff8113ce61>] [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
RSP: 0018:ffff880007b3fde8 EFLAGS: 00000246
RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4
RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878
RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000
R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50
R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800
FS: 0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c0)
Stack:
ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b
0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868
0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8
Call Trace:
[<ffffffff811e66dd>] ? device_add+0x4bc/0x5af
[<ffffffff811e570b>] ? dev_set_name+0x3c/0x3e
[<ffffffff8113cfc8>] sysfs_create_group+0xe/0x12
[<ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16
[<ffffffff8116a090>] blk_register_queue+0x47/0xf7
[<ffffffff8116f527>] add_disk+0xdf/0x290
[<ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop]
[<ffffffffa0006000>] ? 0xffffffffa0005fff
[<ffffffff8100020a>] do_one_initcall+0x7a/0x12e
[<ffffffff81096804>] sys_init_module+0x9c/0x1e0
[<ffffffff813329bb>] system_call_fastpath+0x16/0x1b
Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb 48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe 48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49
RIP [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
RSP <ffff880007b3fde8>
---[ end trace a123eb592043acad ]---
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/block/loop.c | 3 +++
1 file changed, 3 insertions(+)
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1605,6 +1605,9 @@ static int __init loop_init(void)
if (max_part > 0)
part_shift = fls(max_part);
+ if ((1UL << part_shift) > DISK_MAX_PARTS)
+ return -EINVAL;
+
if (max_loop > 1UL << (MINORBITS - part_shift))
return -EINVAL;
* [20/91] Fix for buffer overflow in ldm_frag_add not sufficient
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: stable-review, torvalds, akpm, alan, Timo Warns
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Timo Warns <Warns@pre-sense.de>
commit cae13fe4cc3f24820ffb990c09110626837e85d4 upstream.
As Ben Hutchings discovered [1], the patch for CVE-2011-1017 (buffer
overflow in ldm_frag_add) is not sufficient. The original patch in
commit c340b1d64000 ("fs/partitions/ldm.c: fix oops caused by corrupted
partition table") does not consider that, for subsequent fragments,
previously allocated memory is used.
[1] http://lkml.org/lkml/2011/5/6/407
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Timo Warns <warns@pre-sense.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/partitions/ldm.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/fs/partitions/ldm.c
+++ b/fs/partitions/ldm.c
@@ -1335,6 +1335,11 @@ static bool ldm_frag_add (const u8 *data
list_add_tail (&f->list, frags);
found:
+ if (rec >= f->num) {
+ ldm_error("REC value (%d) exceeds NUM value (%d)", rec, f->num);
+ return false;
+ }
+
if (f->map & (1 << rec)) {
ldm_error ("Duplicate VBLK, part %d.", rec);
f->map &= 0x7F; /* Mark the group as broken */
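The reason the extra bounds check matters is that rec is later used as an index into memory that was sized from the first fragment's num. The following is a much simplified userspace sketch of that idea, not the ldm code: the struct layout, the fixed chunk size, and all names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CHUNK     4   /* assumed payload bytes per fragment */
#define SKETCH_MAX_FRAGS 8   /* assumed capacity of the sketch buffer */

struct sketch_frag {
    unsigned int num;        /* fragment count announced by the first one */
    unsigned int map;        /* bit i set once fragment i has been stored */
    unsigned char data[SKETCH_MAX_FRAGS * SKETCH_CHUNK];
};

/* Store fragment 'rec'; refuse indexes outside the announced range. */
static bool sketch_frag_add(struct sketch_frag *f, unsigned int rec,
                            const unsigned char *chunk)
{
    /* The kind of check the patch adds: rec must be below num. */
    if (f->num > SKETCH_MAX_FRAGS || rec >= f->num)
        return false;

    /* Duplicate fragment: reject it. */
    if (f->map & (1u << rec))
        return false;

    f->map |= 1u << rec;
    memcpy(f->data + (size_t)rec * SKETCH_CHUNK, chunk, SKETCH_CHUNK);
    return true;
}
```

Without the first check, a later fragment carrying a large rec would write past the space reserved when the group was first seen.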
* [18/91] rcu: Fix unpaired rcu_irq_enter() from locking selftests
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Frederic Weisbecker,
Paul E. McKenney, Ingo Molnar, Peter Zijlstra
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Frederic Weisbecker <fweisbec@gmail.com>
commit ba9f207c9f82115aba4ce04b22e0081af0ae300f upstream.
HARDIRQ_ENTER() maps to irq_enter() which calls rcu_irq_enter().
But HARDIRQ_EXIT() maps to __irq_exit() which doesn't call
rcu_irq_exit().
So for every locking selftest that simulates hardirq disabled,
we create an imbalance in the rcu extended quiescent state
internal state.
As a result, after the first missing rcu_irq_exit(), subsequent
irqs won't exit dyntick-idle mode after leaving the interrupt
handler. This means that RCU won't see the affected CPU as being
in an extended quiescent state, resulting in long grace-period
delays (as in grace periods extending for hours).
To fix this, just use __irq_enter() to simulate the hardirq
context. This is sufficient for the locking selftests as we
don't need to exit any extended quiescent state or perform
any check that irqs normally do when they wake up from idle.
As a side effect, this patch makes it possible to restore
"rcu: Decrease memory-barrier usage based on semi-formal proof",
which eventually helped finding this bug.
Reported-and-tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
lib/locking-selftest.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -144,7 +144,7 @@ static void init_shared_classes(void)
#define HARDIRQ_ENTER() \
local_irq_disable(); \
- irq_enter(); \
+ __irq_enter(); \
WARN_ON(!in_irq());
#define HARDIRQ_EXIT() \
* [17/91] x86, amd: Use _safe() msr access for GartTlbWlk disable code
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Joerg Roedel,
Rafael J. Wysocki, Maciej Rutecki, Avi Kivity, Ingo Molnar
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: "Roedel, Joerg" <Joerg.Roedel@amd.com>
commit d47cc0db8fd6011de2248df505fc34990b7451bf upstream.
The workaround for Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=33012
introduced a read and a write to the MC4 mask msr.
Unfortunately this MSR is not emulated by the KVM hypervisor,
so the kernel gets a #GP and crashes when applying
this workaround when running inside KVM.
This issue was reported as:
https://bugzilla.kernel.org/show_bug.cgi?id=35132
and is fixed with this patch. The change just lets the kernel
ignore any #GP it gets while accessing this MSR by using the
_safe msr access methods.
Reported-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
arch/x86/kernel/cpu/amd.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -587,10 +587,13 @@ static void __cpuinit init_amd(struct cp
* Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=33012
*/
u64 mask;
+ int err;
- rdmsrl(MSR_AMD64_MCx_MASK(4), mask);
- mask |= (1 << 10);
- wrmsrl(MSR_AMD64_MCx_MASK(4), mask);
+ err = rdmsrl_safe(MSR_AMD64_MCx_MASK(4), &mask);
+ if (err == 0) {
+ mask |= (1 << 10);
+ checking_wrmsrl(MSR_AMD64_MCx_MASK(4), mask);
+ }
}
}
* [21/91] seqlock: Dont smp_rmb in seqlock reader spin loop
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Nick Piggin, Eric Dumazet, torvalds, Milton Miller, Andi Kleen,
Thomas Gleixner, Anton Blanchard, akpm, Paul McKenney,
linuxppc-dev, stable-review, alan
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Milton Miller <miltonm@bga.com>
commit 5db1256a5131d3b133946fa02ac9770a784e6eb2 upstream.
Move the smp_rmb after cpu_relax loop in read_seqlock and add
ACCESS_ONCE to make sure the test and return are consistent.
A multi-threaded core in the lab didn't like the update
from 2.6.35 to 2.6.36, to the point it would hang during
boot when multiple threads were active. Bisection showed
af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 (clockevents:
Remove the per cpu tick skew) as the culprit and it is
supported with stack traces showing xtime_lock waits including
tick_do_update_jiffies64 and/or update_vsyscall.
Experimentation showed the combination of cpu_relax and smp_rmb
was significantly slowing the progress of other threads sharing
the core, and this patch is effective in avoiding the hang.
A theory is that the rmb is affecting the whole core while the
cpu_relax is causing a resource rebalance flush; together they
cause an interference cadence that is unbroken when the seqlock
reader has interrupts disabled.
At first I was confused why the refactor in
3c22cd5709e8143444a6d08682a87f4c57902df3 (kernel: optimise
seqlock) didn't affect this patch application, but after some
study I found that it affected seqcount, not seqlock. The new seqcount was
not factored back into the seqlock. I defer that to the future.
While the removal of the timer interrupt offset created
contention for the xtime lock while a cpu does the
additional work to update the system clock, the seqlock
implementation with the tight rmb spin loop goes back much
further, and is just waiting for the right trigger.
Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Link: http://lkml.kernel.org/r/%3Cseqlock-rmb%40mdm.bga.com%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
include/linux/seqlock.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -88,12 +88,12 @@ static __always_inline unsigned read_seq
unsigned ret;
repeat:
- ret = sl->sequence;
- smp_rmb();
+ ret = ACCESS_ONCE(sl->sequence);
if (unlikely(ret & 1)) {
cpu_relax();
goto repeat;
}
+ smp_rmb();
return ret;
}
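A minimal userspace sketch of the reordered reader, assuming C11 atomics as stand-ins for the kernel primitives: a relaxed atomic load plays the role of ACCESS_ONCE() and an acquire fence plays the role of smp_rmb(). All toy_* names are hypothetical and this is not the kernel implementation; it only illustrates that the barrier is issued once, after the spin loop, instead of on every iteration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for seqlock_t; not the kernel type. */
struct toy_seqlock {
	atomic_uint sequence;	/* odd while a writer is active */
};

/* Placeholder for cpu_relax(); a real port would use an arch pause hint. */
static inline void toy_relax(void)
{
}

/* Spin using single relaxed loads and issue the read barrier only once,
 * after an even (no writer active) value has been observed. */
static inline unsigned toy_read_begin(struct toy_seqlock *sl)
{
	unsigned ret;

	for (;;) {
		ret = atomic_load_explicit(&sl->sequence, memory_order_relaxed);
		if (!(ret & 1))
			break;
		toy_relax();
	}
	atomic_thread_fence(memory_order_acquire);	/* role of smp_rmb() */
	return ret;
}

/* Nonzero if a writer ran since toy_read_begin() returned 'start'. */
static inline int toy_read_retry(struct toy_seqlock *sl, unsigned start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&sl->sequence, memory_order_relaxed) != start;
}

/* Writer side; assumes writers are serialized by some outer lock. */
static inline void toy_write_begin(struct toy_seqlock *sl)
{
	unsigned seq = atomic_load_explicit(&sl->sequence, memory_order_relaxed);

	assert(!(seq & 1));	/* no nested writers in this sketch */
	atomic_store_explicit(&sl->sequence, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static inline void toy_write_end(struct toy_seqlock *sl)
{
	unsigned seq = atomic_load_explicit(&sl->sequence, memory_order_relaxed);

	atomic_store_explicit(&sl->sequence, seq + 1, memory_order_release);
}
```

A reader would call toy_read_begin(), copy the protected data, and repeat the whole sequence while toy_read_retry() returns nonzero.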
* [16/91] x86, amd: Do not enable ARAT feature on AMD processors below
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Boris Ostrovsky,
Hans Rosenfeld, Andreas Herrmann, Chuck Ebbert, H. Peter Anvin
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
family 0x12
From: Boris Ostrovsky <ostr@amd64.org>
commit e9cdd343a5e42c43bcda01e609fa23089e026470 upstream.
Commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 added support for
ARAT (Always Running APIC timer) on AMD processors that are not
affected by erratum 400. This erratum is present on certain processor
families and prevents the APIC timer from waking up the CPU when it
is in a deep C state, including the C1E state.
Determining whether a processor is affected by this erratum may
have some corner cases and handling these cases is somewhat
complicated. In the interest of simplicity we won't claim ARAT
support on processor families below 0x12 and will go back to
broadcasting the timer when going idle.
Signed-off-by: Boris Ostrovsky <ostr@amd64.org>
Link: http://lkml.kernel.org/r/1306423192-19774-1-git-send-email-ostr@amd64.org
Tested-by: Boris Petkov <borislav.petkov@amd.com>
Cc: Hans Rosenfeld <Hans.Rosenfeld@amd.com>
Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
arch/x86/kernel/cpu/amd.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -567,8 +567,11 @@ static void __cpuinit init_amd(struct cp
}
#endif
- /* As a rule processors have APIC timer running in deep C states */
- if (c->x86 > 0xf && !cpu_has_amd_erratum(amd_erratum_400))
+ /*
+ * Family 0x12 and above processors have APIC timer
+ * running in deep C states.
+ */
+ if (c->x86 > 0x11)
set_cpu_cap(c, X86_FEATURE_ARAT);
/*
* [11/91] ext3: Fix fs corruption when make_indexed_dir() fails
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: stable-review, torvalds, akpm, alan, Jan Kara
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Jan Kara <jack@suse.cz>
commit 86c4f6d85595cd7da635dc6985d27bfa43b1ae10 upstream.
When make_indexed_dir() fails (e.g. because of ENOSPC) after it has allocated
a block for the index tree root, we did not properly mark all changed buffers dirty.
This led to only some of these buffers being written out, thus effectively
corrupting the directory.
Fix the issue by marking all changed data dirty even in the error case.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/ext3/namei.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -1425,10 +1425,19 @@ static int make_indexed_dir(handle_t *ha
frame->at = entries;
frame->bh = bh;
bh = bh2;
+ /*
+ * Mark buffers dirty here so that if do_split() fails we write a
+ * consistent set of buffers to disk.
+ */
+ ext3_journal_dirty_metadata(handle, frame->bh);
+ ext3_journal_dirty_metadata(handle, bh);
de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
- dx_release (frames);
- if (!(de))
+ if (!de) {
+ ext3_mark_inode_dirty(handle, dir);
+ dx_release(frames);
return retval;
+ }
+ dx_release(frames);
return add_dirent_to_buf(handle, dentry, inode, de, bh);
}
* [10/91] x86, 64-bit: Fix copy_[to/from]_user() checks for the
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Jiri Olsa, Brian Gerst,
Ingo Molnar
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
userspace address limit
From: Jiri Olsa <jolsa@redhat.com>
commit 26afb7c661080ae3f1f13ddf7f0c58c4f931c22b upstream.
As reported in BZ #30352:
https://bugzilla.kernel.org/show_bug.cgi?id=30352
there's a kernel bug related to reading the last allowed page on x86_64.
The _copy_to_user() and _copy_from_user() functions use the following
check for address limit:
if (buf + size >= limit)
fail();
while it should be more permissive:
if (buf + size > limit)
fail();
That's because size is the number of bytes being read from or
written to the buffer, counting the byte at buf itself.
So the copy function never actually touches the limit
address, even if "buf + size == limit".
The following program fails to use the last page as a buffer
due to the wrong limit check:
#include <sys/mman.h>
#include <sys/socket.h>
#include <assert.h>
#define PAGE_SIZE (4096)
#define LAST_PAGE ((void*)(0x7fffffffe000))
int main()
{
int fds[2], err;
void * ptr = mmap(LAST_PAGE, PAGE_SIZE, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
assert(ptr == LAST_PAGE);
err = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
assert(err == 0);
err = send(fds[0], ptr, PAGE_SIZE, 0);
perror("send");
assert(err == PAGE_SIZE);
err = recv(fds[1], ptr, PAGE_SIZE, MSG_WAITALL);
perror("recv");
assert(err == PAGE_SIZE);
return 0;
}
The other place checking the addr limit is the access_ok() function,
which is working properly. There's just a misleading comment
for the __range_not_ok() macro - which this patch fixes as well.
The last page of the user-space address range is a guard page, and
Brian Gerst observed that the guard page itself exists due to an erratum on K8 CPUs
(#121 Sequential Execution Across Non-Canonical Boundary Causes Processor
Hang).
However, the test code is using the last valid page before the guard page.
The bug is that the last byte before the guard page can't be read
because of the off-by-one error. The guard page is left in place.
This bug would normally not show up because the last page is
part of the process stack and never accessed via syscalls.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1305210630-7136-1-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
arch/x86/include/asm/uaccess.h | 2 +-
arch/x86/lib/copy_user_64.S | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -42,7 +42,7 @@
* Returns 0 if the range is valid, nonzero otherwise.
*
* This is equivalent to the following test:
- * (u33)addr + (u33)size >= (u33)current->addr_limit.seg (u65 for x86_64)
+ * (u33)addr + (u33)size > (u33)current->addr_limit.seg (u65 for x86_64)
*
* This needs 33-bit (65-bit for x86_64) arithmetic. We have a carry...
*/
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -72,7 +72,7 @@ ENTRY(copy_to_user)
addq %rdx,%rcx
jc bad_to_user
cmpq TI_addr_limit(%rax),%rcx
- jae bad_to_user
+ ja bad_to_user
ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
CFI_ENDPROC
ENDPROC(copy_to_user)
@@ -85,7 +85,7 @@ ENTRY(copy_from_user)
addq %rdx,%rcx
jc bad_from_user
cmpq TI_addr_limit(%rax),%rcx
- jae bad_from_user
+ ja bad_from_user
ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
CFI_ENDPROC
ENDPROC(copy_from_user)
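The off-by-one can be sketched in plain C. The helper names below are hypothetical, and the limit value used in the usage note (0x7ffffffff000, one page below 2^47) is an assumption about the x86_64 user address limit chosen to match the test program's LAST_PAGE. The wrap test corresponds to the "jc" branch, and the two comparisons correspond to "jae" (old) and "ja" (new).

```c
#include <stdbool.h>
#include <stdint.h>

/* Old behaviour: rejects a range whose end equals the limit. */
static bool toy_range_ok_old(uint64_t addr, uint64_t size, uint64_t limit)
{
	uint64_t end = addr + size;

	if (end < addr)		/* carry out of 64 bits: the "jc" case */
		return false;
	return end < limit;	/* "jae" fails when end >= limit */
}

/* Fixed behaviour: [addr, addr + size) may end exactly at the limit,
 * because the byte at 'limit' itself is never touched. */
static bool toy_range_ok(uint64_t addr, uint64_t size, uint64_t limit)
{
	uint64_t end = addr + size;

	if (end < addr)		/* carry out of 64 bits: the "jc" case */
		return false;
	return end <= limit;	/* "ja" fails only when end > limit */
}
```

With addr = 0x7fffffffe000, size = 4096 and the assumed limit, the end of the range equals the limit: the old check rejects the copy and the fixed check accepts it, which matches the failing send() in the test program.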
* [07/91] block: rescan partitions on invalidated devices on -ENOMEDIA
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Tejun Heo, Jens Axboe
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
too
From: Tejun Heo <tj@kernel.org>
commit 02e352287a40bd456eb78df705bf888bc3161d3f upstream.
__blkdev_get() doesn't rescan partitions if disk->fops->open() fails,
which leads to ghost partition devices lingering after medium removal
is known to both the kernel and userland. The behavior also creates a
subtle inconsistency where O_NONBLOCK open, which doesn't fail even if
there's no medium, clears the ghost partitions, which is exploited to
work around the problem from userland.
Fix it by updating __blkdev_get() to issue partition rescan after
-ENOMEDIUM too.
This was reported in the following bz.
https://bugzilla.kernel.org/show_bug.cgi?id=13029
Stable: 2.6.38
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Zeuthen <zeuthen@gmail.com>
Reported-by: Martin Pitt <martin.pitt@ubuntu.com>
Reported-by: Kay Sievers <kay.sievers@vrfy.org>
Tested-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/block_dev.c | 27 ++++++++++++++++++---------
1 file changed, 18 insertions(+), 9 deletions(-)
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1203,6 +1203,7 @@ static int __blkdev_get(struct block_dev
if (!bdev->bd_part)
goto out_clear;
+ ret = 0;
if (disk->fops->open) {
ret = disk->fops->open(bdev, mode);
if (ret == -ERESTARTSYS) {
@@ -1218,9 +1219,18 @@ static int __blkdev_get(struct block_dev
mutex_unlock(&bdev->bd_mutex);
goto restart;
}
- if (ret)
- goto out_clear;
}
+ /*
+ * If the device is invalidated, rescan partition
+ * if open succeeded or failed with -ENOMEDIUM.
+ * The latter is necessary to prevent ghost
+ * partitions on a removed medium.
+ */
+ if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
+ rescan_partitions(disk, bdev);
+ if (ret)
+ goto out_clear;
+
if (!bdev->bd_openers) {
bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);
bdi = blk_get_backing_dev_info(bdev);
@@ -1228,8 +1238,6 @@ static int __blkdev_get(struct block_dev
bdi = &default_backing_dev_info;
bdev->bd_inode->i_data.backing_dev_info = bdi;
}
- if (bdev->bd_invalidated)
- rescan_partitions(disk, bdev);
} else {
struct block_device *whole;
whole = bdget_disk(disk, 0);
@@ -1256,13 +1264,14 @@ static int __blkdev_get(struct block_dev
put_disk(disk);
disk = NULL;
if (bdev->bd_contains == bdev) {
- if (bdev->bd_disk->fops->open) {
+ ret = 0;
+ if (bdev->bd_disk->fops->open)
ret = bdev->bd_disk->fops->open(bdev, mode);
- if (ret)
- goto out_unlock_bdev;
- }
- if (bdev->bd_invalidated)
+ /* the same as first opener case, read comment there */
+ if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
rescan_partitions(bdev->bd_disk, bdev);
+ if (ret)
+ goto out_unlock_bdev;
}
}
bdev->bd_openers++;
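The decision both hunks now share can be written as one predicate. This is only a sketch with a hypothetical helper name; it assumes a Linux <errno.h> that defines ENOMEDIUM and takes the open() result in the kernel's negative-errno convention.

```c
#include <errno.h>
#include <stdbool.h>

/* Rescan when the device was invalidated and open() either succeeded
 * or failed specifically because no medium is present. */
static bool toy_should_rescan(bool invalidated, int open_ret)
{
	return invalidated && (open_ret == 0 || open_ret == -ENOMEDIUM);
}
```

Any other open() failure still skips the rescan and goes straight to the error path, as before the patch.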
* [04/91] [CPUFREQ] Fix memory leak in cpufreq_stat
From: Greg KH @ 2011-06-16 0:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Steven Finney, Dave Jones
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: steven finney <Steven.Finney@palm.com>
commit 98586ed8b8878e10691203687e89a42fa3355300 upstream.
When a CPU is taken offline in an SMP system, cpufreq_remove_dev()
nulls out the per-cpu policy before cpufreq_stats_free_table() can
make use of it. cpufreq_stats_free_table() then skips the
call to sysfs_remove_group(), leaving about 100 bytes of sysfs-related
memory unreclaimed each time a CPU is removed. Break up
cpufreq_stats_free_table() into sysfs and table portions, and
call the sysfs portion early.
Signed-off-by: Steven Finney <steven.finney@palm.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/cpufreq/cpufreq_stats.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -164,17 +164,27 @@ static int freq_table_get_index(struct c
return -1;
}
+/* should be called late in the CPU removal sequence so that the stats
+ * memory is still available in case someone tries to use it.
+ */
static void cpufreq_stats_free_table(unsigned int cpu)
{
struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, cpu);
- struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
- if (policy && policy->cpu == cpu)
- sysfs_remove_group(&policy->kobj, &stats_attr_group);
if (stat) {
kfree(stat->time_in_state);
kfree(stat);
}
per_cpu(cpufreq_stats_table, cpu) = NULL;
+}
+
+/* must be called early in the CPU removal sequence (before
+ * cpufreq_remove_dev) so that policy is still valid.
+ */
+static void cpufreq_stats_free_sysfs(unsigned int cpu)
+{
+ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+ if (policy && policy->cpu == cpu)
+ sysfs_remove_group(&policy->kobj, &stats_attr_group);
if (policy)
cpufreq_cpu_put(policy);
}
@@ -315,6 +325,9 @@ static int __cpuinit cpufreq_stat_cpu_ca
case CPU_ONLINE_FROZEN:
cpufreq_update_policy(cpu);
break;
+ case CPU_DOWN_PREPARE:
+ cpufreq_stats_free_sysfs(cpu);
+ break;
case CPU_DEAD:
case CPU_DEAD_FROZEN:
cpufreq_stats_free_table(cpu);
@@ -323,9 +336,11 @@ static int __cpuinit cpufreq_stat_cpu_ca
return NOTIFY_OK;
}
+/* priority=1 so this will get called before cpufreq_remove_dev */
static struct notifier_block cpufreq_stat_cpu_notifier __refdata =
{
.notifier_call = cpufreq_stat_cpu_callback,
+ .priority = 1,
};
static struct notifier_block notifier_policy_block = {
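Why ".priority = 1" changes the call order can be illustrated with a toy notifier chain kept sorted by descending priority, which is how the patch comment expects the kernel chain to behave. Everything below is a hypothetical userspace model: the toy_* names, the two callbacks and the log buffer are illustrative, and the assumption that the cpufreq core callback runs at the default priority 0 comes from the patch comment, not from this sketch.

```c
#include <stddef.h>

struct toy_notifier {
	int (*call)(unsigned long event);
	int priority;			/* higher priority runs first */
	struct toy_notifier *next;
};

static char toy_log[8];			/* records callback order */
static int toy_log_len;

/* Stand-in for the stats callback (registered with priority 1). */
static int toy_stats_cb(unsigned long event)
{
	(void)event;
	toy_log[toy_log_len++] = 's';
	return 0;
}

/* Stand-in for the cpufreq core callback (assumed priority 0). */
static int toy_core_cb(unsigned long event)
{
	(void)event;
	toy_log[toy_log_len++] = 'c';
	return 0;
}

/* Insert while keeping the chain sorted by descending priority. */
static void toy_register(struct toy_notifier **head, struct toy_notifier *n)
{
	while (*head && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

static void toy_call_chain(struct toy_notifier *head, unsigned long event)
{
	for (; head; head = head->next)
		head->call(event);
}
```

Registering the core callback first and the stats callback second still runs the stats callback first, because ordering follows priority and not registration order.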
* [02/91] kmemleak: Do not return a pointer to an object that kmemleak did not get
From: Greg KH @ 2011-06-16 0:14 UTC (permalink / raw)
To: linux-kernel, stable
Cc: stable-review, torvalds, akpm, alan, Catalin Marinas,
Phil Carmody
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Catalin Marinas <catalin.marinas@arm.com>
commit 52c3ce4ec5601ee383a14f1485f6bac7b278896e upstream.
The kmemleak_seq_next() function tries to get an object (and increment
its use count) before returning it. If it could not get the last object
during list traversal (because it may have been freed), the function
should return NULL rather than a pointer to an object that it did not
get.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
mm/kmemleak.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1354,9 +1354,12 @@ static void *kmemleak_seq_next(struct se
++(*pos);
list_for_each_continue_rcu(n, &object_list) {
- next_obj = list_entry(n, struct kmemleak_object, object_list);
- if (get_object(next_obj))
+ struct kmemleak_object *obj =
+ list_entry(n, struct kmemleak_object, object_list);
+ if (get_object(obj)) {
+ next_obj = obj;
break;
+ }
}
put_object(prev_obj);
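The pattern the fix applies, assigning the result only after the reference has actually been taken, can be shown with a toy singly linked list. The types and helpers are hypothetical stand-ins, not the kmemleak structures, and a use_count of 0 is used here to model an object that is already being freed.

```c
#include <stddef.h>

struct toy_object {
	int use_count;		/* 0 models an object already being freed */
	struct toy_object *next;
};

/* Take a reference only if the object is still live. */
static int toy_get_object(struct toy_object *obj)
{
	if (obj->use_count == 0)
		return 0;
	obj->use_count++;
	return 1;
}

/* Return the next object after 'prev' that we managed to get, or NULL. */
static struct toy_object *toy_seq_next(struct toy_object *prev)
{
	struct toy_object *next_obj = NULL;
	struct toy_object *obj;

	for (obj = prev->next; obj; obj = obj->next) {
		if (toy_get_object(obj)) {
			next_obj = obj;	/* assign only after a successful get */
			break;
		}
	}
	return next_obj;
}
```

If the last element on the list cannot be obtained, next_obj keeps its initial NULL instead of pointing at an object without a held reference.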
* [PATCH] gcc warnings again
From: Luca Berra @ 2011-06-16 7:05 UTC (permalink / raw)
To: linux-raid
[-- Attachment #1: Type: text/plain, Size: 1481 bytes --]
hello.
yesterday i tried rebuilding both mdadm 3.1.5 and 3.2.1 with gcc 4.6,
with the following CXFLAGS
x86: -O2 -g -frecord-gcc-switches -Wstrict-aliasing=2 -pipe -Wformat
-Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector
--param=ssp-buffer-size=4 -fomit-frame-pointer -mtune=generic
-march=i586 -fasynchronous-unwind-tables
x86_64: -O2 -g -frecord-gcc-switches -Wstrict-aliasing=2 -pipe -Wformat
-Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector
--param=ssp-buffer-size=4 -fPIC
i found a good number of warnings
unused but set variable
strict aliasing
comparison between signed and unsigned values *on 32bit*
for the unused variables i found fedora already had a patch which is
sensible enough, i did not see it reported here, so i will attach it.
I know -Wstrict-aliasing=2 can give false positives but those looked real
to me, so i fixed those.
looking at the gpt code in util.c i found i did not like it at all, a
gpt partition entry is currently 128 bytes, but the spec does not say it
is a fixed value, so the code that reads into a buffer in 512-byte
chunks expecting this to be a multiple of part_size is imho incorrect.
my fix was to read each partition entry directly into a struct
GPT_part_entry, the advantage is that the code is very simple to read,
the disadvantage is that it is 128 reads of 128 bytes each, which is
sub-optimal, but i believe readahead will mitigate this a lot.
regards,
L.
--
Luca Berra -- bluca@comedia.it
[-- Attachment #2: mdadm-3.1.5-unused-param.patch --]
[-- Type: text/plain, Size: 4951 bytes --]
--- mdadm-3.2.1/sysfs.c.param 2011-03-28 11:28:13.599402233 -0400
+++ mdadm-3.2.1/sysfs.c 2011-03-28 11:48:02.593714836 -0400
@@ -418,7 +418,7 @@ int sysfs_set_num(struct mdinfo *sra, st
int sysfs_uevent(struct mdinfo *sra, char *event)
{
char fname[50];
- int n;
+ unsigned int n;
int fd;
sprintf(fname, "/sys/block/%s/uevent",
@@ -428,6 +428,11 @@ int sysfs_uevent(struct mdinfo *sra, cha
return -1;
n = write(fd, event, strlen(event));
close(fd);
+ if (n != strlen(event)) {
+ dprintf(Name ": failed to write '%s' to '%s' (%s)\n",
+ event, fname, strerror(errno));
+ return -1;
+ }
return 0;
}
--- mdadm-3.2.1/mdadm.c.param 2011-03-28 10:38:12.035258787 -0400
+++ mdadm-3.2.1/mdadm.c 2011-03-28 10:39:33.346082070 -0400
@@ -103,7 +103,9 @@ int main(int argc, char *argv[])
char *shortopt = short_options;
int dosyslog = 0;
int rebuild_map = 0;
+#if 0
int auto_update_home = 0;
+#endif
char *subarray = NULL;
char *remove_path = NULL;
char *udev_filename = NULL;
@@ -1325,11 +1327,13 @@ int main(int argc, char *argv[])
cnt++;
acnt++;
}
+#if 0
if (rv2 == 1)
/* found something so even though assembly failed we
* want to avoid auto-updates
*/
auto_update_home = 0;
+#endif
} while (rv2!=2);
/* Incase there are stacked devices, we need to go around again */
} while (acnt);
--- mdadm-3.2.1/mdmon.c.param 2011-03-28 11:29:41.128681560 -0400
+++ mdadm-3.2.1/mdmon.c 2011-03-28 11:30:54.514946394 -0400
@@ -513,6 +513,9 @@ static int mdmon(char *devname, int devn
ignore = dup(0);
#endif
+ if (ignore)
+ ignore++;
+
do_manager(container);
exit(0);
--- mdadm-3.2.1/Grow.c.param 2011-03-28 10:38:12.038259001 -0400
+++ mdadm-3.2.1/Grow.c 2011-03-28 10:45:28.174500010 -0400
@@ -1312,7 +1312,6 @@ int Grow_reshape(char *devname, int fd,
char *subarray = NULL;
int frozen;
- int changed = 0;
char *container = NULL;
char container_buf[20];
int cfd = -1;
@@ -1479,7 +1478,6 @@ int Grow_reshape(char *devname, int fd,
if (!quiet)
fprintf(stderr, Name ": component size of %s has been set to %lluK\n",
devname, size);
- changed = 1;
} else if (array.level != LEVEL_CONTAINER) {
size = get_component_size(fd)/2;
if (size == 0)
--- mdadm-3.2.1/Query.c.param 2011-03-28 10:38:12.040259145 -0400
+++ mdadm-3.2.1/Query.c 2011-03-28 10:41:19.272668999 -0400
@@ -35,7 +35,7 @@ int Query(char *dev)
int fd = open(dev, O_RDONLY);
int vers;
int ioctlerr;
- int superror, superrno;
+ int superror;
struct mdinfo info;
mdu_array_info_t array;
struct supertype *st = NULL;
@@ -84,7 +84,6 @@ int Query(char *dev)
st = guess_super(fd);
if (st) {
superror = st->ss->load_super(st, fd, dev);
- superrno = errno;
} else
superror = -1;
close(fd);
--- mdadm-3.2.1/super1.c.param 2011-03-28 10:38:12.043259360 -0400
+++ mdadm-3.2.1/super1.c 2011-03-28 10:53:14.423905054 -0400
@@ -111,7 +111,6 @@ static unsigned int calc_sb_1_csum(struc
unsigned long long newcsum;
int size = sizeof(*sb) + __le32_to_cpu(sb->max_dev)*2;
unsigned int *isuper = (unsigned int*)sb;
- int i;
/* make sure I can count... */
if (offsetof(struct mdp_superblock_1,data_offset) != 128 ||
@@ -123,7 +122,7 @@ static unsigned int calc_sb_1_csum(struc
disk_csum = sb->sb_csum;
sb->sb_csum = 0;
newcsum = 0;
- for (i=0; size>=4; size -= 4 ) {
+ for (; size>=4; size -= 4 ) {
newcsum += __le32_to_cpu(*isuper);
isuper++;
}
@@ -387,15 +386,11 @@ static void examine_super1(struct supert
printf(" Array State : ");
for (d=0; d<__le32_to_cpu(sb->raid_disks) + delta_extra; d++) {
int cnt = 0;
- int me = 0;
unsigned int i;
for (i=0; i< __le32_to_cpu(sb->max_dev); i++) {
unsigned int role = __le16_to_cpu(sb->dev_roles[i]);
- if (role == d) {
- if (i == __le32_to_cpu(sb->dev_number))
- me = 1;
+ if (role == d)
cnt++;
- }
}
if (cnt > 1) printf("?");
else if (cnt == 1) printf("A");
--- mdadm-3.2.1/Incremental.c.param 2011-03-28 10:38:12.045259502 -0400
+++ mdadm-3.2.1/Incremental.c 2011-03-28 11:31:41.924347665 -0400
@@ -707,7 +707,7 @@ static int count_active(struct supertype
int cnt = 0;
__u64 max_events = 0;
char *avail = NULL;
- int *best;
+ int *best = NULL;
char *devmap = NULL;
int numdevs = 0;
int devnum;
--- mdadm-3.2.1/super-intel.c.param 2011-03-28 10:38:12.048259718 -0400
+++ mdadm-3.2.1/super-intel.c 2011-03-28 11:33:53.898816208 -0400
@@ -6164,7 +6164,7 @@ static int apply_takeover_update(struct
{
struct imsm_dev *dev = NULL;
struct intel_dev *dv;
- struct imsm_dev *dev_new;
+ struct imsm_dev *dev_new = NULL;
struct imsm_map *map;
struct dl *dm, *du;
int i;
@@ -7008,7 +7008,7 @@ static int imsm_create_metadata_update_f
int update_memory_size = 0;
struct imsm_update_reshape *u = NULL;
struct mdinfo *spares = NULL;
- int i;
+ int i = -1;
int delta_disks = 0;
struct mdinfo *dev;
[-- Attachment #3: mdadm-3.2.1-strictalias.patch --]
[-- Type: text/plain, Size: 2975 bytes --]
Workaround for strict-aliasing warning
Signed-off-by: Luca Berra <bluca@vodka.it>
---
--- mdadm-3.2.1/Grow.c.strictalias 2011-06-15 14:46:48.281409916 +0000
+++ mdadm-3.2.1/Grow.c 2011-06-15 14:46:48.321410099 +0000
@@ -2914,6 +2914,7 @@ int child_monitor(int afd, struct mdinfo
int chunk = sra->array.chunk_size;
struct mdinfo *sd;
unsigned long stripes;
+ int uuid[4];
/* set up the backup-super-block. This requires the
* uuid from the array.
@@ -2941,7 +2942,8 @@ int child_monitor(int afd, struct mdinfo
memset(&bsb, 0, 512);
memcpy(bsb.magic, "md_backup_data-1", 16);
- st->ss->uuid_from_super(st, (int*)&bsb.set_uuid);
+ st->ss->uuid_from_super(st, uuid);
+ memcpy(bsb.set_uuid, uuid, 16);
bsb.mtime = __cpu_to_le64(time(0));
bsb.devstart2 = blocks;
--- mdadm-3.2.1/super0.c.strictalias 2011-03-28 02:31:20.000000000 +0000
+++ mdadm-3.2.1/super0.c 2011-06-15 14:46:48.321410099 +0000
@@ -423,6 +423,7 @@ static int update_super0(struct supertyp
* ignored.
*/
int rv = 0;
+ int uuid[4];
mdp_super_t *sb = st->sb;
if (strcmp(update, "sparc2.2")==0 ) {
/* 2.2 sparc put the events in the wrong place
@@ -561,7 +562,8 @@ static int update_super0(struct supertyp
if (sb->state & (1<<MD_SB_BITMAP_PRESENT)) {
struct bitmap_super_s *bm;
bm = (struct bitmap_super_s*)(sb+1);
- uuid_from_super0(st, (int*)bm->uuid);
+ uuid_from_super0(st, uuid);
+ memcpy(bm->uuid, uuid, 16);
}
} else if (strcmp(update, "no-bitmap") == 0) {
sb->state &= ~(1<<MD_SB_BITMAP_PRESENT);
@@ -987,6 +989,7 @@ static int add_internal_bitmap0(struct s
int chunk = *chunkp;
mdp_super_t *sb = st->sb;
bitmap_super_t *bms = (bitmap_super_t*)(((char*)sb) + MD_SB_BYTES);
+ int uuid[4];
min_chunk = 4096; /* sub-page chunks don't work yet.. */
@@ -1010,7 +1013,8 @@ static int add_internal_bitmap0(struct s
memset(bms, 0, sizeof(*bms));
bms->magic = __cpu_to_le32(BITMAP_MAGIC);
bms->version = __cpu_to_le32(major);
- uuid_from_super0(st, (int*)bms->uuid);
+ uuid_from_super0(st, uuid);
+ memcpy(bms->uuid, uuid, 16);
bms->chunksize = __cpu_to_le32(chunk);
bms->daemon_sleep = __cpu_to_le32(delay);
bms->sync_size = __cpu_to_le64(size);
--- mdadm-3.2.1/super1.c.strictalias 2011-06-15 14:46:48.281409916 +0000
+++ mdadm-3.2.1/super1.c 2011-06-15 14:46:48.321410099 +0000
@@ -1492,6 +1492,7 @@ add_internal_bitmap1(struct supertype *s
int room = 0;
struct mdp_superblock_1 *sb = st->sb;
bitmap_super_t *bms = (bitmap_super_t*)(((char*)sb) + 1024);
+ int uuid[4];
switch(st->minor_version) {
case 0:
@@ -1579,7 +1580,8 @@ add_internal_bitmap1(struct supertype *s
memset(bms, 0, sizeof(*bms));
bms->magic = __cpu_to_le32(BITMAP_MAGIC);
bms->version = __cpu_to_le32(major);
- uuid_from_super1(st, (int*)bms->uuid);
+ uuid_from_super1(st, uuid);
+ memcpy(bms->uuid, uuid, 16);
bms->chunksize = __cpu_to_le32(chunk);
bms->daemon_sleep = __cpu_to_le32(delay);
bms->sync_size = __cpu_to_le64(size);
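The workaround used throughout this attachment follows one pattern: let the callee fill a properly typed int[4] and then memcpy() the bytes into the 16-byte field, instead of casting the byte array to int *. A self-contained sketch, with a hypothetical structure and a made-up UUID source standing in for uuid_from_super0()/uuid_from_super1():

```c
#include <string.h>

/* The sketch assumes 4-byte ints, as the 16-byte copies above do. */
_Static_assert(sizeof(int[4]) == 16, "sketch assumes 4-byte int");

/* Hypothetical stand-in for a superblock with a 16-byte uuid field. */
struct toy_bitmap_super {
	unsigned char uuid[16];
};

/* Hypothetical stand-in for uuid_from_super*(): fills four ints. */
static void toy_uuid_from_super(int uuid[4])
{
	uuid[0] = 0x11111111;
	uuid[1] = 0x22222222;
	uuid[2] = 0x33333333;
	uuid[3] = 0x44444444;
}

static void toy_set_uuid(struct toy_bitmap_super *bms)
{
	int uuid[4];

	/* The callee writes through an int *, so give it real ints... */
	toy_uuid_from_super(uuid);
	/* ...and copy the bytes; memcpy() has no aliasing restrictions. */
	memcpy(bms->uuid, uuid, sizeof(bms->uuid));
}
```

Compilers typically reduce a fixed 16-byte memcpy() to a couple of moves, so the extra copy should cost little.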
[-- Attachment #4: mdadm-3.2.1-gpt.patch --]
[-- Type: text/plain, Size: 1975 bytes --]
Workaround for strict-aliasing warning
read() returns a ssize_t, not an unsigned
Rework code to not depend on assumptions about part_entry size
Signed-off-by: Luca Berra <bluca@vodka.it>
--- mdadm-3.2.1/util.c.gpt 2011-03-28 02:31:20.000000000 +0000
+++ mdadm-3.2.1/util.c 2011-06-15 21:14:07.039082716 +0000
@@ -1280,9 +1280,8 @@ int must_be_container(int fd)
static int get_gpt_last_partition_end(int fd, unsigned long long *endofpart)
{
struct GPT gpt;
- unsigned char buf[512];
unsigned char empty_gpt_entry[16]= {0};
- struct GPT_part_entry *part;
+ struct GPT_part_entry part;
unsigned long long curr_part_end;
unsigned all_partitions, entry_size;
unsigned part_nr;
@@ -1290,8 +1289,9 @@ static int get_gpt_last_partition_end(in
*endofpart = 0;
BUILD_BUG_ON(sizeof(gpt) != 512);
- /* read GPT header */
+ /* skip protective MBR */
lseek(fd, 512, SEEK_SET);
+ /* read GPT header */
if (read(fd, &gpt, 512) != 512)
return 0;
@@ -1308,28 +1308,19 @@ static int get_gpt_last_partition_end(in
entry_size > 512)
return -1;
- /* read first GPT partition entries */
- if (read(fd, buf, 512) != 512)
- return 0;
-
- part = (struct GPT_part_entry*)buf;
-
for (part_nr=0; part_nr < all_partitions; part_nr++) {
+ /* read partition entry */
+ if (read(fd, &part, entry_size) != (ssize_t)entry_size)
+ return 0;
+
/* is this valid partition? */
- if (memcmp(part->type_guid, empty_gpt_entry, 16) != 0) {
+ if (memcmp(part.type_guid, empty_gpt_entry, 16) != 0) {
/* check the last lba for the current partition */
- curr_part_end = __le64_to_cpu(part->ending_lba);
+ curr_part_end = __le64_to_cpu(part.ending_lba);
if (curr_part_end > *endofpart)
*endofpart = curr_part_end;
}
- part = (struct GPT_part_entry*)((unsigned char*)part + entry_size);
-
- if ((unsigned char *)part >= buf + 512) {
- if (read(fd, buf, 512) != 512)
- return 0;
- part = (struct GPT_part_entry*)buf;
- }
}
return 1;
}
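A hedged sketch of the per-entry read described in the cover mail, written against a plain file descriptor. It is not the mdadm code: the toy_* names are hypothetical, the field offsets (type GUID at byte 0, ending LBA at byte 40, little-endian) are taken from the published GPT entry layout, and the return convention simply mirrors the patch (1 on success, 0 on a short read, -1 on an unusable entry size). The sketch reads only the first 128 bytes of each entry and uses entry_size purely as the stride, so an entry_size larger than the buffer cannot overrun it.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_ENTRY_BYTES 128	/* part of each entry this sketch inspects */

/* Decode a little-endian 64-bit value without any pointer casts. */
static uint64_t toy_le64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Scan 'count' entries of 'entry_size' bytes starting at byte 'offset'
 * and report the largest ending LBA among non-empty entries. */
static int toy_last_partition_end(int fd, off_t offset, unsigned count,
				  unsigned entry_size, uint64_t *endofpart)
{
	static const unsigned char empty_guid[16];
	unsigned char entry[TOY_ENTRY_BYTES];
	unsigned i;

	*endofpart = 0;
	if (entry_size < sizeof(entry))
		return -1;

	for (i = 0; i < count; i++) {
		off_t pos = offset + (off_t)i * entry_size;
		uint64_t end;

		if (lseek(fd, pos, SEEK_SET) < 0)
			return 0;
		if (read(fd, entry, sizeof(entry)) != (ssize_t)sizeof(entry))
			return 0;
		if (memcmp(entry, empty_guid, 16) == 0)
			continue;	/* unused slot */
		end = toy_le64(entry + 40);
		if (end > *endofpart)
			*endofpart = end;
	}
	return 1;
}
```

Seeking before every read costs one extra system call per entry compared with the sequential reads in the patch; pread() would combine the two where it is available.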
* [01/91] ftrace: Only update the function code on write to filter files
From: Greg KH @ 2011-06-16 0:14 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: stable-review, torvalds, akpm, alan, Steven Rostedt
In-Reply-To: <20110616001900.GA25375@kroah.com>
2.6.32-longterm review patch. If anyone has any objections, please let us know.
------------------
From: Steven Rostedt <srostedt@redhat.com>
commit 058e297d34a404caaa5ed277de15698d8dc43000 upstream.
If function tracing is enabled, a read of the filter files will
cause the call to stop_machine to update the function trace sites.
It should only call stop_machine on write.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
kernel/trace/ftrace.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2360,14 +2360,16 @@ ftrace_regex_release(struct inode *inode
ftrace_match_records(parser->buffer, parser->idx, enable);
}
- mutex_lock(&ftrace_lock);
- if (ftrace_start_up && ftrace_enabled)
- ftrace_run_update_code(FTRACE_ENABLE_CALLS);
- mutex_unlock(&ftrace_lock);
-
trace_parser_put(parser);
kfree(iter);
+ if (file->f_mode & FMODE_WRITE) {
+ mutex_lock(&ftrace_lock);
+ if (ftrace_start_up && ftrace_enabled)
+ ftrace_run_update_code(FTRACE_ENABLE_CALLS);
+ mutex_unlock(&ftrace_lock);
+ }
+
mutex_unlock(&ftrace_regex_lock);
return 0;
}
* Re: [GIT PULL] Re: REGRESSION: Performance regressions from switching anon_vma->lock to mutex
From: Ingo Molnar @ 2011-06-16 7:03 UTC (permalink / raw)
To: Linus Torvalds
Cc: Peter Zijlstra, Paul McKenney, Tim Chen, Andrew Morton,
Hugh Dickins, KOSAKI Motohiro, Benjamin Herrenschmidt,
David Miller, Martin Schwidefsky, Russell King, Paul Mundt,
Jeff Dike, Richard Weinberger, Tony Luck, KAMEZAWA Hiroyuki,
Mel Gorman, Nick Piggin, Namhyung Kim, ak, shaohua.li, alex.shi,
linux-kernel, linux-mm, Rafael J. Wysocki
In-Reply-To: <35c0ff16-bd58-4b9c-9d9f-d1a4df2ae7b9@email.android.com>
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> Ingo Molnar <mingo@elte.hu> wrote:
> >
> > I have this fix queued up currently:
> >
> > 09223371deac: rcu: Use softirq to address performance regression
>
> I really don't think that is even close to enough.
Yeah.
> It still does all the callbacks in the threads, and according to
> Peter, about half the rcu time in the threads remained..
You are right - things that are a few percent on a 24 core machine
will definitely go exponentially worse on larger boxen. We'll get rid
of the kthreads entirely.
The funny thing about this workload is that context-switches are
really a fastpath here and we are using anonymous IRQ-triggered
softirqs embedded in random task contexts as a workaround for that.
[ I think we'll have to revisit this issue and do it properly:
quiescent state is mostly defined by context-switches here, so we
could do the RCU callbacks from the task that turns a CPU
quiescent, right in the scheduler context-switch path - perhaps
with an option for SCHED_FIFO tasks to *not* do GC.
That could possibly be more cache-efficient than softirq execution,
as we'll process a still-hot pool of callbacks instead of doing
them only once per timer tick. It will also make the RCU GC
behavior HZ independent. ]
In any case the proxy kthread model clearly sucked, no argument about
that.
Thanks,
Ingo