LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH 8/8] v3 Update memory-hotplug documentation
From: Dave Hansen @ 2010-07-20 19:23 UTC
  To: Nathan Fontenot
  Cc: linux-mm, greg, linux-kernel, KAMEZAWA Hiroyuki, linuxppc-dev
In-Reply-To: <4C451F3F.8000207@austin.ibm.com>

On Mon, 2010-07-19 at 22:59 -0500, Nathan Fontenot wrote:
> 
> 
> -Now, XXX is defined as start_address_of_section / section_size.
> +Now, XXX is defined as (start_address_of_section / section_size) of the first
> +section contained in the memory block.
> 
>  For example, assume 1GiB section size. A device for a memory starting at
>  0x100000000 is /sys/devices/system/memory/memory4
>  (0x100000000 / 1GiB = 4)
>  This device covers address range [0x100000000 ... 0x140000000)
> 
> -Under each section, you can see 4 files.
> +Under each section, you can see 5 files.
> 
> -/sys/devices/system/memory/memoryXXX/phys_index
> +/sys/devices/system/memory/memoryXXX/start_phys_index
> +/sys/devices/system/memory/memoryXXX/end_phys_index 

Just wanted to make sure you didn't forget to update this after KAME's
comments on the first couple of patches.
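
As a rough illustration of the arithmetic in the quoted documentation (this
is not code from the patch), the index XXX and the covered address range can
be computed as in the sketch below. The helper names memory_block_index() and
memory_block_end() are hypothetical, and the example assumes one section per
block, as in the 1GiB case quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index XXX in memoryXXX for a block that starts at
 * start_addr, i.e. start_address_of_section / section_size. */
static uint64_t memory_block_index(uint64_t start_addr, uint64_t section_size)
{
	assert(section_size != 0);
	return start_addr / section_size;
}

/* Hypothetical helper: exclusive end address of a block that covers
 * nr_sections consecutive sections starting at start_addr. */
static uint64_t memory_block_end(uint64_t start_addr, uint64_t section_size,
				 uint64_t nr_sections)
{
	return start_addr + section_size * nr_sections;
}
```

With a 1GiB section size, a start address of 0x100000000 yields index 4 and
an exclusive end address of 0x140000000, which matches the memory4 example.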


-- Dave


* Re: [PATCH 4/8] v3 Allow memory_block to span multiple memory sections
From: Dave Hansen @ 2010-07-20 19:21 UTC
  To: Nathan Fontenot
  Cc: linux-mm, greg, linux-kernel, KAMEZAWA Hiroyuki, linuxppc-dev
In-Reply-To: <4C451E1C.8070907@austin.ibm.com>

On Mon, 2010-07-19 at 22:55 -0500, Nathan Fontenot wrote:
> +static u32 get_memory_block_size(void)
> +{
> +       u32 block_sz;
> +
> +       block_sz = memory_block_size_bytes();
> +
> +       /* Validate blk_sz is a power of 2 and not less than section size */
> +       if ((block_sz & (block_sz - 1)) || (block_sz < MIN_MEMORY_BLOCK_SIZE))
> +               block_sz = MIN_MEMORY_BLOCK_SIZE;

Is this worth a WARN_ON()?  Seems pretty bogus if someone is returning
funky block sizes.  
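
A minimal userspace sketch of what a warning on the fallback path might look
like is shown below. It is only an analogue: fprintf() stands in for the
kernel's WARN_ON(), and the function name checked_block_size() and its
min_sz parameter are assumptions made for the example, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the check in get_memory_block_size(): fall back to
 * min_sz, and complain, when block_sz is not a power of 2 or is smaller
 * than min_sz. fprintf() plays the role of WARN_ON() here. */
static uint32_t checked_block_size(uint32_t block_sz, uint32_t min_sz)
{
	/* The minimum itself must be a non-zero power of 2. */
	assert(min_sz != 0 && (min_sz & (min_sz - 1)) == 0);

	if ((block_sz & (block_sz - 1)) || block_sz < min_sz) {
		fprintf(stderr, "bogus memory block size 0x%x, using 0x%x\n",
			(unsigned int)block_sz, (unsigned int)min_sz);
		return min_sz;
	}
	return block_sz;
}
```

The warning makes a misbehaving memory_block_size_bytes() override visible
instead of silently replacing its return value.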

-- Dave


* Re: [PATCH 4/8] v3 Allow memory_block to span multiple memory sections
From: Dave Hansen @ 2010-07-20 19:18 UTC
  To: Nathan Fontenot
  Cc: linux-mm, greg, linux-kernel, KAMEZAWA Hiroyuki, linuxppc-dev
In-Reply-To: <4C451E1C.8070907@austin.ibm.com>

On Mon, 2010-07-19 at 22:55 -0500, Nathan Fontenot wrote:
> +static int add_memory_section(int nid, struct mem_section *section,
> +                       unsigned long state, enum mem_add_context context)
> +{
> +       struct memory_block *mem;
> +       int ret = 0;
> +
> +       mem = find_memory_block(section);
> +       if (mem) {
> +               atomic_inc(&mem->section_count);
> +               kobject_put(&mem->sysdev.kobj);
> +       } else
> +               ret = init_memory_block(&mem, section, state);
> +
>         if (!ret) {
> -               if (context == HOTPLUG)
> +               if (context == HOTPLUG &&
> +                   atomic_read(&mem->section_count) == sections_per_block)
>                         ret = register_mem_sect_under_node(mem, nid);
>         } 

I think the atomic_inc() can race with the atomic_dec_and_test() in
remove_memory_block().

Thread 1 does:

	mem = find_memory_block(section);

Thread 2 does 

	atomic_dec_and_test(&mem->section_count);

and destroys the memory block.  Thread 1 runs again:
	
       if (mem) {
               atomic_inc(&mem->section_count);
               kobject_put(&mem->sysdev.kobj);
       } else

but by now mem has been destroyed by Thread 2.  You probably need to change
find_memory_block() to take a reference itself, and to use
atomic_inc_not_zero().
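
The "take a reference only if the count has not already dropped to zero"
pattern suggested above can be sketched in userspace with C11 atomics. This
is an illustration of the technique, not the kernel implementation; the names
get_unless_zero() and put_and_test() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only while the count is still non-zero. Returns false
 * if the object is already on its way to being destroyed. */
static bool get_unless_zero(atomic_int *count)
{
	int old = atomic_load(count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one and
 * is therefore responsible for tearing the object down. */
static bool put_and_test(atomic_int *count)
{
	int old = atomic_fetch_sub(count, 1);

	assert(old > 0);
	return old == 1;
}
```

With this shape, a thread that loses the race sees get_unless_zero() fail
and can fall back to creating a new block instead of touching freed memory.
The lookup itself still has to keep the object alive until the count is
examined, which is why the lookup needs to hold a reference of its own.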

-- Dave


* Re: [PATCH 2/8] v3 Add new phys_index properties
From: Dave Hansen @ 2010-07-20 19:10 UTC
  To: Nathan Fontenot
  Cc: linux-mm, greg, linux-kernel, KAMEZAWA Hiroyuki, linuxppc-dev
In-Reply-To: <4C45A3AB.6090407@austin.ibm.com>

On Tue, 2010-07-20 at 08:24 -0500, Nathan Fontenot wrote:
> Update the 'phys_index' properties of a memory block to include a
> 'start_phys_index' which is the same as the current 'phys_index' property.
> This also adds an 'end_phys_index' property to indicate the id of the
> last section in the memory block.
> 
> Patch updated to keep the name of the phys_index property instead of
> renaming it to start_phys_index.

KAME is right on this.  We should keep the old one if at all possible.  

The only other thing we might want to do is move 'phys_index' to
'start_phys_index', and make a new 'phys_index' that does a WARN_ONCE(),
gives a deprecated warning, then calls the new 'start_phys_index' code.

So, basically make the new, more clear name, but keep the old one for a
while and deprecate it.  Maybe we could get away with removing it in ten
years. :)
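
The deprecation scheme described above could look roughly like the userspace
sketch below. It is an assumption-laden illustration: fprintf() with a static
flag stands in for WARN_ONCE(), the function signatures are simplified and do
not match the sysdev show() callbacks, and deprecation_warnings exists only
so the behavior can be observed.

```c
#include <assert.h>
#include <stdio.h>

static int deprecation_warnings; /* counts how often the warning fired */

/* New, clearer attribute: prints the first section index of the block,
 * using the same "%08lx\n" format as the patch. */
static int show_start_phys_index(unsigned long start_phys_index,
				 char *buf, size_t len)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%08lx\n", start_phys_index);
}

/* Old name kept for compatibility: warn once, then defer to the new code. */
static int show_phys_index(unsigned long start_phys_index,
			   char *buf, size_t len)
{
	static int warned;

	if (!warned) {
		warned = 1;
		deprecation_warnings++;
		fprintf(stderr,
			"phys_index is deprecated, use start_phys_index\n");
	}
	return show_start_phys_index(start_phys_index, buf, len);
}
```

Both names return identical contents, so existing userspace keeps working
while the log points people at the new name. The static flag here is not
thread-safe; it only mimics the once-only behavior.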

-- Dave


* RT Kernel crash in SMP mode on Marvell board
From: Manikandan Ramachandran @ 2010-07-20 18:15 UTC
  To: linuxppc-dev


Hello All,

   I'm trying to bring up a Marvell board [db644xx] with dual CPUs [7447A]
on kernel 2.6.33.5-rt23 [Ingo's RT patch]. For convenience I have
registered the IPI interrupt with the IRQF_NODELAY flag; all other interrupts
are threaded. I'm able to boot the system fine, but after a few hours I see
a random kernel crash in Marvell's PIC module. I understand that the RT kernel
doesn't like the way the spinlock is used. I tried converting it to
raw_spin_lock, but then the system freezes. I can make the system stable by
not using any locks at all, but that's not an ideal solution. Can someone
please suggest a better locking mechanism? The crash dump is given below:

NIP: 00000000 LR: 70030258 CTR: 00000000
REGS: deeebcf0 TRAP: 0400   Tainted: P            (2.6.33.5-rt23)
MSR: 40001032 <ME,IR,DR>  CR: 24004024  XER: 20000004
TASK = df915240[788] 'irq/74-ide0' THREAD: deeea000 CPU: 1
GPR00: 70030390 deeebda0 df915240 72e17ac0 703683f8 00000001 00000000 645d28e8
GPR08: 00000000 00000000 00000000 00000005 46f0b20f 00000000 00fe4600 007fff00
GPR16: 04000000 003d4ba2 007ffeb0 00800000 00ff690c 03000040 00f7eba2 deeebf6c
GPR24: 00000001 00000000 00000000 deeea000 72e17ac0 00000000 703683f8 703683f8
NIP [00000000] (null)
LR [70030258] enqueue_task+0x3c/0x58
Call Trace:
[deeebda0] [0000001e] 0x1e (unreliable)
[deeebdb0] [70030390] activate_task+0x40/0x60
[deeebdc0] [70036850] try_to_wake_up+0x248/0x334
[deeebe00] [70069380] wakeup_next_waiter+0x148/0x14c
[deeebe20] [7029b1d4] rt_spin_lock_slowunlock+0x60/0x84
[deeebe30] [7001a168] marvell_disable_IPI_irq+0xfc/0x110
[deeebe50] [70073934] handle_level_irq+0x40/0x15c
[deeebe70] [70006418] do_IRQ+0xc8/0xf4
[deeebe90] [7001276c] ret_from_except+0x0/0x14
--- Exception: 501 at schedule+0x30/0x50
    LR = schedule+0x24/0x50
[deeebf60] [700717bc] irq_thread+0x98/0x230
[deeebfa0] [7005686c] kthread+0x78/0x7c
[deeebff0] [70011ef4] kernel_thread+0x4c/0x68
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Kernel panic - not syncing: Fatal exception in interrupt
Call Trace:
[deeebc20
---------------------------------------------------------------------------------------------------------------------

IRQ CONFIGs:

CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
# CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK is not set
CONFIG_IRQ_PER_CPU=y
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y

-- 
Thanks,
Manik

Think twice about a tree before you take a printout



* Re: [lm-sensors] [PATCH] hwmon: (tmp421) Add nfactor support (2nd attempt)
From: Andre Prendel @ 2010-07-20 15:09 UTC
  To: Jeff Angielski; +Cc: linuxppc-dev, lm-sensors
In-Reply-To: <20100520193556.GB2153@andre-laptop>

On Thu, May 20, 2010 at 09:35:56PM +0200, Andre Prendel wrote:
> On Thu, May 20, 2010 at 03:07:05PM -0400, Jeff Angielski wrote:
> > In any event, here it is again:
> 
> Acked-by: Andre Prendel <andre.prendel@gmx.de>

Hi Jeff,

I'd suggest resending the patch with my Acked-by to the lm-sensors list and
Andrew Morton. It looks like Jean is too busy at the moment.

Regards,
Andre
 
> > 
> > From 9acd29ff48c64e58a7f5cdb888c86e737c56281c Mon Sep 17 00:00:00 2001
> > From: Jeff Angielski <jeff@theptrgroup.com>
> > Date: Mon, 10 May 2010 10:26:34 -0400
> > Subject: [PATCH] hwmon: (tmp421) Add nfactor support
> > 
> > Add support for reading and writing the n-factor correction
> > registers.  This is needed to compensate for the characteristics
> > of a particular sensor hanging off of the remote channels.
> > 
> > Signed-off-by: Jeff Angielski <jeff@theptrgroup.com>
> > ---
> >  Documentation/hwmon/tmp421 |   19 +++++++++++++++++++
> >  drivers/hwmon/tmp421.c     |   41 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 60 insertions(+), 0 deletions(-)
> > 
> > diff --git a/Documentation/hwmon/tmp421 b/Documentation/hwmon/tmp421
> > index 0cf07f8..668228a 100644
> > --- a/Documentation/hwmon/tmp421
> > +++ b/Documentation/hwmon/tmp421
> > @@ -17,6 +17,7 @@ Supported chips:
> >  
> >  Authors:
> >  	Andre Prendel <andre.prendel@gmx.de>
> > +	Jeff Angielski <jeff@theptrgroup.com>
> >  
> >  Description
> >  -----------
> > @@ -34,3 +35,21 @@ the temperature values via the following sysfs files:
> >  
> >  temp[1-4]_input
> >  temp[2-4]_fault
> > +
> > +The chips allow the user to adjust the n-factor value that is used
> > +when converting the remote channel measurements to temperature. The
> > +adjustment has a range of -128 to +127 that yields an effective
> > +n-factor range of 0.706542 to 1.747977.  The power on reset value
> > +for the adjustment is 0 which results in an n-factor of 1.008.
> > +
> > +The effective n-factor is calculated according to the following
> > +equation:
> > +
> > +n_factor = (1.008 * 300) / (300 - nfactor_adjust)
> > +
> > +The driver exports the n-factor adjustment value via the following 
> > +sysfs files:
> > +
> > +temp[2-4]_n_adjust
> > +
> > +
> > diff --git a/drivers/hwmon/tmp421.c b/drivers/hwmon/tmp421.c
> > index 738c472..dfd62be 100644
> > --- a/drivers/hwmon/tmp421.c
> > +++ b/drivers/hwmon/tmp421.c
> > @@ -49,6 +49,7 @@ enum chips { tmp421, tmp422, tmp423 };
> >  
> >  static const u8 TMP421_TEMP_MSB[4]		= { 0x00, 0x01, 0x02, 0x03 };
> >  static const u8 TMP421_TEMP_LSB[4]		= { 0x10, 0x11, 0x12, 0x13 };
> > +static const u8 TMP421_N_CORRECT[3]		= { 0x21, 0x22, 0x23 };
> >  
> >  /* Flags */
> >  #define TMP421_CONFIG_SHUTDOWN			0x40
> > @@ -76,6 +77,7 @@ struct tmp421_data {
> >  	int channels;
> >  	u8 config;
> >  	s16 temp[4];
> > +	s8 n_adjust[3];
> >  };
> >  
> >  static int temp_from_s16(s16 reg)
> > @@ -115,6 +117,10 @@ static struct tmp421_data *tmp421_update_device(struct device *dev)
> >  			data->temp[i] |= i2c_smbus_read_byte_data(client,
> >  				TMP421_TEMP_LSB[i]);
> >  		}
> > +		for (i = 1; i < data->channels; i++) {
> > +			data->n_adjust[i - 1] = i2c_smbus_read_byte_data(client,
> > +				TMP421_N_CORRECT[i - 1]);
> > +		}
> >  		data->last_updated = jiffies;
> >  		data->valid = 1;
> >  	}
> > @@ -157,6 +163,32 @@ static ssize_t show_fault(struct device *dev,
> >  		return sprintf(buf, "0\n");
> >  }
> >  
> > +static ssize_t show_n_adjust(struct device *dev,
> > +			     struct device_attribute *devattr, char *buf)
> > +{
> > +	int index = to_sensor_dev_attr(devattr)->index;
> > +	struct tmp421_data *data = tmp421_update_device(dev);
> > +
> > +	return sprintf(buf, "%d\n", data->n_adjust[index - 1]);
> > +}
> > +
> > +static ssize_t set_n_adjust(struct device *dev,
> > +			    struct device_attribute *devattr,
> > +			    const char *buf, size_t count)
> > +{
> > +	struct i2c_client *client = to_i2c_client(dev);
> > +	struct tmp421_data *data = i2c_get_clientdata(client);
> > +	int index = to_sensor_dev_attr(devattr)->index;
> > +	int n_adjust = simple_strtol(buf, NULL, 10);
> > +
> > +	mutex_lock(&data->update_lock);
> > +	i2c_smbus_write_byte_data(client, TMP421_N_CORRECT[index - 1],
> > +				  SENSORS_LIMIT(n_adjust, -128, 127));
> > +	mutex_unlock(&data->update_lock);
> > +
> > +	return count;
> > +}
> > +
> >  static mode_t tmp421_is_visible(struct kobject *kobj, struct attribute *a,
> >  				int n)
> >  {
> > @@ -177,19 +209,28 @@ static mode_t tmp421_is_visible(struct kobject *kobj, struct attribute *a,
> >  static SENSOR_DEVICE_ATTR(temp1_input, S_IRUGO, show_temp_value, NULL, 0);
> >  static SENSOR_DEVICE_ATTR(temp2_input, S_IRUGO, show_temp_value, NULL, 1);
> >  static SENSOR_DEVICE_ATTR(temp2_fault, S_IRUGO, show_fault, NULL, 1);
> > +static SENSOR_DEVICE_ATTR(temp2_n_adjust, S_IRUSR | S_IWUSR | S_IRGRP,
> > +			  show_n_adjust, set_n_adjust, 1);
> >  static SENSOR_DEVICE_ATTR(temp3_input, S_IRUGO, show_temp_value, NULL, 2);
> >  static SENSOR_DEVICE_ATTR(temp3_fault, S_IRUGO, show_fault, NULL, 2);
> > +static SENSOR_DEVICE_ATTR(temp3_n_adjust, S_IRUSR | S_IWUSR | S_IRGRP,
> > +			  show_n_adjust, set_n_adjust, 2);
> >  static SENSOR_DEVICE_ATTR(temp4_input, S_IRUGO, show_temp_value, NULL, 3);
> >  static SENSOR_DEVICE_ATTR(temp4_fault, S_IRUGO, show_fault, NULL, 3);
> > +static SENSOR_DEVICE_ATTR(temp4_n_adjust, S_IRUSR | S_IWUSR | S_IRGRP,
> > +			  show_n_adjust, set_n_adjust, 3);
> >  
> >  static struct attribute *tmp421_attr[] = {
> >  	&sensor_dev_attr_temp1_input.dev_attr.attr,
> >  	&sensor_dev_attr_temp2_input.dev_attr.attr,
> >  	&sensor_dev_attr_temp2_fault.dev_attr.attr,
> > +	&sensor_dev_attr_temp2_n_adjust.dev_attr.attr,
> >  	&sensor_dev_attr_temp3_input.dev_attr.attr,
> >  	&sensor_dev_attr_temp3_fault.dev_attr.attr,
> > +	&sensor_dev_attr_temp3_n_adjust.dev_attr.attr,
> >  	&sensor_dev_attr_temp4_input.dev_attr.attr,
> >  	&sensor_dev_attr_temp4_fault.dev_attr.attr,
> > +	&sensor_dev_attr_temp4_n_adjust.dev_attr.attr,
> >  	NULL
> >  };
> >  
> > -- 
> > 1.7.0.4
> > 
> > 
> > -- 
> > Jeff Angielski
> > The PTR Group
> > www.theptrgroup.com
> > 
> 
> _______________________________________________
> lm-sensors mailing list
> lm-sensors@lm-sensors.org
> http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
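
The n-factor equation in the quoted documentation hunk can be checked
numerically with a short sketch. The function names below are hypothetical
and the code is a userspace illustration, not part of the driver; the clamp
mirrors the -128 to +127 limit the patch applies when writing the register.

```c
#include <assert.h>

/* Clamp a user-supplied adjustment to the register range -128..127,
 * as the patch does with SENSORS_LIMIT() before writing it. */
static int clamp_n_adjust(long value)
{
	if (value < -128)
		return -128;
	if (value > 127)
		return 127;
	return (int)value;
}

/* Effective n-factor from the documented equation:
 * n_factor = (1.008 * 300) / (300 - nfactor_adjust) */
static double effective_n_factor(int n_adjust)
{
	assert(n_adjust >= -128 && n_adjust <= 127);
	return (1.008 * 300.0) / (300.0 - (double)n_adjust);
}
```

Evaluating the two ends of the range gives roughly 0.706542 for -128 and
1.747977 for +127, which agrees with the range stated in the documentation,
and an adjustment of 0 gives the power-on value of 1.008.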


* Re: [PATCH] math-emu: correct test for downshifting fraction in _FP_FROM_INT()
From: Mikael Pettersson @ 2010-07-20 13:35 UTC
  To: Mikael Pettersson
  Cc: linux-s390, linux-sh, linux-kernel, linuxppc-dev, linux-alpha,
	sparclinux
In-Reply-To: <19524.51858.992299.119315@pilspetsen.it.uu.se>

Mikael Pettersson writes:
 > The kernel's math-emu code contains a macro _FP_FROM_INT() which is
 > used to convert an integer to a raw normalized floating-point value.
 > It does this basically in three steps:
 > 
 > 1. Compute the exponent from the number of leading zero bits.
 > 2. Downshift large fractions to put the MSB in the right position
 >    for normalized fractions.
 > 3. Upshift small fractions to put the MSB in the right position.
 > 
 > There is a boundary error in step 2, causing a fraction with its
 > MSB exactly one bit above the normalized MSB position to not be
 > downshifted.  This results in a non-normalized raw float, which when
 > packed becomes a massively inaccurate representation for that input.
 > 
 > The impact of this depends on a number of arch-specific factors,
 > but it is known to have broken emulation of FXTOD instructions
 > on UltraSPARC III, which was originally reported as GCC bug 44631
 > <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44631>.
 > 
 > Any arch which uses math-emu to emulate conversions from integers to
 > same-size floats may be affected.
 > 
 > The fix is simple: the exponent comparison used to determine if the
 > fraction should be downshifted must be "<=" not "<".
 > 
 > I'm sending a kernel module to test this as a reply to this message.
 > There are also SPARC user-space test cases in the GCC bug entry.
 > 
 > Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>

I forgot to mention that this needs to be backported to older kernels,
so can the maintainer who picks this up please add

	Cc: stable@kernel.org

Thanks,

/Mikael

 > ---
 >  include/math-emu/op-common.h |    2 +-
 >  1 file changed, 1 insertion(+), 1 deletion(-)
 > 
 > diff -rupN linux-2.6.35-rc5/include/math-emu/op-common.h linux-2.6.35-rc5.mathemu-FP_FROM_INT-fraction-downshift-condition/include/math-emu/op-common.h
 > --- linux-2.6.35-rc5/include/math-emu/op-common.h	2010-05-17 19:51:32.000000000 +0200
 > +++ linux-2.6.35-rc5.mathemu-FP_FROM_INT-fraction-downshift-condition/include/math-emu/op-common.h	2010-07-18 22:33:46.000000000 +0200
 > @@ -799,7 +799,7 @@ do {									\
 >  		X##_e -= (_FP_W_TYPE_SIZE - rsize);			\
 >  	X##_e = rsize - X##_e - 1;					\
 >  									\
 > -	if (_FP_FRACBITS_##fs < rsize && _FP_WFRACBITS_##fs < X##_e)	\
 > +	if (_FP_FRACBITS_##fs < rsize && _FP_WFRACBITS_##fs <= X##_e)	\
 >  	  __FP_FRAC_SRS_1(ur_, (X##_e - _FP_WFRACBITS_##fs + 1), rsize);\
 >  	_FP_FRAC_DISASSEMBLE_##wc(X, ur_, rsize);			\
 >  	if ((_FP_WFRACBITS_##fs - X##_e - 1) > 0)			\
 > 
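
The boundary condition fixed by the one-character change above can be
reproduced with a simplified userspace model. This is a sketch, not the
math-emu macro: it works on a single 64-bit word, ignores the
_FP_FRACBITS < rsize part of the test, and the names msb_index(),
shift_right_sticky() and place_msb() are made up for the example. The
parameter wfracbits plays the role of _FP_WFRACBITS, so the normalized MSB
belongs at bit wfracbits - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the most significant set bit; v must be non-zero. */
static int msb_index(uint64_t v)
{
	int e = 63;

	assert(v != 0);
	while (!(v >> e))
		e--;
	return e;
}

/* Right shift by n (1..63) that ORs any lost bits into bit 0 (sticky). */
static uint64_t shift_right_sticky(uint64_t v, int n)
{
	uint64_t lost = v & ((UINT64_C(1) << n) - 1);

	return (v >> n) | (lost != 0);
}

/* Move the MSB of v to bit wfracbits - 1. use_fixed_test selects the
 * corrected "<=" comparison; otherwise the original "<" is used. */
static uint64_t place_msb(uint64_t v, int wfracbits, int use_fixed_test)
{
	int e = msb_index(v);
	int down = use_fixed_test ? (wfracbits <= e) : (wfracbits < e);

	if (down)
		v = shift_right_sticky(v, e - wfracbits + 1);
	if (wfracbits - e - 1 > 0)
		v <<= wfracbits - e - 1;
	return v;
}
```

With wfracbits = 56, an input whose MSB is at bit 56 is left untouched by
the "<" test, because neither the downshift nor the upshift applies, while
the "<=" test shifts it down by one bit as intended. Inputs with the MSB
further away from the target position behave the same under both tests.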


* Re: [PATCH 4/8] v3 Allow memory_block to span multiple memory sections
From: Nathan Fontenot @ 2010-07-20 13:28 UTC
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: greg, KAMEZAWA Hiroyuki
In-Reply-To: <4C451E1C.8070907@austin.ibm.com>

Update the memory sysfs code so that each sysfs memory directory is now
considered a memory block that can contain multiple memory sections.
The default size of each memory block is one memory section
(1 << SECTION_SIZE_BITS bytes), which maintains the current behavior of
having a single memory section per memory block (i.e. one sysfs directory
per memory section).

Architectures that want memory blocks to span multiple memory sections
need only define their own memory_block_size_bytes() routine.

Patch refreshed to apply cleanly with previous two patch updates.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 drivers/base/memory.c |  141 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 98 insertions(+), 43 deletions(-)

Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c	2010-07-20 06:43:29.000000000 -0500
+++ linux-2.6/drivers/base/memory.c	2010-07-20 06:44:30.000000000 -0500
@@ -28,6 +28,14 @@
 #include <asm/uaccess.h>
 
 #define MEMORY_CLASS_NAME	"memory"
+#define MIN_MEMORY_BLOCK_SIZE	(1 << SECTION_SIZE_BITS)
+
+static int sections_per_block;
+
+static inline int base_memory_block_id(int section_nr)
+{
+	return (section_nr / sections_per_block) * sections_per_block;
+}
 
 static struct sysdev_class memory_sysdev_class = {
 	.name = MEMORY_CLASS_NAME,
@@ -82,22 +90,21 @@ EXPORT_SYMBOL(unregister_memory_isolate_
  * register_memory - Setup a sysfs device for a memory block
  */
 static
-int register_memory(struct memory_block *memory, struct mem_section *section)
+int register_memory(struct memory_block *memory)
 {
 	int error;
 
 	memory->sysdev.cls = &memory_sysdev_class;
-	memory->sysdev.id = __section_nr(section);
+	memory->sysdev.id = memory->start_phys_index;
 
 	error = sysdev_register(&memory->sysdev);
 	return error;
 }
 
 static void
-unregister_memory(struct memory_block *memory, struct mem_section *section)
+unregister_memory(struct memory_block *memory)
 {
 	BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
-	BUG_ON(memory->sysdev.id != __section_nr(section));
 
 	/* drop the ref. we got in remove_memory_block() */
 	kobject_put(&memory->sysdev.kobj);
@@ -131,13 +138,16 @@ static ssize_t show_mem_end_phys_index(s
 static ssize_t show_mem_removable(struct sys_device *dev,
 			struct sysdev_attribute *attr, char *buf)
 {
-	unsigned long start_pfn;
-	int ret;
+	unsigned long i, pfn;
+	int ret = 1;
 	struct memory_block *mem =
 		container_of(dev, struct memory_block, sysdev);
 
-	start_pfn = section_nr_to_pfn(mem->start_phys_index);
-	ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
+	for (i = mem->start_phys_index; i <= mem->end_phys_index; i++) {
+		pfn = section_nr_to_pfn(i);
+		ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
+	}
+
 	return sprintf(buf, "%d\n", ret);
 }
 
@@ -190,17 +200,14 @@ int memory_isolate_notify(unsigned long
  * OK to have direct references to sparsemem variables in here.
  */
 static int
-memory_block_action(struct memory_block *mem, unsigned long action)
+memory_section_action(unsigned long phys_index, unsigned long action)
 {
 	int i;
-	unsigned long psection;
 	unsigned long start_pfn, start_paddr;
 	struct page *first_page;
 	int ret;
-	int old_state = mem->state;
 
-	psection = mem->start_phys_index;
-	first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
+	first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
 
 	/*
 	 * The probe routines leave the pages reserved, just
@@ -213,8 +220,8 @@ memory_block_action(struct memory_block
 				continue;
 
 			printk(KERN_WARNING "section number %ld page number %d "
-				"not reserved, was it already online? \n",
-				psection, i);
+				"not reserved, was it already online?\n",
+				phys_index, i);
 			return -EBUSY;
 		}
 	}
@@ -225,18 +232,13 @@ memory_block_action(struct memory_block
 			ret = online_pages(start_pfn, PAGES_PER_SECTION);
 			break;
 		case MEM_OFFLINE:
-			mem->state = MEM_GOING_OFFLINE;
 			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
 			ret = remove_memory(start_paddr,
 					    PAGES_PER_SECTION << PAGE_SHIFT);
-			if (ret) {
-				mem->state = old_state;
-				break;
-			}
 			break;
 		default:
-			WARN(1, KERN_WARNING "%s(%p, %ld) unknown action: %ld\n",
-					__func__, mem, action, action);
+			WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
+			     "%ld\n", __func__, phys_index, action, action);
 			ret = -EINVAL;
 	}
 
@@ -246,7 +248,7 @@ memory_block_action(struct memory_block
 static int memory_block_change_state(struct memory_block *mem,
 		unsigned long to_state, unsigned long from_state_req)
 {
-	int ret = 0;
+	int i, ret = 0;
 	mutex_lock(&mem->state_mutex);
 
 	if (mem->state != from_state_req) {
@@ -254,8 +256,21 @@ static int memory_block_change_state(str
 		goto out;
 	}
 
-	ret = memory_block_action(mem, to_state);
-	if (!ret)
+	if (to_state == MEM_OFFLINE)
+		mem->state = MEM_GOING_OFFLINE;
+
+	for (i = mem->start_phys_index; i <= mem->end_phys_index; i++) {
+		ret = memory_section_action(i, to_state);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		for (i = mem->start_phys_index; i <= mem->end_phys_index; i++)
+			memory_section_action(i, from_state_req);
+
+		mem->state = from_state_req;
+	} else
 		mem->state = to_state;
 
 out:
@@ -268,20 +283,15 @@ store_mem_state(struct sys_device *dev,
 		struct sysdev_attribute *attr, const char *buf, size_t count)
 {
 	struct memory_block *mem;
-	unsigned int phys_section_nr;
 	int ret = -EINVAL;
 
 	mem = container_of(dev, struct memory_block, sysdev);
-	phys_section_nr = mem->start_phys_index;
-
-	if (!present_section_nr(phys_section_nr))
-		goto out;
 
 	if (!strncmp(buf, "online", min((int)count, 6)))
 		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
 	else if(!strncmp(buf, "offline", min((int)count, 7)))
 		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
-out:
+
 	if (ret)
 		return ret;
 	return count;
@@ -458,12 +468,13 @@ struct memory_block *find_memory_block(s
 	struct sys_device *sysdev;
 	struct memory_block *mem;
 	char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
+	int block_id = base_memory_block_id(__section_nr(section));
 
 	/*
 	 * This only works because we know that section == sysdev->id
 	 * slightly redundant with sysdev_register()
 	 */
-	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
+	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, block_id);
 
 	kobj = kset_find_obj(&memory_sysdev_class.kset, name);
 	if (!kobj)
@@ -475,24 +486,26 @@ struct memory_block *find_memory_block(s
 	return mem;
 }
 
-static int add_memory_block(int nid, struct mem_section *section,
-			unsigned long state, enum mem_add_context context)
+static int init_memory_block(struct memory_block **memory,
+			     struct mem_section *section, unsigned long state)
 {
-	struct memory_block *mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+	struct memory_block *mem;
 	unsigned long start_pfn;
 	int ret = 0;
 
+	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
 
-	mem->start_phys_index = __section_nr(section);
+	mem->start_phys_index = base_memory_block_id(__section_nr(section));
+	mem->end_phys_index = mem->start_phys_index + sections_per_block - 1;
 	mem->state = state;
 	atomic_inc(&mem->section_count);
 	mutex_init(&mem->state_mutex);
 	start_pfn = section_nr_to_pfn(mem->start_phys_index);
 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
 
-	ret = register_memory(mem, section);
+	ret = register_memory(mem);
 	if (!ret)
 		ret = mem_create_simple_file(mem, phys_index);
 	if (!ret)
@@ -503,8 +516,27 @@ static int add_memory_block(int nid, str
 		ret = mem_create_simple_file(mem, phys_device);
 	if (!ret)
 		ret = mem_create_simple_file(mem, removable);
+
+	*memory = mem;
+	return ret;
+}
+
+static int add_memory_section(int nid, struct mem_section *section,
+			unsigned long state, enum mem_add_context context)
+{
+	struct memory_block *mem;
+	int ret = 0;
+
+	mem = find_memory_block(section);
+	if (mem) {
+		atomic_inc(&mem->section_count);
+		kobject_put(&mem->sysdev.kobj);
+	} else
+		ret = init_memory_block(&mem, section, state);
+
 	if (!ret) {
-		if (context == HOTPLUG)
+		if (context == HOTPLUG &&
+		    atomic_read(&mem->section_count) == sections_per_block)
 			ret = register_mem_sect_under_node(mem, nid);
 	}
 
@@ -525,8 +557,9 @@ int remove_memory_block(unsigned long no
 		mem_remove_simple_file(mem, state);
 		mem_remove_simple_file(mem, phys_device);
 		mem_remove_simple_file(mem, removable);
-		unregister_memory(mem, section);
-	}
+		unregister_memory(mem);
+	} else
+		kobject_put(&mem->sysdev.kobj);
 
 	return 0;
 }
@@ -537,7 +570,7 @@ int remove_memory_block(unsigned long no
  */
 int register_new_memory(int nid, struct mem_section *section)
 {
-	return add_memory_block(nid, section, MEM_OFFLINE, HOTPLUG);
+	return add_memory_section(nid, section, MEM_OFFLINE, HOTPLUG);
 }
 
 int unregister_memory_section(struct mem_section *section)
@@ -548,6 +581,24 @@ int unregister_memory_section(struct mem
 	return remove_memory_block(0, section, 0);
 }
 
+u32 __weak memory_block_size_bytes(void)
+{
+	return MIN_MEMORY_BLOCK_SIZE;
+}
+
+static u32 get_memory_block_size(void)
+{
+	u32 block_sz;
+
+	block_sz = memory_block_size_bytes();
+
+	/* Validate blk_sz is a power of 2 and not less than section size */
+	if ((block_sz & (block_sz - 1)) || (block_sz < MIN_MEMORY_BLOCK_SIZE))
+		block_sz = MIN_MEMORY_BLOCK_SIZE;
+
+	return block_sz;
+}
+
 /*
  * Initialize the sysfs support for memory devices...
  */
@@ -556,12 +607,16 @@ int __init memory_dev_init(void)
 	unsigned int i;
 	int ret;
 	int err;
+	int block_sz;
 
 	memory_sysdev_class.kset.uevent_ops = &memory_uevent_ops;
 	ret = sysdev_class_register(&memory_sysdev_class);
 	if (ret)
 		goto out;
 
+	block_sz = get_memory_block_size();
+	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
+
 	/*
 	 * Create entries for memory sections that were found
 	 * during boot and have been initialized
@@ -569,8 +624,8 @@ int __init memory_dev_init(void)
 	for (i = 0; i < NR_MEM_SECTIONS; i++) {
 		if (!present_section_nr(i))
 			continue;
-		err = add_memory_block(0, __nr_to_section(i), MEM_ONLINE,
-				       BOOT);
+		err = add_memory_section(0, __nr_to_section(i), MEM_ONLINE,
+					 BOOT);
 		if (!ret)
 			ret = err;
 	}
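
The block arithmetic used by the patch above (base_memory_block_id() together
with sections_per_block) can be illustrated with a small userspace sketch.
The struct and function names here are hypothetical; only the rounding
formula and the start/end relationship are taken from the patch.

```c
#include <assert.h>

/* Inclusive range of section numbers covered by one memory block. */
struct block_range {
	unsigned long start; /* corresponds to start_phys_index */
	unsigned long end;   /* corresponds to end_phys_index */
};

/* Round section_nr down to the first section of its block, as
 * base_memory_block_id() does, and derive the last section from it. */
static struct block_range block_for_section(unsigned long section_nr,
					    unsigned long sections_per_block)
{
	struct block_range r;

	assert(sections_per_block > 0);
	r.start = (section_nr / sections_per_block) * sections_per_block;
	r.end = r.start + sections_per_block - 1;
	return r;
}
```

With four sections per block, section 6 falls into the block covering
sections 4 through 7, so its sysfs directory would be memory4. With one
section per block the range collapses to the section itself, which is the
existing behavior.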


* Re: [PATCH 3/8] v3 Add section count to memory_block
From: Nathan Fontenot @ 2010-07-20 13:26 UTC
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: greg, KAMEZAWA Hiroyuki
In-Reply-To: <4C451DD6.3080005@austin.ibm.com>

Add a section count property to the memory_block struct to track the number
of memory sections that have been added/removed from a memory block.  Updated
to use atomic_dec_and_test().

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 drivers/base/memory.c  |   18 +++++++++++-------
 include/linux/memory.h |    2 ++
 2 files changed, 13 insertions(+), 7 deletions(-)

Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c	2010-07-20 06:38:21.000000000 -0500
+++ linux-2.6/drivers/base/memory.c	2010-07-20 06:43:29.000000000 -0500
@@ -487,6 +487,7 @@ static int add_memory_block(int nid, str
 
 	mem->start_phys_index = __section_nr(section);
 	mem->state = state;
+	atomic_inc(&mem->section_count);
 	mutex_init(&mem->state_mutex);
 	start_pfn = section_nr_to_pfn(mem->start_phys_index);
 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
@@ -516,13 +517,16 @@ int remove_memory_block(unsigned long no
 	struct memory_block *mem;
 
 	mem = find_memory_block(section);
-	unregister_mem_sect_under_nodes(mem);
-	mem_remove_simple_file(mem, phys_index);
-	mem_remove_simple_file(mem, end_phys_index);
-	mem_remove_simple_file(mem, state);
-	mem_remove_simple_file(mem, phys_device);
-	mem_remove_simple_file(mem, removable);
-	unregister_memory(mem, section);
+
+	if (atomic_dec_and_test(&mem->section_count)) {
+		unregister_mem_sect_under_nodes(mem);
+		mem_remove_simple_file(mem, phys_index);
+		mem_remove_simple_file(mem, end_phys_index);
+		mem_remove_simple_file(mem, state);
+		mem_remove_simple_file(mem, phys_device);
+		mem_remove_simple_file(mem, removable);
+		unregister_memory(mem, section);
+	}
 
 	return 0;
 }
Index: linux-2.6/include/linux/memory.h
===================================================================
--- linux-2.6.orig/include/linux/memory.h	2010-07-20 06:35:38.000000000 -0500
+++ linux-2.6/include/linux/memory.h	2010-07-20 06:38:59.000000000 -0500
@@ -19,11 +19,13 @@
 #include <linux/node.h>
 #include <linux/compiler.h>
 #include <linux/mutex.h>
+#include <asm/atomic.h>
 
 struct memory_block {
 	unsigned long start_phys_index;
 	unsigned long end_phys_index;
 	unsigned long state;
+	atomic_t section_count;
 	/*
 	 * This serializes all state change requests.  It isn't
 	 * held during creation because the control files are


* Re: [PATCH 2/8] v3 Add new phys_index properties
From: Nathan Fontenot @ 2010-07-20 13:24 UTC
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: greg, KAMEZAWA Hiroyuki
In-Reply-To: <4C451D92.6020406@austin.ibm.com>

Update the 'phys_index' properties of a memory block to include a
'start_phys_index' which is the same as the current 'phys_index' property.
This also adds an 'end_phys_index' property to indicate the id of the
last section in the memory block.

Patch updated to keep the name of the phys_index property instead of
renaming it to start_phys_index.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 drivers/base/memory.c  |   28 ++++++++++++++++++++--------
 include/linux/memory.h |    3 ++-
 2 files changed, 22 insertions(+), 9 deletions(-)

Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c	2010-07-19 20:42:11.000000000 -0500
+++ linux-2.6/drivers/base/memory.c	2010-07-20 06:38:21.000000000 -0500
@@ -109,12 +109,20 @@ unregister_memory(struct memory_block *m
  * uses.
  */
 
-static ssize_t show_mem_phys_index(struct sys_device *dev,
+static ssize_t show_mem_start_phys_index(struct sys_device *dev,
 			struct sysdev_attribute *attr, char *buf)
 {
 	struct memory_block *mem =
 		container_of(dev, struct memory_block, sysdev);
-	return sprintf(buf, "%08lx\n", mem->phys_index);
+	return sprintf(buf, "%08lx\n", mem->start_phys_index);
+}
+
+static ssize_t show_mem_end_phys_index(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *buf)
+{
+	struct memory_block *mem =
+		container_of(dev, struct memory_block, sysdev);
+	return sprintf(buf, "%08lx\n", mem->end_phys_index);
 }
 
 /*
@@ -128,7 +136,7 @@ static ssize_t show_mem_removable(struct
 	struct memory_block *mem =
 		container_of(dev, struct memory_block, sysdev);
 
-	start_pfn = section_nr_to_pfn(mem->phys_index);
+	start_pfn = section_nr_to_pfn(mem->start_phys_index);
 	ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
 	return sprintf(buf, "%d\n", ret);
 }
@@ -191,7 +199,7 @@ memory_block_action(struct memory_block
 	int ret;
 	int old_state = mem->state;
 
-	psection = mem->phys_index;
+	psection = mem->start_phys_index;
 	first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
 
 	/*
@@ -264,7 +272,7 @@ store_mem_state(struct sys_device *dev,
 	int ret = -EINVAL;
 
 	mem = container_of(dev, struct memory_block, sysdev);
-	phys_section_nr = mem->phys_index;
+	phys_section_nr = mem->start_phys_index;
 
 	if (!present_section_nr(phys_section_nr))
 		goto out;
@@ -296,7 +304,8 @@ static ssize_t show_phys_device(struct s
 	return sprintf(buf, "%d\n", mem->phys_device);
 }
 
-static SYSDEV_ATTR(phys_index, 0444, show_mem_phys_index, NULL);
+static SYSDEV_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
+static SYSDEV_ATTR(end_phys_index, 0444, show_mem_end_phys_index, NULL);
 static SYSDEV_ATTR(state, 0644, show_mem_state, store_mem_state);
 static SYSDEV_ATTR(phys_device, 0444, show_phys_device, NULL);
 static SYSDEV_ATTR(removable, 0444, show_mem_removable, NULL);
@@ -476,16 +485,18 @@ static int add_memory_block(int nid, str
 	if (!mem)
 		return -ENOMEM;
 
-	mem->phys_index = __section_nr(section);
+	mem->start_phys_index = __section_nr(section);
 	mem->state = state;
 	mutex_init(&mem->state_mutex);
-	start_pfn = section_nr_to_pfn(mem->phys_index);
+	start_pfn = section_nr_to_pfn(mem->start_phys_index);
 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
 
 	ret = register_memory(mem, section);
 	if (!ret)
 		ret = mem_create_simple_file(mem, phys_index);
 	if (!ret)
+		ret = mem_create_simple_file(mem, end_phys_index);
+	if (!ret)
 		ret = mem_create_simple_file(mem, state);
 	if (!ret)
 		ret = mem_create_simple_file(mem, phys_device);
@@ -507,6 +518,7 @@ int remove_memory_block(unsigned long no
 	mem = find_memory_block(section);
 	unregister_mem_sect_under_nodes(mem);
 	mem_remove_simple_file(mem, phys_index);
+	mem_remove_simple_file(mem, end_phys_index);
 	mem_remove_simple_file(mem, state);
 	mem_remove_simple_file(mem, phys_device);
 	mem_remove_simple_file(mem, removable);
Index: linux-2.6/include/linux/memory.h
===================================================================
--- linux-2.6.orig/include/linux/memory.h	2010-07-19 20:42:11.000000000 -0500
+++ linux-2.6/include/linux/memory.h	2010-07-20 06:35:38.000000000 -0500
@@ -21,7 +21,8 @@
 #include <linux/mutex.h>
 
 struct memory_block {
-	unsigned long phys_index;
+	unsigned long start_phys_index;
+	unsigned long end_phys_index;
 	unsigned long state;
 	/*
 	 * This serializes all state change requests.  It isn't

^ permalink raw reply

* Re: Badness with the kernel version 2.6.35-rc1-git1 running on P6 box
From: divya @ 2010-07-20  9:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: sachinp, netdev, LKML, linuxppc-dev, Jan-Bernd Themann,
	David Miller
In-Reply-To: <1279274185.2549.14.camel@edumazet-laptop>

On Friday 16 July 2010 03:26 PM, Eric Dumazet wrote:
> Le vendredi 16 juillet 2010 à 14:20 +0530, divya a écrit :
>    
>> Hi ,
>>
>> With the latest kernel version 2.6.35-rc5-git1(2f7989efd4398) running on power(p6) box came across the following
>> call trace
>>
>> Call Trace:
>> [c000000006a0e800] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
>> [c000000006a0e8b0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
>> [c000000006a0ea30] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
>> [c000000006a0ead0] [c00000000015b1a0] .new_slab+0xe0/0x314
>> [c000000006a0eb70] [c00000000015b6fc] .__slab_alloc+0x328/0x644
>> [c000000006a0ec50] [c00000000015cc34] .__kmalloc_node_track_caller+0x114/0x194
>> [c000000006a0ed00] [c000000000599f6c] .__alloc_skb+0x94/0x180
>> [c000000006a0edb0] [c00000000059af5c] .__netdev_alloc_skb+0x3c/0x74
>> [c000000006a0ee30] [c0000000004f9480] .ehea_refill_rq_def+0xf8/0x2d0
>> [c000000006a0ef30] [c0000000004fab8c] .ehea_up+0x5b8/0x69c
>> [c000000006a0f040] [c0000000004facd4] .ehea_open+0x64/0x118
>> [c000000006a0f0e0] [c0000000005a6e9c] .__dev_open+0x100/0x168
>> [c000000006a0f170] [c0000000005a3ac0] .__dev_change_flags+0x10c/0x1ac
>> [c000000006a0f210] [c0000000005a6d44] .dev_change_flags+0x24/0x7c
>> [c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
>> [c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
>> [c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
>> [c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
>> [c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
>> [c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
>> [c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
>> [c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
>> [c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
>> [c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
>> [c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
>> Mem-Info:
>> Node 0 DMA per-cpu:
>> CPU    0: hi:    0, btch:   1 usd:   0
>> CPU    1: hi:    0, btch:   1 usd:   0
>> CPU    2: hi:    0, btch:   1 usd:   0
>> CPU    3: hi:    0, btch:   1 usd:   0
>> active_anon:50 inactive_anon:260 isolated_anon:0
>>    active_file:159 inactive_file:139 isolated_file:0
>>    unevictable:0 dirty:2 writeback:1 unstable:0
>>    free:16 slab_reclaimable:66 slab_unreclaimable:502
>>    mapped:120 shmem:2 pagetables:37 bounce:0
>> Node 0 DMA free:1024kB min:1408kB low:1728kB high:2112kB active_anon:3200kB inactive_anon:16640kB active_file:10176kB inactive_file:8896kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:130944kB mlocked:0kB dirty:128kB writeback:64kB mapped:7680kB shmem:128kB slab_reclaimable:4224kB slab_unreclaimable:32128kB kernel_stack:2528kB pagetables:2368kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>> lowmem_reserve[]: 0 0 0
>> Node 0 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
>> 496 total pagecache pages
>> 178 pages in swap cache
>> Swap cache stats: add 780, delete 602, find 467/551
>> Free swap  = 1027904kB
>> Total swap = 1044160kB
>> 2048 pages RAM
>> 683 pages reserved
>> 582 pages shared
>> 1075 pages non-shared
>> SLUB: Unable to allocate memory on node -1 (gfp=0x20)
>>     cache: kmalloc-16384, object size: 16384, buffer size: 16384, default order: 2, min order: 0
>>     node 0: slabs: 28, objs: 292, free: 0
>> ip: page allocation failure. order:0, mode:0x8020
>> Call Trace:
>> [c000000006a0eb40] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
>> [c000000006a0ebf0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
>> [c000000006a0ed70] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
>> [c000000006a0ee10] [c00000000011fca4] .__get_free_pages+0x18/0x90
>> [c000000006a0ee90] [c0000000004f7058] .ehea_get_stats+0x4c/0x1bc
>> [c000000006a0ef30] [c0000000005a0a04] .dev_get_stats+0x38/0x64
>> [c000000006a0efc0] [c0000000005b456c] .rtnl_fill_ifinfo+0x35c/0x85c
>> [c000000006a0f150] [c0000000005b5920] .rtmsg_ifinfo+0x164/0x204
>> [c000000006a0f210] [c0000000005a6d6c] .dev_change_flags+0x4c/0x7c
>> [c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
>> [c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
>> [c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
>> [c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
>> [c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
>> [c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
>> [c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
>> [c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
>> [c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
>> [c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
>> [c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
>> Mem-Info:
>> Node 0 DMA per-cpu:
>> CPU    0: hi:    0, btch:   1 usd:   0
>> CPU    1: hi:    0, btch:   1 usd:   0
>> CPU    2: hi:    0, btch:   1 usd:   0
>> CPU    3: hi:    0, btch:   1 usd:   0
>>
>> The mainline 2.6.35-rc5 worked fine.
>>      
> Maybe you were lucky with 2.6.35-rc5
>
> Anyway ehea should not use GFP_ATOMIC in its ehea_get_stats() method,
> called in process context, but GFP_KERNEL.
>
> Another patch is needed for ehea_refill_rq_def() as well.
>
>
>
> [PATCH] ehea: ehea_get_stats() should use GFP_KERNEL
>
> ehea_get_stats() is called in process context and should use GFP_KERNEL
> allocation instead of GFP_ATOMIC.
>
> Clearing stats at beginning of ehea_get_stats() is racy in case of
> concurrent stat readers.
>
> get_stats() can also use netdev net_device_stats, instead of a private
> copy.
>
> Reported-by: divya<dipraksh@linux.vnet.ibm.com>
> Signed-off-by: Eric Dumazet<eric.dumazet@gmail.com>
> ---
>   drivers/net/ehea/ehea.h      |    1 -
>   drivers/net/ehea/ehea_main.c |    6 ++----
>   2 files changed, 2 insertions(+), 5 deletions(-)
>    
Hi,

The call trace mentioned above still appears on the upstream kernel and on the linux-next tree.
The patch above has not been merged upstream yet, so I still get call traces for both the
ehea_get_stats() and ehea_refill_rq_def() methods.
On linux-next, however, I get a call trace only for the ehea_refill_rq_def() method.

Thanks
Divya

^ permalink raw reply

* Re: [PPC64/Power7 - 2.6.35-rc5] Bad relocation warnings while Building a CONFIG_RELOCATABLE kernel with CONFIG_ISERIES enabled
From: Alexander Graf @ 2010-07-20  7:37 UTC (permalink / raw)
  To: Milton Miller
  Cc: KVM list, linuxppc-dev, kvm-ppc, linux-kernel List, Subrata Modak
In-Reply-To: <reloc-2010-07-19-3@mdm.bga.com>


On 20.07.2010, at 09:27, Milton Miller wrote:

> On Mon, 19 Jul 2010 about 14:00:56 +0200, Alexander Graf wrote:
>> Milton Miller wrote:
>>> I wrote:
>>>
>>> Oh yea, and for book-3s, the code copies from 0x100 to __end_interrupts
>>> in arch/powerpc/kernel/exceptions-64s.h down to the real 0, but the rest
>>> of the kernel is at some disjointed address.  The interrupt will go to
>>> the copy at the real zero.  Any references to code outside that region
>>> must be done via a full indirect branch (not a relative one), similar to
>>> the secondary startup (via following the function pointer in a descriptor
>>> set in very low memory), or syscall entry and exception vectors via paca.
>>>
>>
>> That would still break on normal PPC boxes, as any address accessed in
>> real mode has to be inside the RMA. And the #include for
>> kvm/book3s_rmhandlers.S happens after __end_interrupts. So I'd end up
>> with code that gets executed outside of the RMA after a relocation, right?
>>
>> Alex
>>
>
> Whether it's outside of the RMA or not, DO_KVM is creating a branch outside
> of code copied to lowmem.
>
> This is BROKEN.
>
> We have a hard limit that we can't extend _end_interrupts past 0x7000, and
> a soft limit that we can't exceed 0x6000.  If there is space, we could
> move the real mode handler extensions inside end_interrupts in
> exceptions-64s.S, and store the full address in a .quad so it gets
> relocated properly.  Don't subtract the start, we have designed the kernel
> to run with start at a VA that can be used as an EA in real mode.

Moving everything to exceptions-64s.S sounds like the best thing to do.
All the code in real mode really is there so it stays inside the RMA. I
don't think we can guarantee that for any code that is not copied, right?

> Otherwise we need to mark KVM_BOOK3S_64 depends on (!RELOCATABLE ||
> BROKEN) for 2.6.35 until we get fixes.

Well - it's only broken when really getting relocated. But I agree, the
current state doesn't cope with Linux's relocation logic.

> I took a read through the book3s code as of 2.6.34.   A few things I noticed:
>
> (1) The code is using slb large to control the segment size.   It should
> be using SLB B field (or just implement 256M segments only).

I'm not sure I understand this part? We only use 256MB segments for now.

> (2) It appears that the mtspr and mfspr code is using the same storage for
> bats 4-7 as 0-3 ... I would have expected a 4 + a few places.

Yes, that one is fixed in more recent versions already.

> (3) It's not clear to me that you clear RI when transitioning to the guest
> but it's obviously required because you place state in srr0 & srr1.

Uh - do I have to clear RI? I'm not prepared to take an interrupt
anyways and RI is just a soft flag for Linux's handlers, right?

> (4) I don't understand why __kvmppc_vcpu_run turns on interrupts so that
> __kvmppc_vcpu_entry can turn them back off.   Something to do with
> irq trace annotations?

__kvmppc_vcpu_run turns on soft interrupts while __kvmppc_vcpu_entry
turns them off in MSR. This is so that when enabling interrupts again on
guest exit, we have the soft enable bit set.


Alex

^ permalink raw reply

* Re: [PATCH] math-emu: correct test for downshifting fraction in _FP_FROM_INT()
From: Martin Schwidefsky @ 2010-07-20  7:34 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: linux-s390, linux-sh, linux-kernel, linuxppc-dev, linux-alpha,
	sparclinux
In-Reply-To: <19524.52658.716540.932975@pilspetsen.it.uu.se>

On Tue, 20 Jul 2010 00:12:02 +0200
Mikael Pettersson <mikpe@it.uu.se> wrote:

> Unfortunately it seems difficult to write a generic module
> which uses math-emu:
> - <math-emu/soft-fp.h> includes <asm/sfp-machine.h>,
>   but only a handful of archs have it
> - <asm/sfp-machine.h> isn't always self-contained and may depend
>   on various $arch-specific declarations being present
> 
> The given test module works on sparc64 and ppc64, where it uses
> the kernel's sfp-machine.h, and on x86 where it uses a stub
> sfp-machine.h supplied by itself.  I tried to cross-compile it
> for alpha, but that failed due to its sfp-machine.h not being
> self-contained.  I didn't try sh or s390.

It would be a challenge to try this on s390. The math emulation code is
only used for really old 31 bit machines. Starting with the G5 the fpu
can do IEEE754, I would say the math emulation code is irrelevant for
s390 by now.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply

* Re: [PPC64/Power7 - 2.6.35-rc5] Bad relocation warnings while Building a CONFIG_RELOCATABLE kernel with CONFIG_ISERIES enabled
From: Milton Miller @ 2010-07-20  7:27 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Subrata Modak, linuxppc-dev, linux-kernel
In-Reply-To: <4C443E78.6020902@suse.de>

On Mon, 19 Jul 2010 about 14:00:56 +0200, Alexander Graf wrote:
>Milton Miller wrote:
>> I wrote:
>>   
>>> On Mon Jul 19 2010 at about 03:36:51 EST, Alexander Graf wrote:
>>>     
>>>> On 19.07.2010, at 03:11, Benjamin Herrenschmidt wrote:
>>>>
>>>>       
>>>>> On Thu, 2010-07-15 at 17:05 +0530, Subrata Modak wrote:
>>>>>         
>>>>>> commit e62cee42e66dcca83aae02748535f62e0f564a0c solved the problem for
>>>>>> 2.6.34-rc6. However some other bad relocation warnings generated against
>>>>>> 2.6.35-rc5 on Power7/ppc64 below:
>>>>>>
>>>>>> MODPOST 2004 modules
>>>>>> WARNING: 2 bad relocations
>>>>>> c000000000008590 R_PPC64_ADDR32 .text+0x4000000000008460
>>>>>> c000000000008594 R_PPC64_ADDR32 .text+0x4000000000008598
>>>>>>           
>>>>> I think this is KVM + CONFIG_RELOCATABLE. Caused by:
>>>>>
>>>>> .global kvmppc_trampoline_lowmem
>>>>> kvmppc_trampoline_lowmem:
>>>>> .long kvmppc_handler_lowmem_trampoline - CONFIG_KERNEL_START
>>>>>
>>>>> .global kvmppc_trampoline_enter
>>>>> kvmppc_trampoline_enter:
>>>>> .long kvmppc_handler_trampoline_enter - CONFIG_KERNEL_START
>>>>>
>>>>> Alex, can you turn these into 64-bit on ppc64 so the relocator
>>>>> can grok them ?
>>>>>         
>>>> If I turn them into 64-bit, will the values be > RMA? In that case
>>>> things would break anyways. How does relocation work on PPC? Are the
>>>> first few megs copied over to low memory? Would I have to mask anything
>>>> in the above code to make sure I use the real values? 
>>>>
>>>> Alex
>>>>
>>>>       
>>> You can still do the subtraction, but you have to allocate 64 bits for
>>> storage.  Relocatable ppc64 kernels work by adjusting PPC64_RELOC_RELATIVE
>>> entries during early boot (reloc in reloc_64.S called from head_64.S).
>>>
>>> The code purposely only supports 64 bit relative addressing.
>>>     
>>
>> Oh yea, and for book-3s, the code copies from 0x100 to __end_interrupts
>> in arch/powerpc/kernel/exceptions-64s.h down to the real 0, but the rest
>> of the kernel is at some disjointed address.  The interrupt will go to
>> the copy at the real zero.  Any references to code outside that region
>> must be done via a full indirect branch (not a relative one), similar to
>> the secondary startup (via following the function pointer in a descriptor
>> set in very low memory), or syscall entry and exception vectors via paca.
>>   
>
>That would still break on normal PPC boxes, as any address accessed in
>real mode has to be inside the RMA. And the #include for
>kvm/book3s_rmhandlers.S happens after __end_interrupts. So I'd end up
>with code that gets executed outside of the RMA after a relocation, right?
>
>Alex
>

Whether it's outside of the RMA or not, DO_KVM is creating a branch outside
of code copied to lowmem.

This is BROKEN.

We have a hard limit that we can't extend _end_interrupts past 0x7000, and
a soft limit that we can't exceed 0x6000.  If there is space, we could
move the real mode handler extensions inside end_interrupts in
exceptions-64s.S, and store the full address in a .quad so it gets
relocated properly.  Don't subtract the start, we have designed the kernel
to run with start at a VA that can be used as an EA in real mode.
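
To make the arithmetic behind the two modpost warnings concrete, here is a
small userspace sketch. It assumes that CONFIG_KERNEL_START and the start of
.text are both 0xc000000000000000 (an assumption for illustration, not stated
in this thread), and all helper names are made up. Under that reading, the
printed addends are just (symbol - start) computed modulo 2^64, a .long can
only keep the low 32 bits of that, and a full 64-bit entry is something the
early-boot relocator can adjust by the load offset.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed link address of the kernel and of .text; illustrative only. */
#define ASSUMED_KERNEL_START 0xc000000000000000ULL

/*
 * Addend for "sym - CONFIG_KERNEL_START" expressed relative to .text.
 * The second subtraction wraps modulo 2^64, which is where the leading
 * 0x4000... in the warning would come from.
 */
static uint64_t addend_relative_to_text(uint64_t sym)
{
	uint64_t offset_in_text;

	assert(sym >= ASSUMED_KERNEL_START);
	offset_in_text = sym - ASSUMED_KERNEL_START;

	return offset_in_text - ASSUMED_KERNEL_START;
}

/* What a 32-bit .long can hold: the link-time offset, truncated. */
static uint32_t long_entry(uint64_t sym)
{
	return (uint32_t)(sym - ASSUMED_KERNEL_START);
}

/*
 * Simplified model of a 64-bit relative entry: the stored link-time
 * address is adjusted by the load offset during early boot.
 */
static uint64_t quad_entry_after_reloc(uint64_t sym, uint64_t load_offset)
{
	return sym + load_offset;
}
```

The .long value is only meaningful while the kernel runs where it was linked;
the .quad form carries enough bits for the adjustment to be applied.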

Otherwise we need to mark KVM_BOOK3S_64 depends on (!RELOCATABLE ||
BROKEN) for 2.6.35 until we get fixes.

I took a read through the book3s code as of 2.6.34.   A few things I noticed:

(1) The code is using slb large to control the segment size.   It should
be using SLB B field (or just implement 256M segments only).

(2) It appears that the mtspr and mfspr code is using the same storage for
bats 4-7 as 0-3 ... I would have expected a 4 + a few places.

(3) It's not clear to me that you clear RI when transitioning to the guest
but it's obviously required because you place state in srr0 & srr1.

(4) I don't understand why __kvmppc_vcpu_run turns on interrupts so that
__kvmppc_vcpu_entry can turn them back off.   Something to do with
irq trace annotations?

milton

^ permalink raw reply

* Re: [PATCH 6/8] v3 Update the node sysfs code
From: KAMEZAWA Hiroyuki @ 2010-07-20  7:17 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, greg, linux-kernel, linuxppc-dev
In-Reply-To: <4C451EAF.1080505@austin.ibm.com>

On Mon, 19 Jul 2010 22:57:35 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Update the node sysfs code to be aware of the new capability for a memory
> block to contain multiple memory sections.  This requires an additional
> parameter to unregister_mem_sect_under_nodes so that we know which memory
> section of the memory block to unregister.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

^ permalink raw reply

* Re: [PATCH 5/8] v3 Update the find_memory_block declaration
From: KAMEZAWA Hiroyuki @ 2010-07-20  7:16 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, greg, linux-kernel, linuxppc-dev
In-Reply-To: <4C451E60.8080702@austin.ibm.com>

On Mon, 19 Jul 2010 22:56:16 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Update the find_memory_block declaration to take a struct mem_section *
> so that it matches the definition.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

^ permalink raw reply

* Re: [PATCH 4/8] v3 Allow memory_block to span multiple memory sections
From: KAMEZAWA Hiroyuki @ 2010-07-20  7:15 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, greg, linux-kernel, linuxppc-dev
In-Reply-To: <4C451E1C.8070907@austin.ibm.com>

On Mon, 19 Jul 2010 22:55:08 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Update the memory sysfs code that each sysfs memory directory is now
> considered a memory block that can contain multiple memory sections per
> memory block.  The default size of each memory block is SECTION_SIZE_BITS
> to maintain the current behavior of having a single memory section per
> memory block (i.e. one sysfs directory per memory section).
> 
> For architectures that want to have memory blocks span multiple
> memory sections they need only define their own memory_block_size_bytes()
> routine.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
> ---
>  drivers/base/memory.c |  141 ++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 98 insertions(+), 43 deletions(-)
> 
> Index: linux-2.6/drivers/base/memory.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/memory.c	2010-07-19 20:44:01.000000000 -0500
> +++ linux-2.6/drivers/base/memory.c	2010-07-19 21:12:22.000000000 -0500
> @@ -28,6 +28,14 @@
>  #include <asm/uaccess.h>
>  
>  #define MEMORY_CLASS_NAME	"memory"
> +#define MIN_MEMORY_BLOCK_SIZE	(1 << SECTION_SIZE_BITS)
> +
> +static int sections_per_block;
> +
> +static inline int base_memory_block_id(int section_nr)
> +{
> +	return (section_nr / sections_per_block) * sections_per_block;
> +}
>  
>  static struct sysdev_class memory_sysdev_class = {
>  	.name = MEMORY_CLASS_NAME,
> @@ -82,22 +90,21 @@ EXPORT_SYMBOL(unregister_memory_isolate_
>   * register_memory - Setup a sysfs device for a memory block
>   */
>  static
> -int register_memory(struct memory_block *memory, struct mem_section *section)
> +int register_memory(struct memory_block *memory)
>  {
>  	int error;
>  
>  	memory->sysdev.cls = &memory_sysdev_class;
> -	memory->sysdev.id = __section_nr(section);
> +	memory->sysdev.id = memory->start_phys_index;

Can memory->start_phys_index overflow here?
It is an unsigned long, but sysdev.id is 32-bit.
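
To get a feel for when this could matter: the section number is the physical
address shifted right by SECTION_SIZE_BITS, and the block id is that number
rounded down to a multiple of sections_per_block, as base_memory_block_id()
does above. The sketch below is a userspace model only; the helper names and
the section size of 24 bits used in the note after it are illustrative
assumptions, not values taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Section number containing a physical address. */
static uint64_t section_nr_of(uint64_t phys_addr, unsigned int section_size_bits)
{
	assert(section_size_bits < 64);
	return phys_addr >> section_size_bits;
}

/* Same rounding as base_memory_block_id() in the quoted patch. */
static uint64_t base_block_id(uint64_t section_nr, uint64_t sections_per_block)
{
	assert(sections_per_block > 0);
	return (section_nr / sections_per_block) * sections_per_block;
}

/* Nonzero if the id survives being stored in a 32-bit field unchanged. */
static int id_fits_u32(uint64_t id)
{
	return id == (uint64_t)(uint32_t)id;
}
```

Assuming 24-bit sections, the id would only stop fitting once physical
addresses reach 2^56, so under that assumption the truncation is a
theoretical concern rather than an immediate one.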


Thanks,
-Kame

^ permalink raw reply

* Re: [PATCH 3/8] v3 Add section count to memory_block
From: KAMEZAWA Hiroyuki @ 2010-07-20  7:01 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, greg, linux-kernel, linuxppc-dev
In-Reply-To: <4C451DD6.3080005@austin.ibm.com>

On Mon, 19 Jul 2010 22:53:58 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Add a section count property to the memory_block struct to track the number
> of memory sections that have been added/removed from a memory block.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
> ---
>  drivers/base/memory.c  |   19 ++++++++++++-------
>  include/linux/memory.h |    2 ++
>  2 files changed, 14 insertions(+), 7 deletions(-)
> 
> Index: linux-2.6/drivers/base/memory.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/memory.c	2010-07-19 20:43:49.000000000 -0500
> +++ linux-2.6/drivers/base/memory.c	2010-07-19 20:44:01.000000000 -0500
> @@ -487,6 +487,7 @@ static int add_memory_block(int nid, str
>  
>  	mem->start_phys_index = __section_nr(section);
>  	mem->state = state;
> +	atomic_inc(&mem->section_count);
>  	mutex_init(&mem->state_mutex);
>  	start_pfn = section_nr_to_pfn(mem->start_phys_index);
>  	mem->phys_device = arch_get_memory_phys_device(start_pfn);
> @@ -516,13 +517,17 @@ int remove_memory_block(unsigned long no
>  	struct memory_block *mem;
>  
>  	mem = find_memory_block(section);
> -	unregister_mem_sect_under_nodes(mem);
> -	mem_remove_simple_file(mem, start_phys_index);
> -	mem_remove_simple_file(mem, end_phys_index);
> -	mem_remove_simple_file(mem, state);
> -	mem_remove_simple_file(mem, phys_device);
> -	mem_remove_simple_file(mem, removable);
> -	unregister_memory(mem, section);
> +	atomic_dec(&mem->section_count);
> +
> +	if (atomic_read(&mem->section_count) == 0) {

We usually use atomic_dec_and_test() for this.

Otherwise, I don't see any problems in the other parts. Please fix this nitpick.
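
The reason to prefer a single dec-and-test is that a separate decrement
followed by a read leaves a window: two removers can both decrement first,
then both read zero, and both run the teardown. A userspace sketch with C11
atomics, as an analogy only: put_section() is a hypothetical stand-in, and
atomic_fetch_sub() here plays the role that atomic_dec_and_test() plays in
the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Drop one reference. Returns true for exactly one caller: the one whose
 * decrement took the count from 1 to 0. The decrement and the test are a
 * single atomic operation, so two concurrent callers cannot both see zero.
 */
static bool put_section(atomic_int *section_count)
{
	int old = atomic_fetch_sub(section_count, 1);

	assert(old > 0);	/* dropping a reference that was never taken */
	return old == 1;
}
```

Only the caller that gets true back would go on to remove the sysfs files
and unregister the block.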

Regards,
-Kame

^ permalink raw reply

* Re: [PATCH 2/8] v3 Add new phys_index properties
From: KAMEZAWA Hiroyuki @ 2010-07-20  6:57 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, greg, linux-kernel, linuxppc-dev
In-Reply-To: <4C451D92.6020406@austin.ibm.com>

On Mon, 19 Jul 2010 22:52:50 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Update the 'phys_index' properties of a memory block to include a
> 'start_phys_index' which is the same as the current 'phys_index' property.
> This also adds an 'end_phys_index' property to indicate the id of the
> last section in the memory block.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

No, please keep "phys_index" as it is; please don't rename it.
IMHO, just adding end_phys_index is better.
Please avoid interface changes as far as possible.

Is there a problem if phys_index means start_phys_index?

Thanks,
-Kame

^ permalink raw reply

* Re: [PATCH 1/8] v3 Move the find_memory_block() routine up
From: KAMEZAWA Hiroyuki @ 2010-07-20  6:55 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linux-mm, greg, linux-kernel, linuxppc-dev
In-Reply-To: <4C451D4E.8040600@austin.ibm.com>

On Mon, 19 Jul 2010 22:51:42 -0500
Nathan Fontenot <nfont@austin.ibm.com> wrote:

> Move the find_memory_block() routine up to avoid needing a forward
> declaration in subsequent patches.
> 
> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
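
One detail in the routine being moved: the name buffer is sized as
sizeof(MEMORY_CLASS_NAME) + 9 + 1, which is 17 bytes, enough for "memory"
plus ten digits and the terminator, so any non-negative int fits. sprintf()
relies on that arithmetic staying right; a bounded variant makes the limit
explicit. The sketch below is userspace C for illustration only, and
build_block_name() is a hypothetical helper, not something this patch adds.

```c
#include <assert.h>
#include <stdio.h>

#define MEMORY_CLASS_NAME "memory"

/*
 * Format "<class name><section number>" into buf.
 * Returns the name length, or -1 if it did not fit.
 */
static int build_block_name(char *buf, size_t len, int section_nr)
{
	int n;

	assert(buf != NULL && len > 0);

	n = snprintf(buf, len, "%s%d", MEMORY_CLASS_NAME, section_nr);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* encoding error or truncated output */
	return n;
}
```

A caller would pass the same 17-byte buffer as in the patch and treat -1 as
"no such block" instead of looking up a truncated name.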

> ---
>  drivers/base/memory.c |   62 +++++++++++++++++++++++++-------------------------
>  1 file changed, 31 insertions(+), 31 deletions(-)
> 
> Index: linux-2.6/drivers/base/memory.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/memory.c	2010-07-16 12:41:30.000000000 -0500
> +++ linux-2.6/drivers/base/memory.c	2010-07-19 20:42:11.000000000 -0500
> @@ -435,6 +435,37 @@ int __weak arch_get_memory_phys_device(u
>  	return 0;
>  }
>  
> +/*
> + * For now, we have a linear search to go find the appropriate
> + * memory_block corresponding to a particular phys_index. If
> + * this gets to be a real problem, we can always use a radix
> + * tree or something here.
> + *
> + * This could be made generic for all sysdev classes.
> + */
> +struct memory_block *find_memory_block(struct mem_section *section)
> +{
> +	struct kobject *kobj;
> +	struct sys_device *sysdev;
> +	struct memory_block *mem;
> +	char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
> +
> +	/*
> +	 * This only works because we know that section == sysdev->id
> +	 * slightly redundant with sysdev_register()
> +	 */
> +	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
> +
> +	kobj = kset_find_obj(&memory_sysdev_class.kset, name);
> +	if (!kobj)
> +		return NULL;
> +
> +	sysdev = container_of(kobj, struct sys_device, kobj);
> +	mem = container_of(sysdev, struct memory_block, sysdev);
> +
> +	return mem;
> +}
> +
>  static int add_memory_block(int nid, struct mem_section *section,
>  			unsigned long state, enum mem_add_context context)
>  {
> @@ -468,37 +499,6 @@ static int add_memory_block(int nid, str
>  	return ret;
>  }
>  
> -/*
> - * For now, we have a linear search to go find the appropriate
> - * memory_block corresponding to a particular phys_index. If
> - * this gets to be a real problem, we can always use a radix
> - * tree or something here.
> - *
> - * This could be made generic for all sysdev classes.
> - */
> -struct memory_block *find_memory_block(struct mem_section *section)
> -{
> -	struct kobject *kobj;
> -	struct sys_device *sysdev;
> -	struct memory_block *mem;
> -	char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
> -
> -	/*
> -	 * This only works because we know that section == sysdev->id
> -	 * slightly redundant with sysdev_register()
> -	 */
> -	sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
> -
> -	kobj = kset_find_obj(&memory_sysdev_class.kset, name);
> -	if (!kobj)
> -		return NULL;
> -
> -	sysdev = container_of(kobj, struct sys_device, kobj);
> -	mem = container_of(sysdev, struct memory_block, sysdev);
> -
> -	return mem;
> -}
> -
>  int remove_memory_block(unsigned long node_id, struct mem_section *section,
>  		int phys_device)
>  {

^ permalink raw reply

* Re: Badness in xics_ipi_dispatch
From: Darren Hart @ 2010-07-20  5:35 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1271814234.13262.36.camel@concordia>

Michael Ellerman <michael <at> ellerman.id.au> writes:

> 
> On Tue, 2010-04-20 at 17:17 -0500, Brian King wrote:
> > In stress testing enabling and disabling of SMT, we are regularly
> > seeing the badness warning below. Looking through the cpu offline
> > path, this is what I see:
> > 
> > 1. stop_cpu: IRQ's get disabled
> > 2. pseries_cpu_disable: set cpu offline (no barriers after this)
> > 3. xics_migrate_irqs_away: Remove ourselves from the GIQ, but still allow
> >     IPIs
> > 4. stop_cpu: IRQ's get enabled again (local_irq_enable)
> > 
> > It looks to me like there is plenty of opportunity between 1 and 2 for
> > an IPI to get queued, resulting in the badness below. Is there something
> > in xics_migrate_irqs_away that should clear any pending IPIs?

Is that not what this does?

	/* Reject any interrupt that was queued to us... */
	xics_set_cpu_priority(0);

	/* Remove ourselves from the global interrupt queue */
	xics_set_cpu_giq(default_distrib_server, 0);

I thought the above would clear any pending (queued) interrupts and disable
additional interrupts from coming in. Of course the next line allows IPIs again:

	/* Allow IPIs again... */
	xics_set_cpu_priority(DEFAULT_PRIORITY);

Which I confess I really don't get...


> > If there
> > is, maybe the solution is as simple as adding a barrier after marking
> > the cpu offline. Or is the warning bogus and we should just remove it?
> 
> It looks like xics_migrate_irqs_away() doesn't do anything about IPIs,
> at least the comment says "Allow IPIs again". So I don't see what's to
> stop you just taking another IPI after you reenable interrupts in
> stop_cpu(). Maybe xics_ipi_dispatch() should just return if the cpu is
> offline?

We're seeing something possibly related in real-time. Notice how the decrementer 
handler interrupts stop_cpu(). Is the decrementer interrupt delivered as an IPI?

cpu 0x3: Vector: 700 (Program Check) at [c000000084d02d90]
    pc: c000000000068af4: .__might_sleep+0x11c/0x148
    lr: c000000000068af0: .__might_sleep+0x118/0x148
    sp: c000000084d03010
   msr: 8000000000021032
  current = 0xc000000086658240
  paca    = 0xc000000000bb8a80
    pid   = 4045, comm = kstop/3
kernel BUG at kernel/sched.c:10168!
enter ? for help
[c000000084d030b0] c0000000006a2798 .rt_spin_lock+0x4c/0x9c
[c000000084d03140] c0000000000e3c98 .cpuset_cpus_allowed_locked+0x38/0x74
[c000000084d031e0] c000000000070be0 .select_fallback_rq+0x10c/0x1a4
[c000000084d032a0] c00000000007cda8 .try_to_wake_up+0x1b0/0x540
[c000000084d03370] c00000000007d2e8 .wake_up_process+0x34/0x48
[c000000084d03400] c00000000008c5f8 .wakeup_softirqd+0x78/0x9c
[c000000084d03490] c00000000008c8e4 .raise_softirq+0x6c/0xa4
[c000000084d03520] c000000000099c18 .run_local_timers+0x2c/0x4c
[c000000084d035a0] c000000000099c90 .update_process_times+0x58/0x9c
[c000000084d03640] c0000000000c2e70 .tick_sched_timer+0xd0/0x120
[c000000084d036f0] c0000000000b4bec .__run_hrtimer+0x1a0/0x29c
[c000000084d037a0] c0000000000b558c .hrtimer_interrupt+0x21c/0x394
[c000000084d038d0] c0000000000307d8 .timer_interrupt+0x1dc/0x2e4
[c000000084d03970] c000000000003700 decrementer_common+0x100/0x180
--- Exception: 901 (Decrementer) at c00000000000d144 .raw_local_irq_restore+0x48/0x54
[link register   ] c0000000000e57ec .stop_cpu+0x1c0/0x1ec
[c000000084d03c60] c00000000104a4f0 (unreliable)
[c000000084d03ca0] c0000000000e5780 .stop_cpu+0x154/0x1ec
[c000000084d03d40] c0000000000a8b84 .worker_thread+0x25c/0x338
[c000000084d03e60] c0000000000af8c8 .kthread+0xb8/0xc4
[c000000084d03f90] c000000000034408 .kernel_thread+0x54/0x70

Thanks,

Darren Hart


* [PATCH 8/8] v3 Update memory-hotplug documentation
From: Nathan Fontenot @ 2010-07-20  3:59 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: greg, KAMEZAWA Hiroyuki
In-Reply-To: <4C451BF5.50304@austin.ibm.com>


Update the memory hotplug documentation to describe the new behavior of
memory blocks as exposed in sysfs.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 Documentation/memory-hotplug.txt |   40 +++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

Index: linux-2.6/Documentation/memory-hotplug.txt
===================================================================
--- linux-2.6.orig/Documentation/memory-hotplug.txt	2010-06-23 15:06:53.000000000 -0500
+++ linux-2.6/Documentation/memory-hotplug.txt	2010-07-19 21:00:11.000000000 -0500
@@ -126,36 +126,44 @@ config options.
 --------------------------------
 4 sysfs files for memory hotplug
 --------------------------------
-All sections have their device information under /sys/devices/system/memory as
+All sections have their device information in sysfs.  Each section is part of
+a memory block under /sys/devices/system/memory as
 
 /sys/devices/system/memory/memoryXXX
-(XXX is section id.)
+(XXX is the section id.)
 
-Now, XXX is defined as start_address_of_section / section_size.
+Now, XXX is defined as (start_address_of_section / section_size) of the first
+section contained in the memory block.
 
 For example, assume 1GiB section size. A device for a memory starting at
 0x100000000 is /sys/device/system/memory/memory4
 (0x100000000 / 1Gib = 4)
 This device covers address range [0x100000000 ... 0x140000000)
 
-Under each section, you can see 4 files.
+Under each section, you can see 5 files.
 
-/sys/devices/system/memory/memoryXXX/phys_index
+/sys/devices/system/memory/memoryXXX/start_phys_index
+/sys/devices/system/memory/memoryXXX/end_phys_index
 /sys/devices/system/memory/memoryXXX/phys_device
 /sys/devices/system/memory/memoryXXX/state
 /sys/devices/system/memory/memoryXXX/removable
 
-'phys_index' : read-only and contains section id, same as XXX.
-'state'      : read-write
-               at read:  contains online/offline state of memory.
-               at write: user can specify "online", "offline" command
-'phys_device': read-only: designed to show the name of physical memory device.
-               This is not well implemented now.
-'removable'  : read-only: contains an integer value indicating
-               whether the memory section is removable or not
-               removable.  A value of 1 indicates that the memory
-               section is removable and a value of 0 indicates that
-               it is not removable.
+'start_phys_index' : read-only and contains section id of the first section
+		     in the memory block, same as XXX.
+'end_phys_index'   : read-only and contains section id of the last section
+		     in the memory block.
+'state'            : read-write
+                     at read:  contains online/offline state of memory.
+                     at write: user can specify "online", "offline" command
+                     which will be performed on all sections in the block.
+'phys_device'      : read-only: designed to show the name of physical memory
+                     device.  This is not well implemented now.
+'removable'        : read-only: contains an integer value indicating
+                     whether the memory block is removable or not
+                     removable.  A value of 1 indicates that the memory
+                     block is removable and a value of 0 indicates that
+                     it is not removable. A memory block is removable only if
+                     every section in the block is removable.
 
 NOTE:
   These directories/files appear after physical memory hotplug phase.


* [PATCH 7/8] v3 Define memory_block_size_bytes() for ppc/pseries
From: Nathan Fontenot @ 2010-07-20  3:59 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: greg, KAMEZAWA Hiroyuki
In-Reply-To: <4C451BF5.50304@austin.ibm.com>

Define a version of memory_block_size_bytes() for powerpc/pseries such that
a memory block spans an entire lmb.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   66 +++++++++++++++++++-----
 1 file changed, 53 insertions(+), 13 deletions(-)

Index: linux-2.6/arch/powerpc/platforms/pseries/hotplug-memory.c
===================================================================
--- linux-2.6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c	2010-07-19 21:10:24.000000000 -0500
+++ linux-2.6/arch/powerpc/platforms/pseries/hotplug-memory.c	2010-07-19 21:13:32.000000000 -0500
@@ -17,6 +17,54 @@
 #include <asm/pSeries_reconfig.h>
 #include <asm/sparsemem.h>
 
+static u32 get_memblock_size(void)
+{
+	struct device_node *np;
+	unsigned int memblock_size = 0;
+
+	np = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+	if (np) {
+		const unsigned int *size;
+
+		size = of_get_property(np, "ibm,lmb-size", NULL);
+		memblock_size = size ? *size : 0;
+
+		of_node_put(np);
+	} else {
+		unsigned int memzero_size = 0;
+		const unsigned int *regs;
+
+		np = of_find_node_by_path("/memory@0");
+		if (np) {
+			regs = of_get_property(np, "reg", NULL);
+			memzero_size = regs ? regs[3] : 0;
+			of_node_put(np);
+		}
+
+		if (memzero_size) {
+			/* We now know the size of memory@0, use this to find
+			 * the first memoryblock and get its size.
+			 */
+			char buf[64];
+
+			sprintf(buf, "/memory@%x", memzero_size);
+			np = of_find_node_by_path(buf);
+			if (np) {
+				regs = of_get_property(np, "reg", NULL);
+				memblock_size = regs ? regs[3] : 0;
+				of_node_put(np);
+			}
+		}
+	}
+
+	return memblock_size;
+}
+
+u32 memory_block_size_bytes(void)
+{
+	return get_memblock_size();
+}
+
 static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 {
 	unsigned long start, start_pfn;
@@ -127,30 +175,22 @@ static int pseries_add_memory(struct dev
 
 static int pseries_drconf_memory(unsigned long *base, unsigned int action)
 {
-	struct device_node *np;
-	const unsigned long *memblock_size;
+	unsigned long memblock_size;
 	int rc;
 
-	np = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
-	if (!np)
+	memblock_size = get_memblock_size();
+	if (!memblock_size)
 		return -EINVAL;
 
-	memblock_size = of_get_property(np, "ibm,memblock-size", NULL);
-	if (!memblock_size) {
-		of_node_put(np);
-		return -EINVAL;
-	}
-
 	if (action == PSERIES_DRCONF_MEM_ADD) {
-		rc = memblock_add(*base, *memblock_size);
+		rc = memblock_add(*base, memblock_size);
 		rc = (rc < 0) ? -EINVAL : 0;
 	} else if (action == PSERIES_DRCONF_MEM_REMOVE) {
-		rc = pseries_remove_memblock(*base, *memblock_size);
+		rc = pseries_remove_memblock(*base, memblock_size);
 	} else {
 		rc = -EINVAL;
 	}
 
-	of_node_put(np);
 	return rc;
 }
 


* [PATCH 6/8] v3 Update the node sysfs code
From: Nathan Fontenot @ 2010-07-20  3:57 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: greg, KAMEZAWA Hiroyuki
In-Reply-To: <4C451BF5.50304@austin.ibm.com>

Update the node sysfs code to be aware of the new capability for a memory
block to contain multiple memory sections.  This requires an additional
parameter to unregister_mem_sect_under_nodes so that we know which memory
section of the memory block to unregister.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 drivers/base/memory.c |    2 +-
 drivers/base/node.c   |   12 ++++++++----
 include/linux/node.h  |    3 ++-
 3 files changed, 11 insertions(+), 6 deletions(-)

Index: linux-2.6/drivers/base/node.c
===================================================================
--- linux-2.6.orig/drivers/base/node.c	2010-07-19 21:10:25.000000000 -0500
+++ linux-2.6/drivers/base/node.c	2010-07-19 21:13:11.000000000 -0500
@@ -346,8 +346,10 @@ int register_mem_sect_under_node(struct
 		return -EFAULT;
 	if (!node_online(nid))
 		return 0;
-	sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
-	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+
+	sect_start_pfn = section_nr_to_pfn(mem_blk->start_phys_index);
+	sect_end_pfn = section_nr_to_pfn(mem_blk->end_phys_index);
+	sect_end_pfn += PAGES_PER_SECTION - 1;
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
 		int page_nid;
 
@@ -371,7 +373,8 @@ int register_mem_sect_under_node(struct
 }
 
 /* unregister memory section under all nodes that it spans */
-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk)
+int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
+				    unsigned long phys_index)
 {
 	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
@@ -383,7 +386,8 @@ int unregister_mem_sect_under_nodes(stru
 	if (!unlinked_nodes)
 		return -ENOMEM;
 	nodes_clear(*unlinked_nodes);
-	sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
+
+	sect_start_pfn = section_nr_to_pfn(phys_index);
 	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
 		int nid;
Index: linux-2.6/drivers/base/memory.c
===================================================================
--- linux-2.6.orig/drivers/base/memory.c	2010-07-19 21:12:22.000000000 -0500
+++ linux-2.6/drivers/base/memory.c	2010-07-19 21:13:11.000000000 -0500
@@ -550,9 +550,9 @@ int remove_memory_block(unsigned long no
 
 	mem = find_memory_block(section);
 	atomic_dec(&mem->section_count);
+	unregister_mem_sect_under_nodes(mem, __section_nr(section));
 
 	if (atomic_read(&mem->section_count) == 0) {
-		unregister_mem_sect_under_nodes(mem);
 		mem_remove_simple_file(mem, start_phys_index);
 		mem_remove_simple_file(mem, end_phys_index);
 		mem_remove_simple_file(mem, state);
Index: linux-2.6/include/linux/node.h
===================================================================
--- linux-2.6.orig/include/linux/node.h	2010-07-19 21:10:25.000000000 -0500
+++ linux-2.6/include/linux/node.h	2010-07-19 21:13:11.000000000 -0500
@@ -44,7 +44,8 @@ extern int register_cpu_under_node(unsig
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 						int nid);
-extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk);
+extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
+					   unsigned long phys_index);
 
 #ifdef CONFIG_HUGETLBFS
 extern void register_hugetlbfs_with_node(node_registration_func_t doregister,


* [PATCH 5/8] v3 Update the find_memory_block declaration
From: Nathan Fontenot @ 2010-07-20  3:56 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linuxppc-dev; +Cc: greg, KAMEZAWA Hiroyuki
In-Reply-To: <4C451BF5.50304@austin.ibm.com>

Update the find_memory_block declaration to take a struct mem_section *
so that it matches the definition.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 include/linux/memory.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/include/linux/memory.h
===================================================================
--- linux-2.6.orig/include/linux/memory.h	2010-07-19 21:10:28.000000000 -0500
+++ linux-2.6/include/linux/memory.h	2010-07-19 21:12:46.000000000 -0500
@@ -116,7 +116,7 @@ extern int memory_dev_init(void);
 extern int remove_memory_block(unsigned long, struct mem_section *, int);
 extern int memory_notify(unsigned long val, void *v);
 extern int memory_isolate_notify(unsigned long val, void *v);
-extern struct memory_block *find_memory_block(unsigned long);
+extern struct memory_block *find_memory_block(struct mem_section *);
 extern int memory_is_hidden(struct mem_section *);
 #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
 enum mem_add_context { BOOT, HOTPLUG };


