LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH] powerpc: Fix 44x Machine Check handling
From: Kumar Gala @ 2007-11-16 21:55 UTC (permalink / raw)
  To: benh; +Cc: Olof Johansson, linuxppc-dev
In-Reply-To: <1195199141.28865.144.camel@pasglop>


On Nov 16, 2007, at 1:45 AM, Benjamin Herrenschmidt wrote:

>
> On Fri, 2007-11-16 at 18:41 +1100, Benjamin Herrenschmidt wrote:
>> On Fri, 2007-11-16 at 01:40 -0600, Olof Johansson wrote:
>>> I'm not sure I like this. It introduces another cpu feature flag,
>>> that we'll soon run out of if it's used to signify version info per
>>> implementation like this.
>>>
>>> 1) The SET_IVOR could be done from the cpu_setups for 440A instead
>>> (i.e. introduce one).
>>>
>>> 2) Please just move the machine check handlers out to individual ones
>>> instead of using the generic one. That way you don't need runtime checks
>>> between the two (they don't seem to share much of it as-is anyway).
>>>
>>> With the above two changes, you shouldn't need the feature bit any more.
>>
>> We can easily make the cpu features bigger ... But ok, I'll have a look
>> at doing it the way you suggest.
>
> Note that first, I'd like to figure out if there are other relevant
> differences with 440A ... arch/ppc didn't list any and diff'ing PDFs is
> not fun but if people around here know, please speak up


I think it added isel support.

- k
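
As a rough illustration of Olof's second point (per-core handlers instead of a runtime feature-bit test), here is a minimal userspace sketch. The struct, table and function names are hypothetical and are not the kernel's cputable API; the handlers only return a tag so the dispatch is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a CPU table entry. */
struct cpu_entry {
	const char *name;
	/* Handler picked once, when the core is identified. */
	int (*machine_check)(unsigned long reason);
};

/* Illustrative handlers: the return value only tags which one ran. */
static int machine_check_440(unsigned long reason)  { (void)reason; return 1; }
static int machine_check_440A(unsigned long reason) { (void)reason; return 2; }

static const struct cpu_entry cpu_table[] = {
	{ "440",  machine_check_440  },
	{ "440A", machine_check_440A },
};

/* Look up a core by name; returns NULL if it is not in the table. */
static const struct cpu_entry *identify_cpu(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(cpu_table) / sizeof(cpu_table[0]); i++)
		if (strcmp(cpu_table[i].name, name) == 0)
			return &cpu_table[i];
	return NULL;
}

/* No feature-bit test at exception time: call through the matched entry. */
static int handle_machine_check(const struct cpu_entry *cpu, unsigned long reason)
{
	assert(cpu != NULL && cpu->machine_check != NULL);
	return cpu->machine_check(reason);
}
```

With this shape, adding another core variant means adding a table row and a handler rather than claiming another bit in the feature mask.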


* Re: 85xx software reset problems from paulus.git
From: robert lazarski @ 2007-11-16 22:01 UTC (permalink / raw)
  Cc: linuxppc-embedded
In-Reply-To: <0B4C1069-9D89-44E8-89F7-9E7CE07B03DF@kernel.crashing.org>

On Nov 16, 2007 4:46 PM, Kumar Gala <galak@kernel.crashing.org> wrote:
>
>
> On Nov 16, 2007, at 3:28 PM, robert lazarski wrote:
>
> > On Nov 16, 2007 3:44 PM, robert lazarski <robertlazarski@gmail.com>
> > wrote:
> >> On Nov 16, 2007 10:27 AM, Clemens Koller
> >> <clemens.koller@anagramm.de> wrote:
> >>> The SRESET# (pin AF20) is the soft reset input, causes
> >>> an mcp assertion to the core.... (RTFM)
> >>>
> >>
> >> That's what we are doing. The 85xx docs say "Soft reset. Causes a
> >> machine check interrupt to the e500 core. Note that if the e500 core
> >> is not configured to process machine check interrupts, the assertion
> >> of SRESET causes a core checkstop. SRESET need not be asserted during
> >> a hard reset."
> >>
> >
> > Sorry for replying to myself, but thought I'd mention SRESET works
> > fine on 85xx 2.6.23, i.e., the board resets after a kernel panic. It
> > doesn't work for me on 2.6.24-rc2.
>
> What actual 85xx are you using?
>
> - k
>

Custom 8548 board. I'm using the CDS 85xx code as a reference and I'm
calling the same reset functions.

Robert


* Re: [RFC/PATCH] powerpc: Fix powerpc 32 bits resource fixup for 64 bits resources
From: Vitaly Bordug @ 2007-11-16 22:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <20071116072916.ECE2FDDDFE@ozlabs.org>

On Fri, 16 Nov 2007 18:28:34 +1100
Benjamin Herrenschmidt wrote:

> The 32bits powerpc resource fixup code uses unsigned longs to do the
> offsetting of resources which overflows on platforms such as 4xx where
> resources can be 64 bits.
> 
> This fixes it by using resource_size_t instead.
> 
> However, the IO stuff does rely on some 32 bits arithmetic, so we hack
> by cropping the result of the fixups for IO resources with a 32 bits
> mask.
> 
> This isn't the prettiest but should work for now until we change the
> 32 bits PCI code to do IO mappings like 64 bits does, within a
> reserved area of the kernel address space.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> 
> DO NOT MERGE YET ! This has only been tested with some preliminary PCI
> support code I have for Ebony, I haven't yet verified that the masking
> stuff works fine on 32 bits machines with multiple busses and negative
> offsets.
> 
I can give it a try with some FSL boxes... This approach makes sense, I think.

-V
>  arch/powerpc/kernel/pci_32.c |   44 +++++++++++++++++++++++--------------------
>  1 file changed, 24 insertions(+), 20 deletions(-)
> 
> Index: linux-work/arch/powerpc/kernel/pci_32.c
> ===================================================================
> --- linux-work.orig/arch/powerpc/kernel/pci_32.c	2007-11-16 15:48:27.000000000 +1100
> +++ linux-work/arch/powerpc/kernel/pci_32.c	2007-11-16 15:55:54.000000000 +1100
> @@ -104,7 +104,7 @@ pcibios_fixup_resources(struct pci_dev *
>  {
>  	struct pci_controller* hose = (struct pci_controller *)dev->sysdata;
>  	int i;
> -	unsigned long offset;
> +	resource_size_t offset, mask;
>  
>  	if (!hose) {
>  		printk(KERN_ERR "No hose for PCI dev %s!\n", pci_name(dev));
> @@ -123,15 +123,17 @@ pcibios_fixup_resources(struct pci_dev *
>  			continue;
>  		}
>  		offset = 0;
> +		mask = (resource_size_t)-1;
>  		if (res->flags & IORESOURCE_MEM) {
>  			offset = hose->pci_mem_offset;
>  		} else if (res->flags & IORESOURCE_IO) {
>  			offset = (unsigned long) hose->io_base_virt
>  				- isa_io_base;
> +			mask = 0xffffffffu;
>  		}
>  		if (offset != 0) {
> -			res->start += offset;
> -			res->end += offset;
> +			res->start = (res->start + offset) & mask;
> +			res->end = (res->end + offset) & mask;
>  			DBG("Fixup res %d (%lx) of dev %s: %llx -> %llx\n",
>  			    i, res->flags, pci_name(dev),
>  			    (u64)res->start - offset, (u64)res->start);
> @@ -147,30 +149,32 @@ DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID,		PC
>  void pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
>  			struct resource *res)
>  {
> -	unsigned long offset = 0;
> +	resource_size_t offset = 0, mask = (resource_size_t)-1;
>  	struct pci_controller *hose = dev->sysdata;
>  
> -	if (hose && res->flags & IORESOURCE_IO)
> +	if (hose && res->flags & IORESOURCE_IO) {
>  		offset = (unsigned long)hose->io_base_virt - isa_io_base;
> -	else if (hose && res->flags & IORESOURCE_MEM)
> +		mask = 0xffffffffu;
> +	} else if (hose && res->flags & IORESOURCE_MEM)
>  		offset = hose->pci_mem_offset;
> -	region->start = res->start - offset;
> -	region->end = res->end - offset;
> +	region->start = (res->start - offset) & mask;
> +	region->end = (res->end - offset) & mask;
>  }
>  EXPORT_SYMBOL(pcibios_resource_to_bus);
>  
>  void pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
>  			     struct pci_bus_region *region)
>  {
> -	unsigned long offset = 0;
> +	resource_size_t offset = 0, mask = (resource_size_t)-1;
>  	struct pci_controller *hose = dev->sysdata;
>  
> -	if (hose && res->flags & IORESOURCE_IO)
> +	if (hose && res->flags & IORESOURCE_IO) {
>  		offset = (unsigned long)hose->io_base_virt - isa_io_base;
> -	else if (hose && res->flags & IORESOURCE_MEM)
> +		mask = 0xffffffffu;
> +	} else if (hose && res->flags & IORESOURCE_MEM)
>  		offset = hose->pci_mem_offset;
> -	res->start = region->start + offset;
> -	res->end = region->end + offset;
> +	res->start = (region->start + offset) & mask;
> +	res->end = (region->end + offset) & mask;
>  }
>  EXPORT_SYMBOL(pcibios_bus_to_resource);
>  
> @@ -334,9 +338,9 @@ static int __init
>  pci_relocate_bridge_resource(struct pci_bus *bus, int i)
>  {
>  	struct resource *res, *pr, *conflict;
> -	unsigned long try, size;
> -	int j;
> +	resource_size_t try, size;
>  	struct pci_bus *parent = bus->parent;
> +	int j;
>  
>  	if (parent == NULL) {
>  		/* shouldn't ever happen */
> @@ -438,7 +442,7 @@ update_bridge_resource(struct pci_dev *d
>  	u8 io_base_lo, io_limit_lo;
>  	u16 mem_base, mem_limit;
>  	u16 cmd;
> -	unsigned long start, end, off;
> +	resource_size_t start, end, off;
>  	struct pci_controller *hose = dev->sysdata;
>  
>  	if (!hose) {
> @@ -1157,8 +1161,8 @@ void pcibios_fixup_bus(struct pci_bus *b
>  			res->end = IO_SPACE_LIMIT;
>  			res->flags = IORESOURCE_IO;
>  		}
> -		res->start += io_offset;
> -		res->end += io_offset;
> +		res->start = (res->start + io_offset) & 0xffffffffu;
> +		res->end = (res->end + io_offset) & 0xffffffffu;
>  
>  		for (i = 0; i < 3; ++i) {
>  			res = &hose->mem_resources[i];
> @@ -1183,8 +1187,8 @@ void pcibios_fixup_bus(struct pci_bus *b
>  			if (!res->flags || bus->self->transparent)
>  				continue;
>  			if (io_offset && (res->flags & IORESOURCE_IO)) {
> -				res->start += io_offset;
> -				res->end += io_offset;
> +				res->start = (res->start + io_offset) & 0xffffffffu;
> +				res->end = (res->end + io_offset) & 0xffffffffu;
>  			} else if (hose->pci_mem_offset
>  				   && (res->flags & IORESOURCE_MEM)) {
>  				res->start += hose->pci_mem_offset;
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
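
To make the overflow and the masking in the quoted patch concrete, here is a small userspace sketch. It assumes `uint64_t` as a stand-in for `resource_size_t` and `uint32_t` for a 32-bit `unsigned long`; the function names and the addresses used with them are made up for illustration and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for resource_size_t on a platform with 64-bit resources. */
typedef uint64_t res_size_t;

#define FULL_MASK ((res_size_t)-1)           /* memory resources: keep all bits */
#define IO_MASK   ((res_size_t)0xffffffffu)  /* IO resources: crop to 32 bits */

/* Models the old code: the offset is held in a 32-bit unsigned long,
 * so any offset bits above bit 31 are silently lost. */
static res_size_t fixup_old(res_size_t start, res_size_t offset)
{
	uint32_t off32 = (uint32_t)offset;

	return start + off32;
}

/* Models the patched code: full-width add, then crop with the mask. */
static res_size_t fixup_new(res_size_t start, res_size_t offset, res_size_t mask)
{
	/* The mask is expected to be a contiguous run of low bits. */
	assert((mask & (mask + 1)) == 0);
	return (start + offset) & mask;
}
```

For example, with a memory offset of 0x100000000 the old model returns the start address unchanged, while the new one yields the shifted 64-bit address. For IO, an offset such as 0xfff00000 (a negative 32-bit offset stored zero-extended) added to 0x00200000 wraps to 0x00100000 once the 32-bit mask is applied, which is the 32-bit arithmetic the IO code relies on.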


-- 
Sincerely, Vitaly


* Re: [POWERPC] [RFC] Fix 8xx tlbie definition
From: Vitaly Bordug @ 2007-11-16 22:28 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev
In-Reply-To: <1195247189.28865.174.camel@pasglop>

On Sat, 17 Nov 2007 08:06:29 +1100
Benjamin Herrenschmidt wrote:

> 
> On Fri, 2007-11-16 at 11:28 -0600, Josh Boyer wrote:
> > Git commit e701d269aa28996f3502780951fe1b12d5d66b49 introduced an
> > incorrect definition for _tlbie on PowerPC 8xx platforms.  Only the
> > address should be passed to the function.  This patch corrects the
> > definition of _tlbie and the related tlb flushing functions for 8xx.
> > 
> > Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
> 
> That conflicts with the patch I posted to fix it differently (I kept
> the additional argument).
> 

Where can I grab it to give it a try? My linuxppc archive is silent for some reason.

> Which one do we take ?
> 

If your solution works out, I'd agree with Kumar that we should keep this consistent.
If not, it might be cheaper to just fix it this gross way, to keep the 8xx stuff running.

-- 
Sincerely, Vitaly


* Re: [PATCH] powerpc: Fix 44x Machine Check handling
From: Benjamin Herrenschmidt @ 2007-11-16 22:21 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev
In-Reply-To: <6612E7C6-C570-4748-BCEE-AC6733C0BB14@kernel.crashing.org>


> > Index: linux-work/include/asm-powerpc/cputable.h
> > ===================================================================
> > --- linux-work.orig/include/asm-powerpc/cputable.h	2007-11-16 16:14:29.000000000 +1100
> > +++ linux-work/include/asm-powerpc/cputable.h	2007-11-16 16:19:35.000000000 +1100
> > @@ -138,6 +138,7 @@ extern void do_feature_fixups(unsigned l
> > #define CPU_FTR_FPU_UNAVAILABLE		ASM_CONST(0x0000000000800000)
> > #define CPU_FTR_UNIFIED_ID_CACHE	ASM_CONST(0x0000000001000000)
> > #define CPU_FTR_SPE			ASM_CONST(0x0000000002000000)
> > +#define CPU_FTR_440A			ASM_CONST(0x0000000004000000)
> 
> Can we be more specific about what this feature really means.
> 
> How about something like CPU_FTR_ENH_MCHCK or something like that.

Did that at first, then figured out that I indeed had 2 core manuals, one
of them being labelled "A"... I'm trying to figure out what other
differences they may have, to see whether I should stick to that CPU
feature or just remove it completely and do as Olof suggested.

Ben.
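
For readers unfamiliar with the feature-bit scheme under discussion, a minimal sketch of how such a mask is tested and how quickly it fills up. The two constants are copied from the quoted hunk; the helper names are hypothetical, and the 32-bit width used in the usage note is an assumption for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the quoted cputable.h hunk. */
#define CPU_FTR_SPE   UINT64_C(0x0000000002000000)
#define CPU_FTR_440A  UINT64_C(0x0000000004000000)

/* Hypothetical helper: true if every bit of 'ftr' is set in 'features'. */
static int has_feature(uint64_t features, uint64_t ftr)
{
	assert(ftr != 0);
	return (features & ftr) == ftr;
}

/* Hypothetical helper: count the unclaimed bits in the low 'width' bits
 * (width must be at most 64). */
static unsigned int free_feature_bits(uint64_t claimed, unsigned int width)
{
	unsigned int i, n = 0;

	for (i = 0; i < width; i++)
		if (!(claimed & (UINT64_C(1) << i)))
			n++;
	return n;
}
```

Each per-implementation flag such as CPU_FTR_440A permanently consumes one position, which is the concern Olof raised; `free_feature_bits(claimed, 32)` shows how few remain once a 32-bit mask is assumed.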


* Re: [POWERPC] [RFC] Fix 8xx tlbie definition
From: Benjamin Herrenschmidt @ 2007-11-17  1:05 UTC (permalink / raw)
  To: Vitaly Bordug; +Cc: linuxppc-dev
In-Reply-To: <20071117012837.4811d394@kernel.crashing.org>


On Sat, 2007-11-17 at 01:28 +0300, Vitaly Bordug wrote:
> On Sat, 17 Nov 2007 08:06:29 +1100
> Benjamin Herrenschmidt wrote:
> 
> > 
> > On Fri, 2007-11-16 at 11:28 -0600, Josh Boyer wrote:
> > > Git commit e701d269aa28996f3502780951fe1b12d5d66b49 introduced an
> > > incorrect definition for _tlbie on PowerPC 8xx platforms.  Only the
> > > address should be passed to the function.  This patch corrects the
> > > definition of _tlbie and the related tlb flushing functions for 8xx.
> > > 
> > > Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
> > 
> > That conflicts with the patch I posted to fix it differently (I kept
> > the additional argument).
> > 
> 
> Where can I grab it to give it a try? My linuxppc archive is silent for some reason.

Looks like I may have failed to post it ... weird, I was sure I posted
that days ago, when Olof first mentioned the breakage. I'll check &
resend.

> > Which one do we take ?
> > 
> 
> If your solution works out, I'd agree with Kumar that we should keep this consistent.
> If not, it might be cheaper to just fix it this gross way, to keep the 8xx stuff running.

Ben.


* multiprocessor
From: keng_629 @ 2007-11-17  6:45 UTC (permalink / raw)
  To: linuxppc-embedded


I am trying to put together a multiprocessor example on the Xilinx Virtex-4 with its two PowerPC hard cores.
Please give me some advice about the bootloader and OS.
How should I start this work?

keng_629
2007-11-17



* MPC880: i2cer register says tx is done but tx buf descriptor is still ready
From: DI BACCO ANTONIO - technolabs @ 2007-11-17 10:07 UTC (permalink / raw)
  To: linuxppc-embedded

How could it be possible? It happens during the first i2c transactions
and then no more.

Bye,
Antonio.


* Re: [POWERPC] [RFC] Fix 8xx tlbie definition
From: Josh Boyer @ 2007-11-17 17:06 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev
In-Reply-To: <1195261547.28865.184.camel@pasglop>

On Sat, 17 Nov 2007 12:05:47 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> 
> On Sat, 2007-11-17 at 01:28 +0300, Vitaly Bordug wrote:
> > On Sat, 17 Nov 2007 08:06:29 +1100
> > Benjamin Herrenschmidt wrote:
> > 
> > > 
> > > On Fri, 2007-11-16 at 11:28 -0600, Josh Boyer wrote:
> > > > Git commit e701d269aa28996f3502780951fe1b12d5d66b49 introduced an
> > > > incorrect definition for _tlbie on PowerPC 8xx platforms.  Only the
> > > > address should be passed to the function.  This patch corrects the
> > > > definition of _tlbie and the related tlb flushing functions for 8xx.
> > > > 
> > > > Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
> > > 
> > > That conflicts with the patch I posted to fix it differently (I kept
> > > the additional argument).
> > > 
> > 
> > Where can I grab it to give it a try? My linuxppc archive is silent for some reason.
> 
> Looks like I may have failed to post it ... weird, I was sure I posted
> that days ago, when Olof first mentioned the breakage. I'll check &
> resend.

I never saw it.  If I had, I wouldn't have bothered to post my own
version :)

> > > Which one do we take ?
> > > 
> > 
> > If your solution works out, I'd agree with Kumar that we should keep this consistent.
> > If not, it might be cheaper to just fix it this gross way, to keep the 8xx stuff running.

Consistency is fine with me.  I was going for quick and dirty to make
sure it wasn't broken in .24.

josh
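
The two approaches in this thread differ only in the shape of the flush interface. The sketch below is a userspace model, not the actual kernel code or either posted patch: every name in it is hypothetical, and the "flush" merely records the address so the behaviour can be observed.

```c
#include <assert.h>

/* Records the last address "flushed" so the sketch is observable. */
static unsigned long last_flushed_va;

/* Hypothetical stand-in for a low-level 8xx flush that needs only the address. */
static void tlbie_8xx_addr_only(unsigned long va)
{
	last_flushed_va = va;
}

/* Quick fix: 8xx callers pass only the address. */
static void flush_quick(unsigned long va)
{
	tlbie_8xx_addr_only(va);
}

/* Consistent variant: keep the two-argument shape used elsewhere and
 * simply ignore the extra argument on 8xx. */
static void flush_consistent(unsigned long va, unsigned int pid)
{
	(void)pid; /* unused in this 8xx model */
	tlbie_8xx_addr_only(va);
}
```

Both variants end up flushing the same address; the consistent one costs an unused parameter but lets common callers stay identical across sub-architectures, which is the trade-off discussed above.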


* Re: [PATCH] powerpc: Fix 44x Machine Check handling
From: Josh Boyer @ 2007-11-17 17:09 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <BC413657-C715-44E3-B5B5-D173E33E862F@kernel.crashing.org>

On Fri, 16 Nov 2007 15:55:25 -0600
Kumar Gala <galak@kernel.crashing.org> wrote:

> 
> On Nov 16, 2007, at 1:45 AM, Benjamin Herrenschmidt wrote:
> 
> >
> > On Fri, 2007-11-16 at 18:41 +1100, Benjamin Herrenschmidt wrote:
> >> On Fri, 2007-11-16 at 01:40 -0600, Olof Johansson wrote:
> >>> I'm not sure I like this. It introduces another cpu feature flag,
> >>> that we'll soon run out of if it's used to signify version info per
> >>> implementation like this.
> >>>
> >>> 1) The SET_IVOR could be done from the cpu_setups for 440A instead
> >>> (i.e. introduce one).
> >>>
> >>> 2) Please just move the machine check handlers out to individual ones
> >>> instead of using the generic one. That way you don't need runtime checks
> >>> between the two (they don't seem to share much of it as-is anyway).
> >>>
> >>> With the above two changes, you shouldn't need the feature bit any more.
> >>
> >> We can easily make the cpu features bigger ... But ok, I'll have a look
> >> at doing it the way you suggest.
> >
> > Note that first, I'd like to figure out if there are other relevant
> > differences with 440A ... arch/ppc didn't list any and diff'ing PDFs is
> > not fun but if people around here know, please speak up
> 
> 
> I think it added isel support.

I'm not entirely sure about that, but I'll check.  440x4 cores lack
isel, 440x5, 440x6 have it.  I don't think it was tied to the 'A'
moniker.

josh


* Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
From: Torsten Kaiser @ 2007-11-17 17:53 UTC (permalink / raw)
  To: Kamalesh Babulal
  Cc: LKML, Trond Myklebust, linuxppc-dev, nfs, Andrew Morton,
	Jan Blunck, Balbir Singh
In-Reply-To: <473DA608.1020804@linux.vnet.ibm.com>

On Nov 16, 2007 3:15 PM, Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> Hi Andrew,
>
> The kernel enters the xmon state while running the file system
> stress on nfs v4 mounted partition.
[snip]
> 0:mon> t
> [c0000000dbd4fb50] c000000000069768 .__wake_up+0x54/0x88
> [c0000000dbd4fc00] d00000000086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
> [c0000000dbd4fc80] d000000000872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
> [c0000000dbd4fd10] d000000000598510 .rpc_release_calldata+0x50/0x74 [sunrpc]
> [c0000000dbd4fda0] c00000000008d960 .run_workqueue+0x10c/0x1f4
> [c0000000dbd4fe50] c00000000008ec70 .worker_thread+0x118/0x138
> [c0000000dbd4ff00] c0000000000939f4 .kthread+0x78/0xc4
> [c0000000dbd4ff90] c00000000002b060 .kernel_thread+0x4c/0x68

Definitely not a ppc problem.
Got nearly the same backtrace on 64bit x86:
[  966.712167] BUG: soft lockup - CPU#3 stuck for 11s! [rpciod/3:605]
[  966.718522] CPU 3:
[  966.720589] Modules linked in: radeon drm nfsd exportfs ipv6
w83792d tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
v4l1_compat hid sg i2c_nforce2 pata_amd
[  966.748306] Pid: 605, comm: rpciod/3 Not tainted 2.6.24-rc2-mm1 #4
[  966.754653] RIP: 0010:[<ffffffff805b0542>]  [<ffffffff805b0542>]
_spin_lock_irqsave+0x12/0x30
[  966.763424] RSP: 0018:ffff81007ef33e28  EFLAGS: 00000286
[  966.768879] RAX: 0000000000000286 RBX: ffff81007ef33e60 RCX: 0000000000000000
[  966.776204] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff81011e107960
[  966.783511] RBP: ffff81011cc6c588 R08: ffff8100db918130 R09: ffff81011cc6c540
[  966.790837] R10: 0000000000000000 R11: ffffffff80266390 R12: ffff8100d2d693a8
[  966.798170] R13: ffff81011cc6c588 R14: ffff8100d2d693a8 R15: ffffffff80302726
[  966.805505] FS:  00007f9e739d96f0(0000) GS:ffff81011ff12700(0000)
knlGS:0000000000000000
[  966.813805] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  966.819703] CR2: 0000000001b691d0 CR3: 0000000069861000 CR4: 00000000000006e0
[  966.827039] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  966.834362] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  966.841687]
[  966.841687] Call Trace:
[  966.845728]  [<ffffffff8022cf4d>] __wake_up+0x2d/0x70
[  966.850900]  [<ffffffff802f5e6e>] nfs_free_unlinkdata+0x1e/0x50
[  966.857004]  [<ffffffff80593f66>] rpc_release_calldata+0x26/0x50
[  966.863161]  [<ffffffff80594930>] rpc_async_schedule+0x0/0x10
[  966.869078]  [<ffffffff80245cec>] run_workqueue+0xcc/0x170
[  966.874705]  [<ffffffff802467a0>] worker_thread+0x0/0xb0
[  966.880163]  [<ffffffff802467a0>] worker_thread+0x0/0xb0
[  966.885610]  [<ffffffff8024680d>] worker_thread+0x6d/0xb0
[  966.891148]  [<ffffffff8024a140>] autoremove_wake_function+0x0/0x30
[  966.897606]  [<ffffffff802467a0>] worker_thread+0x0/0xb0
[  966.903045]  [<ffffffff802467a0>] worker_thread+0x0/0xb0
[  966.908485]  [<ffffffff80249d5b>] kthread+0x4b/0x80
[  966.913484]  [<ffffffff8020ca28>] child_rip+0xa/0x12
[  966.918579]  [<ffffffff80249d10>] kthread+0x0/0x80
[  966.923498]  [<ffffffff8020ca1e>] child_rip+0x0/0x12
[  966.928584]

Sadly lockdep does not work for me, as it gets turned off early:
[   39.851594] ---------------------------------
[   39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[   39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[   39.866963]  (&n->list_lock){-+..}, at: [<ffffffff802935c1>]
add_partial+0x31/0xa0
[   39.874712] {softirq-on-W} state was registered at:
[   39.879788]   [<ffffffff80259fb8>] __lock_acquire+0x3e8/0x1140
[   39.885763]   [<ffffffff80259838>] debug_check_no_locks_freed+0x188/0x1a0
[   39.892682]   [<ffffffff8025ad65>] lock_acquire+0x55/0x70
[   39.898840]   [<ffffffff802935c1>] add_partial+0x31/0xa0
[   39.904288]   [<ffffffff805c76de>] _spin_lock+0x1e/0x30
[   39.909650]   [<ffffffff802935c1>] add_partial+0x31/0xa0
[   39.915097]   [<ffffffff80296f9c>] kmem_cache_open+0x1cc/0x330
[   39.921066]   [<ffffffff805c7984>] _spin_unlock_irq+0x24/0x30
[   39.926946]   [<ffffffff802974f4>] create_kmalloc_cache+0x64/0xf0
[   39.933172]   [<ffffffff80295640>] init_alloc_cpu_cpu+0x70/0x90
[   39.939226]   [<ffffffff8080ada5>] kmem_cache_init+0x65/0x1d0
[   39.945289]   [<ffffffff807f1b4e>] start_kernel+0x23e/0x350
[   39.950996]   [<ffffffff807f112d>] _sinittext+0x12d/0x140
[   39.956529]   [<ffffffffffffffff>] 0xffffffffffffffff
[   39.961720] irq event stamp: 1207
[   39.965048] hardirqs last  enabled at (1206): [<ffffffff80259838>]
debug_check_no_locks_freed+0x188/0x1a0
[   39.974701] hardirqs last disabled at (1207): [<ffffffff802952eb>]
__slab_free+0x3b/0x190
[   39.982968] softirqs last  enabled at (570): [<ffffffff8020cf0c>]
call_softirq+0x1c/0x30
[   39.991148] softirqs last disabled at (1197): [<ffffffff8020cf0c>]
call_softirq+0x1c/0x30
[   39.999415]
[   39.999416] other info that might help us debug this:
[   40.005990] no locks held by swapper/0.
[   40.010018]
[   40.010018] stack backtrace:
[   40.014429]
[   40.014429] Call Trace:
[   40.018407]  <IRQ>  [<ffffffff8025847c>] print_usage_bug+0x18c/0x1a0
[   40.024817]  [<ffffffff802593ec>] mark_lock+0x64c/0x660
[   40.030057]  [<ffffffff80259f6e>] __lock_acquire+0x39e/0x1140
[   40.035818]  [<ffffffff80257717>] save_trace+0x37/0xa0
[   40.040972]  [<ffffffff802492cd>] __rcu_process_callbacks+0x8d/0x250
[   40.047335]  [<ffffffff8025ad65>] lock_acquire+0x55/0x70
[   40.052663]  [<ffffffff802935c1>] add_partial+0x31/0xa0
[   40.057905]  [<ffffffff802595d3>] trace_hardirqs_on+0x83/0x160
[   40.063750]  [<ffffffff805c76de>] _spin_lock+0x1e/0x30
[   40.068905]  [<ffffffff802935c1>] add_partial+0x31/0xa0
[   40.074311]  [<ffffffff802953b0>] __slab_free+0x100/0x190
[   40.079724]  [<ffffffff802492cd>] __rcu_process_callbacks+0x8d/0x250
[   40.086088]  [<ffffffff8023b79c>] tasklet_action+0x2c/0xc0
[   40.091588]  [<ffffffff802494b3>] rcu_process_callbacks+0x23/0x50
[   40.097694]  [<ffffffff8023b7ba>] tasklet_action+0x4a/0xc0
[   40.103194]  [<ffffffff8023b67a>] __do_softirq+0x7a/0x100
[   40.108607]  [<ffffffff8020cf0c>] call_softirq+0x1c/0x30
[   40.113935]  [<ffffffff8020f125>] do_softirq+0x55/0xb0
[   40.119089]  [<ffffffff8023b5f7>] irq_exit+0x97/0xa0
[   40.124073]  [<ffffffff8021bf2c>] smp_apic_timer_interrupt+0x7c/0xc0
[   40.130434]  [<ffffffff8020ac70>] default_idle+0x0/0x60
[   40.135840]  [<ffffffff8020ac70>] default_idle+0x0/0x60
[   40.141080]  [<ffffffff8020c9bb>] apic_timer_interrupt+0x6b/0x70
[   40.147100]  <EOI>  [<ffffffff8020aca7>] default_idle+0x37/0x60
[   40.153066]  [<ffffffff8020aca5>] default_idle+0x35/0x60
[   40.158393]  [<ffffffff8020ad2f>] cpu_idle+0x5f/0x90
[   40.163374]
[   40.164888] INFO: lockdep is turned off.

Don't know who to bug about that.

Torsten


* Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
From: Andrew Morton @ 2007-11-17 18:05 UTC (permalink / raw)
  To: Torsten Kaiser
  Cc: Trond Myklebust, LKML, Kamalesh Babulal, linuxppc-dev, nfs,
	Christoph Lameter, Jan Blunck, Balbir Singh
In-Reply-To: <64bb37e0711170953p67d1be49lf4eaa190d662e2b4@mail.gmail.com>

On Sat, 17 Nov 2007 18:53:45 +0100 "Torsten Kaiser" <just.for.lkml@googlemail.com> wrote:

> On Nov 16, 2007 3:15 PM, Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> > Hi Andrew,
> >
> > The kernel enters the xmon state while running the file system
> > stress on nfs v4 mounted partition.
> [snip]
> > 0:mon> t
> > [c0000000dbd4fb50] c000000000069768 .__wake_up+0x54/0x88
> > [c0000000dbd4fc00] d00000000086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
> > [c0000000dbd4fc80] d000000000872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
> > [c0000000dbd4fd10] d000000000598510 .rpc_release_calldata+0x50/0x74 [sunrpc]
> > [c0000000dbd4fda0] c00000000008d960 .run_workqueue+0x10c/0x1f4
> > [c0000000dbd4fe50] c00000000008ec70 .worker_thread+0x118/0x138
> > [c0000000dbd4ff00] c0000000000939f4 .kthread+0x78/0xc4
> > [c0000000dbd4ff90] c00000000002b060 .kernel_thread+0x4c/0x68
> 
> Definitely not a ppc problem.
> Got nearly the same backtrace on 64bit x86:
> [  966.712167] BUG: soft lockup - CPU#3 stuck for 11s! [rpciod/3:605]
> [  966.718522] CPU 3:
> [  966.720589] Modules linked in: radeon drm nfsd exportfs ipv6
> w83792d tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
> tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
> videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
> v4l1_compat hid sg i2c_nforce2 pata_amd
> [  966.748306] Pid: 605, comm: rpciod/3 Not tainted 2.6.24-rc2-mm1 #4
> [  966.754653] RIP: 0010:[<ffffffff805b0542>]  [<ffffffff805b0542>]
> _spin_lock_irqsave+0x12/0x30
> [  966.763424] RSP: 0018:ffff81007ef33e28  EFLAGS: 00000286
> [  966.768879] RAX: 0000000000000286 RBX: ffff81007ef33e60 RCX: 0000000000000000
> [  966.776204] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff81011e107960
> [  966.783511] RBP: ffff81011cc6c588 R08: ffff8100db918130 R09: ffff81011cc6c540
> [  966.790837] R10: 0000000000000000 R11: ffffffff80266390 R12: ffff8100d2d693a8
> [  966.798170] R13: ffff81011cc6c588 R14: ffff8100d2d693a8 R15: ffffffff80302726
> [  966.805505] FS:  00007f9e739d96f0(0000) GS:ffff81011ff12700(0000)
> knlGS:0000000000000000
> [  966.813805] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [  966.819703] CR2: 0000000001b691d0 CR3: 0000000069861000 CR4: 00000000000006e0
> [  966.827039] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  966.834362] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  966.841687]
> [  966.841687] Call Trace:
> [  966.845728]  [<ffffffff8022cf4d>] __wake_up+0x2d/0x70
> [  966.850900]  [<ffffffff802f5e6e>] nfs_free_unlinkdata+0x1e/0x50
> [  966.857004]  [<ffffffff80593f66>] rpc_release_calldata+0x26/0x50
> [  966.863161]  [<ffffffff80594930>] rpc_async_schedule+0x0/0x10
> [  966.869078]  [<ffffffff80245cec>] run_workqueue+0xcc/0x170
> [  966.874705]  [<ffffffff802467a0>] worker_thread+0x0/0xb0
> [  966.880163]  [<ffffffff802467a0>] worker_thread+0x0/0xb0
> [  966.885610]  [<ffffffff8024680d>] worker_thread+0x6d/0xb0
> [  966.891148]  [<ffffffff8024a140>] autoremove_wake_function+0x0/0x30
> [  966.897606]  [<ffffffff802467a0>] worker_thread+0x0/0xb0
> [  966.903045]  [<ffffffff802467a0>] worker_thread+0x0/0xb0
> [  966.908485]  [<ffffffff80249d5b>] kthread+0x4b/0x80
> [  966.913484]  [<ffffffff8020ca28>] child_rip+0xa/0x12
> [  966.918579]  [<ffffffff80249d10>] kthread+0x0/0x80
> [  966.923498]  [<ffffffff8020ca1e>] child_rip+0x0/0x12
> [  966.928584]

I don't know what's causing that.  I suppose I should set up nfs4.

> Sadly lockdep does not work for me, as it gets turned off early:
> [   39.851594] ---------------------------------
> [   39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> [   39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> [   39.866963]  (&n->list_lock){-+..}, at: [<ffffffff802935c1>]
> add_partial+0x31/0xa0
> [   39.874712] {softirq-on-W} state was registered at:
> [   39.879788]   [<ffffffff80259fb8>] __lock_acquire+0x3e8/0x1140
> [   39.885763]   [<ffffffff80259838>] debug_check_no_locks_freed+0x188/0x1a0
> [   39.892682]   [<ffffffff8025ad65>] lock_acquire+0x55/0x70
> [   39.898840]   [<ffffffff802935c1>] add_partial+0x31/0xa0
> [   39.904288]   [<ffffffff805c76de>] _spin_lock+0x1e/0x30
> [   39.909650]   [<ffffffff802935c1>] add_partial+0x31/0xa0
> [   39.915097]   [<ffffffff80296f9c>] kmem_cache_open+0x1cc/0x330
> [   39.921066]   [<ffffffff805c7984>] _spin_unlock_irq+0x24/0x30
> [   39.926946]   [<ffffffff802974f4>] create_kmalloc_cache+0x64/0xf0
> [   39.933172]   [<ffffffff80295640>] init_alloc_cpu_cpu+0x70/0x90
> [   39.939226]   [<ffffffff8080ada5>] kmem_cache_init+0x65/0x1d0
> [   39.945289]   [<ffffffff807f1b4e>] start_kernel+0x23e/0x350
> [   39.950996]   [<ffffffff807f112d>] _sinittext+0x12d/0x140
> [   39.956529]   [<ffffffffffffffff>] 0xffffffffffffffff
> [   39.961720] irq event stamp: 1207
> [   39.965048] hardirqs last  enabled at (1206): [<ffffffff80259838>]
> debug_check_no_locks_freed+0x188/0x1a0
> [   39.974701] hardirqs last disabled at (1207): [<ffffffff802952eb>]
> __slab_free+0x3b/0x190
> [   39.982968] softirqs last  enabled at (570): [<ffffffff8020cf0c>]
> call_softirq+0x1c/0x30
> [   39.991148] softirqs last disabled at (1197): [<ffffffff8020cf0c>]
> call_softirq+0x1c/0x30
> [   39.999415]
> [   39.999416] other info that might help us debug this:
> [   40.005990] no locks held by swapper/0.
> [   40.010018]
> [   40.010018] stack backtrace:
> [   40.014429]
> [   40.014429] Call Trace:
> [   40.018407]  <IRQ>  [<ffffffff8025847c>] print_usage_bug+0x18c/0x1a0
> [   40.024817]  [<ffffffff802593ec>] mark_lock+0x64c/0x660
> [   40.030057]  [<ffffffff80259f6e>] __lock_acquire+0x39e/0x1140
> [   40.035818]  [<ffffffff80257717>] save_trace+0x37/0xa0
> [   40.040972]  [<ffffffff802492cd>] __rcu_process_callbacks+0x8d/0x250
> [   40.047335]  [<ffffffff8025ad65>] lock_acquire+0x55/0x70
> [   40.052663]  [<ffffffff802935c1>] add_partial+0x31/0xa0
> [   40.057905]  [<ffffffff802595d3>] trace_hardirqs_on+0x83/0x160
> [   40.063750]  [<ffffffff805c76de>] _spin_lock+0x1e/0x30
> [   40.068905]  [<ffffffff802935c1>] add_partial+0x31/0xa0
> [   40.074311]  [<ffffffff802953b0>] __slab_free+0x100/0x190
> [   40.079724]  [<ffffffff802492cd>] __rcu_process_callbacks+0x8d/0x250
> [   40.086088]  [<ffffffff8023b79c>] tasklet_action+0x2c/0xc0
> [   40.091588]  [<ffffffff802494b3>] rcu_process_callbacks+0x23/0x50
> [   40.097694]  [<ffffffff8023b7ba>] tasklet_action+0x4a/0xc0
> [   40.103194]  [<ffffffff8023b67a>] __do_softirq+0x7a/0x100
> [   40.108607]  [<ffffffff8020cf0c>] call_softirq+0x1c/0x30
> [   40.113935]  [<ffffffff8020f125>] do_softirq+0x55/0xb0
> [   40.119089]  [<ffffffff8023b5f7>] irq_exit+0x97/0xa0
> [   40.124073]  [<ffffffff8021bf2c>] smp_apic_timer_interrupt+0x7c/0xc0
> [   40.130434]  [<ffffffff8020ac70>] default_idle+0x0/0x60
> [   40.135840]  [<ffffffff8020ac70>] default_idle+0x0/0x60
> [   40.141080]  [<ffffffff8020c9bb>] apic_timer_interrupt+0x6b/0x70
> [   40.147100]  <EOI>  [<ffffffff8020aca7>] default_idle+0x37/0x60
> [   40.153066]  [<ffffffff8020aca5>] default_idle+0x35/0x60
> [   40.158393]  [<ffffffff8020ad2f>] cpu_idle+0x5f/0x90
> [   40.163374]
> [   40.164888] INFO: lockdep is turned off.
> 
> Don't know who to bug about that.
> 

That's slub.  It appears that list_lock is being taken from process context
in one place and from softirq in another.


* Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
From: Ingo Molnar @ 2007-11-17 18:09 UTC (permalink / raw)
  To: Torsten Kaiser
  Cc: Trond Myklebust, Peter Zijlstra, LKML, Kamalesh Babulal,
	linuxppc-dev, nfs, Andrew Morton, Jan Blunck, Balbir Singh
In-Reply-To: <64bb37e0711170953p67d1be49lf4eaa190d662e2b4@mail.gmail.com>


* Torsten Kaiser <just.for.lkml@googlemail.com> wrote:

> Sadly lockdep does not work for me, as it gets turned off early:
> [   39.851594] ---------------------------------
> [   39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> [   39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> [   39.866963]  (&n->list_lock){-+..}, at: [<ffffffff802935c1>]

hey, that means it found a bug - which is not sad at all :-)

	Ingo

^ permalink raw reply

* Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
From: Andrew Morton @ 2007-11-17 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Trond Myklebust, Peter Zijlstra, LKML, Torsten Kaiser,
	Kamalesh Babulal, linuxppc-dev, nfs, Jan Blunck, Balbir Singh
In-Reply-To: <20071117180946.GA14055@elte.hu>

On Sat, 17 Nov 2007 19:09:46 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Torsten Kaiser <just.for.lkml@googlemail.com> wrote:
> 
> > Sadly lockdep does not work for me, as it gets turned off early:
> > [   39.851594] ---------------------------------
> > [   39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > [   39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> > [   39.866963]  (&n->list_lock){-+..}, at: [<ffffffff802935c1>]
> 
> hey, that means it found a bug - which is not sad at all :-)
> 

mutter.

Torsten, you could try CONFIG_SLAB=y, CONFIG_SLUB=n to see if you can make
some progress on the NFS problem.

^ permalink raw reply

* Re: MPC880: i2cer register says tx is done but tx buf descriptor is still ready
From: Jochen Friedrich @ 2007-11-17 18:32 UTC (permalink / raw)
  To: DI BACCO ANTONIO - technolabs; +Cc: linuxppc-embedded
In-Reply-To: <F1F6EC0C8B75034F9E3A79FC85122E8EB7C778@aquib01a>

Hi Antonio,

> How could it be possible? It happens during the first i2c transactions
> and then no more. 
>   

What linux version? Which driver?

Thanks,
Jochen

^ permalink raw reply

* Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
From: Trond Myklebust @ 2007-11-17 18:58 UTC (permalink / raw)
  To: Torsten Kaiser
  Cc: LKML, Kamalesh Babulal, linuxppc-dev, nfs, Andrew Morton,
	Jan Blunck, Balbir Singh
In-Reply-To: <64bb37e0711170953p67d1be49lf4eaa190d662e2b4@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 893 bytes --]


On Sat, 2007-11-17 at 18:53 +0100, Torsten Kaiser wrote:
> On Nov 16, 2007 3:15 PM, Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> > Hi Andrew,
> >
> > The kernel enters the xmon state while running the file system
> > stress on nfs v4 mounted partition.
> [snip]
> > 0:mon> t
> > [c0000000dbd4fb50] c000000000069768 .__wake_up+0x54/0x88
> > [c0000000dbd4fc00] d00000000086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
> > [c0000000dbd4fc80] d000000000872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
> > [c0000000dbd4fd10] d000000000598510 .rpc_release_calldata+0x50/0x74 [sunrpc]
> > [c0000000dbd4fda0] c00000000008d960 .run_workqueue+0x10c/0x1f4
> > [c0000000dbd4fe50] c00000000008ec70 .worker_thread+0x118/0x138
> > [c0000000dbd4ff00] c0000000000939f4 .kthread+0x78/0xc4
> > [c0000000dbd4ff90] c00000000002b060 .kernel_thread+0x4c/0x68

Could you try with the attached patch.

Cheers
  Trond

[-- Attachment #2: linux-2.6.24-007-fix_nfs_free_unlinkdata.dif --]
[-- Type: message/rfc822, Size: 1254 bytes --]

From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: NFS: Fix nfs_free_unlinkdata()
Date: Sat, 17 Nov 2007 13:52:36 -0500
Message-ID: <1195325920.7484.2.camel@localhost.localdomain>

We should really only be calling nfs_sb_deactive() at the end of an RPC
call, to balance the nfs_sb_active() call in nfs_do_call_unlink(). OTOH,
nfs_free_unlinkdata() can be called from a variety of other situations.

Fix is to move the call to nfs_sb_deactive() into
nfs_async_unlink_release().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/unlink.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index b97d3bb..c90862a 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -31,7 +31,6 @@ struct nfs_unlinkdata {
 static void
 nfs_free_unlinkdata(struct nfs_unlinkdata *data)
 {
-	nfs_sb_deactive(NFS_SERVER(data->dir));
 	iput(data->dir);
 	put_rpccred(data->cred);
 	kfree(data->args.name.name);
@@ -116,6 +115,7 @@ static void nfs_async_unlink_release(void *calldata)
 	struct nfs_unlinkdata	*data = calldata;
 
 	nfs_dec_sillycount(data->dir);
+	nfs_sb_deactive(NFS_SERVER(data->dir));
 	nfs_free_unlinkdata(data);
 }
 

^ permalink raw reply related

* Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
From: Torsten Kaiser @ 2007-11-17 19:18 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: LKML, Kamalesh Babulal, linuxppc-dev, nfs, Andrew Morton,
	Jan Blunck, Balbir Singh
In-Reply-To: <1195325920.7484.1.camel@localhost.localdomain>

On Nov 17, 2007 7:58 PM, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>
> On Sat, 2007-11-17 at 18:53 +0100, Torsten Kaiser wrote:
> > On Nov 16, 2007 3:15 PM, Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> > > Hi Andrew,
> > >
> > > The kernel enters the xmon state while running the file system
> > > stress on nfs v4 mounted partition.
> > [snip]
> > > 0:mon> t
> > > [c0000000dbd4fb50] c000000000069768 .__wake_up+0x54/0x88
> > > [c0000000dbd4fc00] d00000000086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
> > > [c0000000dbd4fc80] d000000000872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
> > > [c0000000dbd4fd10] d000000000598510 .rpc_release_calldata+0x50/0x74 [sunrpc]
> > > [c0000000dbd4fda0] c00000000008d960 .run_workqueue+0x10c/0x1f4
> > > [c0000000dbd4fe50] c00000000008ec70 .worker_thread+0x118/0x138
> > > [c0000000dbd4ff00] c0000000000939f4 .kthread+0x78/0xc4
> > > [c0000000dbd4ff90] c00000000002b060 .kernel_thread+0x4c/0x68
>
> Could you try with the attached patch.
[snip]
> Fix is to move the call to nfs_sb_deactive() into
> nfs_async_unlink_release().

I really doubt that will fix it.

My stacktrace was like:
run_workqueue
called: rpc_async_schedule
  that called: rpc_release_calldata
    which points to: nfs_async_unlink_release
       that called: nfs_free_unlinkdata

So it does not matter for me if nfs_sb_deactive is called one step earlier.

Currently building with SLAB instead of SLUB to see if lockdep tells something...

Torsten

^ permalink raw reply

* Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
From: Christoph Lameter @ 2007-11-17 19:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Trond Myklebust, LKML, Torsten Kaiser, Kamalesh Babulal,
	linuxppc-dev, nfs, Jan Blunck, Balbir Singh
In-Reply-To: <20071117100507.912c5e5c.akpm@linux-foundation.org>

On Sat, 17 Nov 2007, Andrew Morton wrote:

> > Don't know who to bug about that.
> 
> That's slub.  It appears that list_lock is being taken from process context
> in one place and from softirq in another.

I kicked out some weird interrupt disable code in mm that was only run during
NUMA bootstrap.

This should fix it but isn't there some mechanism to convince lockdep that
it is okay to do these things during bootstrap?

---
 mm/slub.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-11-17 11:31:21.044136631 -0800
+++ linux-2.6/mm/slub.c	2007-11-17 11:32:17.364386560 -0800
@@ -2044,7 +2044,9 @@ static struct kmem_cache_node *early_kme
 #endif
 	init_kmem_cache_node(n);
 	atomic_long_inc(&n->nr_slabs);
+	local_irq_disable();
 	add_partial(kmalloc_caches, page, 0);
+	local_irq_enable();
 	return n;
 }
 

^ permalink raw reply

* Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
From: Torsten Kaiser @ 2007-11-17 19:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Trond Myklebust, Peter Zijlstra, LKML, Kamalesh Babulal,
	linuxppc-dev, nfs, Ingo Molnar, Jan Blunck, Balbir Singh
In-Reply-To: <20071117101957.7562639d.akpm@linux-foundation.org>

On Nov 17, 2007 7:19 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sat, 17 Nov 2007 19:09:46 +0100 Ingo Molnar <mingo@elte.hu> wrote:
>
> >
> > * Torsten Kaiser <just.for.lkml@googlemail.com> wrote:
> >
> > > Sadly lockdep does not work for me, as it gets turned off early:
> > > [   39.851594] ---------------------------------
> > > [   39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > > [   39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> > > [   39.866963]  (&n->list_lock){-+..}, at: [<ffffffff802935c1>]
> >
> > hey, that means it found a bug - which is not sad at all :-)

It was sad, that it found a bug that I was not searching for. ;)

> mutter.
>
> Torsten, you could try CONFIG_SLAB=y, CONFIG_SLUB=n to see if you can make
> some progress on the NFS problem.

I should have thought of that myself... OK anyway here is the result:

The hang is reproducible; emerge froze the system again after downloading
the source.
Lockdep triggers immediately before the freeze, but the result is still
not helpful:

[  221.565011] INFO: trying to register non-static key.
[  221.566999] the code is fine but needs lockdep annotation.
[  221.569206] turning off the locking correctness validator.
[  221.571404]
[  221.571405] Call Trace:
[  221.572996]  [<ffffffff8025a1b4>] __lock_acquire+0x4c4/0x1140
[  221.575298]  [<ffffffff8025ae85>] lock_acquire+0x55/0x70
[  221.577429]  [<ffffffff8022d6fd>] __wake_up+0x2d/0x70
[  221.579457]  [<ffffffff805c5f04>] _spin_lock_irqsave+0x34/0x50
[  221.581800]  [<ffffffff805c5e45>] _spin_unlock_irqrestore+0x55/0x70
[  221.584317]  [<ffffffff8022d6fd>] __wake_up+0x2d/0x70
[  221.586344]  [<ffffffff805a88b0>] rpc_async_schedule+0x0/0x10
[  221.588648]  [<ffffffff802fface>] nfs_free_unlinkdata+0x1e/0x50
[  221.591023]  [<ffffffff805a7e96>] rpc_release_calldata+0x26/0x50
[  221.593428]  [<ffffffff8024778f>] run_workqueue+0x16f/0x210
[  221.595662]  [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
[  221.598004]  [<ffffffff802483d0>] worker_thread+0x0/0xb0
[  221.600130]  [<ffffffff802483d0>] worker_thread+0x0/0xb0
[  221.602265]  [<ffffffff8024843d>] worker_thread+0x6d/0xb0
[  221.604431]  [<ffffffff8024bfc0>] autoremove_wake_function+0x0/0x30
[  221.606939]  [<ffffffff802483d0>] worker_thread+0x0/0xb0
[  221.609067]  [<ffffffff802483d0>] worker_thread+0x0/0xb0
[  221.611199]  [<ffffffff8024bbeb>] kthread+0x4b/0x80
[  221.613156]  [<ffffffff8020cb98>] child_rip+0xa/0x12
[  221.615151]  [<ffffffff8020c2af>] restore_args+0x0/0x30
[  221.617247]  [<ffffffff8024bba0>] kthread+0x0/0x80
[  221.619162]  [<ffffffff8020cb8e>] child_rip+0x0/0x12
[  221.621147]
[  221.621749] INFO: lockdep is turned off.
[  226.369259] SysRq : Emergency Sync
[  226.331342] Emergency Sync complete
[  227.064545] SysRq : Emergency Remount R/O
[  228.193491] SysRq : Emergency Sync
[  228.155593] Emergency Sync complete
[  228.767931] SysRq : Resetting

I also had another BUG output during system startup, but that should
be unrelated:
[  103.254681] BUG: sleeping function called from invalid context at
kernel/rwsem.c:20
[  103.257757] in_atomic():0, irqs_disabled():1
[  103.259469] 1 lock held by artsd/5883:
[  103.259470]  #0:  (pm_qos_lock){....}, at: [<ffffffff80250efb>]
pm_qos_add_requirement+0x6b/0xf0
[  103.263316] irq event stamp: 49712
[  103.263318] hardirqs last  enabled at (49711): [<ffffffff802941ed>]
__kmalloc+0x10d/0x180
[  103.263321] hardirqs last disabled at (49712): [<ffffffff805c5eea>]
_spin_lock_irqsave+0x1a/0x50
[  103.263326] softirqs last  enabled at (48820): [<ffffffff805954d9>]
unix_release_sock+0x79/0x240
[  103.263330] softirqs last disabled at (48818): [<ffffffff805c5b89>]
_write_lock_bh+0x9/0x30
[  103.263333]
[  103.263333] Call Trace:
[  103.263335]  [<ffffffff8024fc25>] down_read+0x15/0x40
[  103.263338]  [<ffffffff802507e6>] __blocking_notifier_call_chain+0x46/0x90
[  103.263341]  [<ffffffff80250f23>] pm_qos_add_requirement+0x93/0xf0
[  103.263344]  [<ffffffff804fdc4a>] snd_pcm_hw_params+0x2fa/0x380
[  103.263347]  [<ffffffff804fe93c>] snd_pcm_common_ioctl1+0xb4c/0xdc0
[  103.263350]  [<ffffffff8027b167>] __do_fault+0x227/0x470
[  103.263353]  [<ffffffff8025a435>] __lock_acquire+0x745/0x1140
[  103.263357]  [<ffffffff805c5e45>] _spin_unlock_irqrestore+0x55/0x70
[  103.263359]  [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
[  103.263362]  [<ffffffff804fee88>] snd_pcm_playback_ioctl1+0x48/0x240
[  103.263365]  [<ffffffff804ffa36>] snd_pcm_playback_ioctl+0x36/0x50
[  103.263367]  [<ffffffff802a80bf>] vfs_ioctl+0x2f/0xa0
[  103.263369]  [<ffffffff802a8390>] do_vfs_ioctl+0x260/0x2e0
[  103.263371]  [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
[  103.263373]  [<ffffffff802a84a1>] sys_ioctl+0x91/0xb0
[  103.263376]  [<ffffffff8020bc5e>] system_call+0x7e/0x83
[  103.263379]

Torsten

^ permalink raw reply

* Re: 85xx software reset problems from paulus.git
From: Kumar Gala @ 2007-11-17 19:52 UTC (permalink / raw)
  To: robert lazarski; +Cc: linuxppc-embedded
In-Reply-To: <f87675ee0711161401k600b658ao3d6ae572fb367c47@mail.gmail.com>


On Nov 16, 2007, at 4:01 PM, robert lazarski wrote:

> On Nov 16, 2007 4:46 PM, Kumar Gala <galak@kernel.crashing.org> wrote:
>>
>>
>> On Nov 16, 2007, at 3:28 PM, robert lazarski wrote:
>>
>>> On Nov 16, 2007 3:44 PM, robert lazarski <robertlazarski@gmail.com>
>>> wrote:
>>>> On Nov 16, 2007 10:27 AM, Clemens Koller
>>>> <clemens.koller@anagramm.de> wrote:
>>>>> The SRESET# (pin AF20) is the soft reset input, causes
>>>>> an mcp assertion to the core.... (RTFM)
>>>>>
>>>>
>>>> That's what we are doing. The 85xx docs say "Soft reset. Causes a
>>>> machine check interrupt to the e500 core. Note that if the e500 core
>>>> is not configured to process machine check interrupts, the assertion
>>>> of SRESET causes a core checkstop. SRESET need not be asserted during
>>>> a hard reset."
>>>>
>>>
>>> Sorry for replying to myself, but thought I'd mention SRESET works
>>> fine on 85xx 2.6.23 , ie, the board resets after kernel panic. It
>>> doesn't work for me on 2.6.24rc2 .
>>
>> What actual 85xx are you using?
>>
>> - k
>>
>
> Custom 8548 board. I'm using the cds 85xx code for a reference and I'm
> calling the same reset functions.
>
1. do you have the following in your dts:

                 global-utilities@e0000 {        //global utilities reg
                         compatible = "fsl,mpc8548-guts";
                         reg = <e0000 1000>;
                         fsl,has-rstcr;
                 };


2. in your platform code are you using fsl_rstcr_restart in
define_machine()?

- k

^ permalink raw reply

* Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
From: Torsten Kaiser @ 2007-11-17 20:10 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Trond Myklebust, LKML, Kamalesh Babulal, linuxppc-dev, nfs,
	Andrew Morton, Jan Blunck, Balbir Singh
In-Reply-To: <Pine.LNX.4.64.0711171128530.7986@schroedinger.engr.sgi.com>

On Nov 17, 2007 8:33 PM, Christoph Lameter <clameter@sgi.com> wrote:
> On Sat, 17 Nov 2007, Andrew Morton wrote:
>
> > That's slub.  It appears that list_lock is being taken from process context
> > in one place and from softirq in another.
>
> I kicked out some weird interrupt disable code in mm that was only run during
> NUMA bootstrap.

I'm using NUMA (Opteron), so this indeed fixes it.

A kernel compiled with SLUB now outputs the same message as the SLAB
one, that lockdep annotations are needed at the place where nfs hangs.

> This should fix it but isnt there some mechanism to convince lockdep that
> it is okay to do these things during bootstrap?
>
> ---
>  mm/slub.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c    2007-11-17 11:31:21.044136631 -0800
> +++ linux-2.6/mm/slub.c 2007-11-17 11:32:17.364386560 -0800
> @@ -2044,7 +2044,9 @@ static struct kmem_cache_node *early_kme
>  #endif
>         init_kmem_cache_node(n);
>         atomic_long_inc(&n->nr_slabs);
> +       local_irq_disable();
>         add_partial(kmalloc_caches, page, 0);
> +       local_irq_enable();
>         return n;
>  }
>
>
>

^ permalink raw reply

* Makefile FLAGS typoes??
From: Robert P. J. Day @ 2007-11-17 20:32 UTC (permalink / raw)
  To: Linux PPC Mailing List

  from arch/powerpc/Makefile:
...
KBUILD_CPPFLAGS += $(CPPFLAGS-y)
KBUILD_AFLAGS   += $(AFLAGS-y)
KBUILD_CFLAGS   += -msoft-float -pipe $(CFLAGS-y)
...

  those right-hand side variables don't look right.  are you sure they
shouldn't be, say, CFLAGS, or ccflags-y?  etc, etc.

rday

========================================================================
Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://crashcourse.ca
========================================================================

^ permalink raw reply

* Re: multiprocessor
From: Grant Likely @ 2007-11-17 21:53 UTC (permalink / raw)
  To: keng_629; +Cc: linuxppc-embedded
In-Reply-To: <200711171445384213369@126.com>

On 11/16/07, keng_629 <keng_629@126.com> wrote:
>
>
> i am trying to make a exam about Multiprocessor on the Xilinx Virtex-4 with
> two PowerPc hard core.
> please give me some advices about  bootloader and os.
> how can i start my work.

I'm sorry, but I really don't understand what you're asking about.
Both u-boot and Linux run on the ppc405 cores that the Virtex-4 uses.
However, there is no cache coherency between the cores so you would
need to run a separate Linux image on each one.  (SMP doesn't work)

Cheers,
g.

-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.
grant.likely@secretlab.ca
(403) 399-0195

^ permalink raw reply

* [patch] PS3: Fix printing of os-area magic numbers
From: Geoff Levand @ 2007-11-17 22:24 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Geert Uytterhoeven, linuxppc-dev@ozlabs.org

Fix a bug in the printing of the PS3 os-area magic numbers which assumed that
magic numbers were zero terminated strings.  The magic numbers are represented
in memory as integers.  If the os-area sections are not initialized correctly
they could contain random data that would be printed to the display.

CC: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
---

Paul,

This fixes a very minor bug in linus' current tree.  Please consider
for 2.6.24.

-Geoff

 arch/powerpc/platforms/ps3/os-area.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

--- a/arch/powerpc/platforms/ps3/os-area.c
+++ b/arch/powerpc/platforms/ps3/os-area.c
@@ -269,8 +269,13 @@ static void __init os_area_get_property(
 static void _dump_header(const struct os_area_header *h, const char *func,
 	int line)
 {
+	u8 str[sizeof(h->magic_num) + 1];
+
+	memcpy(str, h->magic_num, sizeof(h->magic_num));
+	str[sizeof(h->magic_num)] = 0;
+
 	pr_debug("%s:%d: h.magic_num:       '%s'\n", func, line,
-		h->magic_num);
+		str);
 	pr_debug("%s:%d: h.hdr_version:     %u\n", func, line,
 		h->hdr_version);
 	pr_debug("%s:%d: h.db_area_offset:  %u\n", func, line,
@@ -484,8 +489,13 @@ static int db_get_rtc_diff(const struct 
 static void _dump_db(const struct os_area_db *db, const char *func,
 	int line)
 {
+	u8 str[sizeof(db->magic_num) + 1];
+
+	memcpy(str, &db->magic_num, sizeof(db->magic_num));
+	str[sizeof(db->magic_num)] = 0;
+
 	pr_debug("%s:%d: db.magic_num:      '%s'\n", func, line,
-		(const char*)&db->magic_num);
+		str);
 	pr_debug("%s:%d: db.version:         %u\n", func, line,
 		db->version);
 	pr_debug("%s:%d: db.index_64:        %u\n", func, line,

^ permalink raw reply

* Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
From: Peter Zijlstra @ 2007-11-17 23:05 UTC (permalink / raw)
  To: Torsten Kaiser
  Cc: Trond Myklebust, Peter Zijlstra, steved, LKML, Kamalesh Babulal,
	linuxppc-dev, nfs, Andrew Morton, Jan Blunck, Ingo Molnar,
	Balbir Singh
In-Reply-To: <64bb37e0711171140w5f1451e0qea081a4fbc7a45f7@mail.gmail.com>

On Sat, Nov 17, 2007 at 08:40:22PM +0100, Torsten Kaiser wrote:

> Lockdep triggers immediately before the freeze, but the result is still
> not helpful:
> 
> [  221.565011] INFO: trying to register non-static key.
> [  221.566999] the code is fine but needs lockdep annotation.
> [  221.569206] turning off the locking correctness validator.
> [  221.571404]
> [  221.571405] Call Trace:
> [  221.572996]  [<ffffffff8025a1b4>] __lock_acquire+0x4c4/0x1140
> [  221.575298]  [<ffffffff8025ae85>] lock_acquire+0x55/0x70
> [  221.577429]  [<ffffffff8022d6fd>] __wake_up+0x2d/0x70
> [  221.579457]  [<ffffffff805c5f04>] _spin_lock_irqsave+0x34/0x50
> [  221.581800]  [<ffffffff805c5e45>] _spin_unlock_irqrestore+0x55/0x70
> [  221.584317]  [<ffffffff8022d6fd>] __wake_up+0x2d/0x70
> [  221.586344]  [<ffffffff805a88b0>] rpc_async_schedule+0x0/0x10
> [  221.588648]  [<ffffffff802fface>] nfs_free_unlinkdata+0x1e/0x50
> [  221.591023]  [<ffffffff805a7e96>] rpc_release_calldata+0x26/0x50
> [  221.593428]  [<ffffffff8024778f>] run_workqueue+0x16f/0x210
> [  221.595662]  [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
> [  221.598004]  [<ffffffff802483d0>] worker_thread+0x0/0xb0
> [  221.600130]  [<ffffffff802483d0>] worker_thread+0x0/0xb0
> [  221.602265]  [<ffffffff8024843d>] worker_thread+0x6d/0xb0
> [  221.604431]  [<ffffffff8024bfc0>] autoremove_wake_function+0x0/0x30
> [  221.606939]  [<ffffffff802483d0>] worker_thread+0x0/0xb0
> [  221.609067]  [<ffffffff802483d0>] worker_thread+0x0/0xb0
> [  221.611199]  [<ffffffff8024bbeb>] kthread+0x4b/0x80
> [  221.613156]  [<ffffffff8020cb98>] child_rip+0xa/0x12
> [  221.615151]  [<ffffffff8020c2af>] restore_args+0x0/0x30
> [  221.617247]  [<ffffffff8024bba0>] kthread+0x0/0x80
> [  221.619162]  [<ffffffff8020cb8e>] child_rip+0x0/0x12
> [  221.621147]
> [  221.621749] INFO: lockdep is turned off.

I've been staring at this NFS code for a while and can't make any sense
out of it. It seems to correctly initialize the waitqueue. So this would
indicate corruption of some sort.



> I also had another BUG output during system startup, but that should
> be unrelated:
> [  103.254681] BUG: sleeping function called from invalid context at
> kernel/rwsem.c:20
> [  103.257757] in_atomic():0, irqs_disabled():1
> [  103.259469] 1 lock held by artsd/5883:
> [  103.259470]  #0:  (pm_qos_lock){....}, at: [<ffffffff80250efb>]
> pm_qos_add_requirement+0x6b/0xf0
> [  103.263316] irq event stamp: 49712
> [  103.263318] hardirqs last  enabled at (49711): [<ffffffff802941ed>]
> __kmalloc+0x10d/0x180
> [  103.263321] hardirqs last disabled at (49712): [<ffffffff805c5eea>]
> _spin_lock_irqsave+0x1a/0x50
> [  103.263326] softirqs last  enabled at (48820): [<ffffffff805954d9>]
> unix_release_sock+0x79/0x240
> [  103.263330] softirqs last disabled at (48818): [<ffffffff805c5b89>]
> _write_lock_bh+0x9/0x30
> [  103.263333]
> [  103.263333] Call Trace:
> [  103.263335]  [<ffffffff8024fc25>] down_read+0x15/0x40
> [  103.263338]  [<ffffffff802507e6>] __blocking_notifier_call_chain+0x46/0x90
> [  103.263341]  [<ffffffff80250f23>] pm_qos_add_requirement+0x93/0xf0
> [  103.263344]  [<ffffffff804fdc4a>] snd_pcm_hw_params+0x2fa/0x380
> [  103.263347]  [<ffffffff804fe93c>] snd_pcm_common_ioctl1+0xb4c/0xdc0
> [  103.263350]  [<ffffffff8027b167>] __do_fault+0x227/0x470
> [  103.263353]  [<ffffffff8025a435>] __lock_acquire+0x745/0x1140
> [  103.263357]  [<ffffffff805c5e45>] _spin_unlock_irqrestore+0x55/0x70
> [  103.263359]  [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
> [  103.263362]  [<ffffffff804fee88>] snd_pcm_playback_ioctl1+0x48/0x240
> [  103.263365]  [<ffffffff804ffa36>] snd_pcm_playback_ioctl+0x36/0x50
> [  103.263367]  [<ffffffff802a80bf>] vfs_ioctl+0x2f/0xa0
> [  103.263369]  [<ffffffff802a8390>] do_vfs_ioctl+0x260/0x2e0
> [  103.263371]  [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
> [  103.263373]  [<ffffffff802a84a1>] sys_ioctl+0x91/0xb0
> [  103.263376]  [<ffffffff8020bc5e>] system_call+0x7e/0x83
> [  103.263379]

This pm-qos code is fubar, it calls blocking_notifier_call_chain while
holding a spinlock (and that is after 'fixing' it from a
srcu_notifier_call_chain - which is equally wrong).

^ permalink raw reply
