From mboxrd@z Thu Jan 1 00:00:00 1970
From: arnd@arndb.de (Arnd Bergmann)
Date: Tue, 31 May 2016 15:12:30 +0200
Subject: [BUG] CONFIG_UNINLINE_SPIN_UNLOCK important for Cortex-A9
In-Reply-To: <20160531121640.GU19428@n2100.arm.linux.org.uk>
References: <874m9eoetu.fsf@gmail.com> <6689734.3ffZe38SoY@wuerfel> <20160531121640.GU19428@n2100.arm.linux.org.uk>
Message-ID: <11929103.x2FTA7ihIh@wuerfel>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Tuesday, May 31, 2016 1:16:40 PM CEST Russell King - ARM Linux wrote:
> > > > [17827.766279] pgd = ee09c000
> > > [17827.769003] [00001014] *pgd=3eba3831, *pte=00000000, *ppte=00000000
> > > [17827.775383] Internal error: Oops: 17 [#1] SMP ARM
> > > [17827.780108] Modules linked in: usbhid btusb btrtl btbcm btintel bluetooth flexcan smsc95xx usbnet mii ptxc(O)
> > > [17827.790242] CPU: 1 PID: 372 Comm: stress-ng-socke Tainted: G O 4.5.4 #1
> > > [17827.797995] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > > [17827.804536] task: ed614780 ti: eebba000 task.ti: eebba000
> > > [17827.809977] PC is at __netif_receive_skb_core+0x328/0xa9c
> >
> > Unfortunately in the middle of a rather long function, and I don't
> > see a spin_unlock in this function, in fact it's not even called
> > with a spinlock held, so it must be something more indirect.
>
> On a kernel here, I have:
>
>     1290:	e51b4058 	ldr	r4, [fp, #-88]	; 0xffffffa8
> ...
>     12b0:	e5943014 	ldr	r3, [r4, #20]
>     12b4:	e5b37054 	ldr	r7, [r3, #84]!	; 0x54
>     12b8:	e1570003 	cmp	r7, r3
>     12bc:	e2477014 	sub	r7, r7, #20
>     12c0:	0a00001f 	beq	1344 <__netif_receive_skb_core+0x3c0>
> ...
>     1314:	e1a0300a 	mov	r3, sl
>     1318:	e12fff3c 	blx	ip
>     131c:	e51b4058 	ldr	r4, [fp, #-88]	; 0xffffffa8
>     1320:	e1a02007 	mov	r2, r7
>     1324:	e5971014 	ldr	r1, [r7, #20]
>
> So it's a list of some sort. fp, #-88 is the first arg, so that's
> the struct sk_buff pointer.
>
> Adding debug info to the build, reveals that it's this:
>
> 	list_for_each_entry_rcu(ptype, &skb->dev->ptype_all, list) {
> 		if (pt_prev)
> 			ret = deliver_skb(skb, pt_prev, orig_dev);
> 		pt_prev = ptype;
> 	}
>
> specifically, the load is for __read_once_size() inside
> list_for_each_entry_rcu().

Ok, so this is an rcu protected list that gets written to using
the function

void dev_add_pack(struct packet_type *pt)
{
	struct list_head *head = ptype_head(pt);

	spin_lock(&ptype_lock);
	list_add_rcu(&pt->list, head);
	spin_unlock(&ptype_lock);
}
EXPORT_SYMBOL(dev_add_pack);

and the respective __dev_remove_pack taking the same lock. These get
called once for each network protocol (which basically should never
change) and also for af_packet.c when registering a new listener.

Somehow we managed to get an invalid entry in the list, which could
be related to lots of af_packet registering/unregistering. Does
the stress-ng test case do that?

Do the other oops output logs have any relation to the above?

	Arnd
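
For anyone following the disassembly: the "sub r7, r7, #20" followed by
"ldr r1, [r7, #20]" in Russell's listing is the shape an intrusive list
walk usually compiles to, i.e. step from the embedded list node back to
the containing structure, then load ->list.next from it. Below is a
minimal userspace sketch of that arithmetic. The demo_* names and the
struct layout are made up for illustration and are not the kernel
definitions; the offset is taken from offsetof() rather than assumed to
be 20, and there is no RCU or locking here at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical entry type; only the embedded 'list' member matters. */
struct demo_ptype {
	int type;
	struct list_head list;
};

/*
 * Step back from an embedded list node to its containing entry.
 * This is the subtraction that shows up as "sub rX, rX, #offset".
 */
#define demo_entry(ptr) \
	((struct demo_ptype *)((char *)(ptr) - offsetof(struct demo_ptype, list)))

/* Make 'head' an empty circular list. */
static void demo_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'node' right after 'head' (plain version, no RCU barriers). */
static void demo_list_add(struct list_head *node, struct list_head *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/*
 * Walk the list the way list_for_each_entry() would: the loop ends when
 * the node pointer is the head again (the "cmp r7, r3" in the listing),
 * and every step dereferences a ->next pointer stored in the list.
 */
static int demo_sum_types(struct list_head *head)
{
	int sum = 0;
	struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		sum += demo_entry(pos)->type;
	return sum;
}
```

The point of the sketch is only that the walk trusts whatever ->next
value it finds, so one bad pointer stored in the list is enough to make
a later load fault.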