* [PATCH] powerpc/mobility: Fix node detach/rename problem @ 2018-07-29 13:11 Michael Bringmann 2018-07-29 16:31 ` kbuild test robot ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Michael Bringmann @ 2018-07-29 13:11 UTC (permalink / raw) To: linuxppc-dev Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler, Thomas Falcon During LPAR migration, the content of the device tree/sysfs may be updated, including deletion and replacement of nodes in the tree. When nodes are added to the internal node structures, they are appended in FIFO order to a list of nodes maintained by the OF code APIs. When nodes are removed from the device tree, they are marked OF_DETACHED, but not actually deleted from the system, to allow for pointers cached elsewhere in the kernel. The order and content of the entries in the list of nodes is not altered, though. During LPAR migration some common nodes are deleted and re-added, e.g. "ibm,platform-facilities". If a node is re-added to the OF node lists, the of_attach_node function checks to make sure that the name + ibm,phandle of the to-be-added data is unique. As the previous copy of a re-added node is not modified beyond the addition of a bit flag, the code (1) finds the old copy, (2) prints a WARNING notice to the console, (3) renames the to-be-added node to avoid filename collisions within a directory, and (4) adds entries to the sysfs/kernfs. This patch fixes the 'migration add' problem by changing the stored 'phandle' of the OF_DETACHED node to 0 (the reserved value for of_find_node_by_phandle), so that subsequent re-add operations, such as those during migration, do not find the detached node, do not observe duplicate names, do not rename the new nodes, and do not emit the extra WARNING notices to the console. In addition, it erases the 'name' field of the OF_DETACHED node, to prevent any future calls to of_find_node_by_name() or of_find_node_by_path() from matching this node. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> --- arch/powerpc/platforms/pseries/dlpar.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index 2de0f0d..9d82c28 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -274,6 +274,9 @@ int dlpar_detach_node(struct device_node *dn) if (rc) return rc; + dn->phandle = 0; + memset(dn->name, 0, strlen(dn->name)); + return 0; } ^ permalink raw reply related [flat|nested] 17+ messages in thread
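For reference, the reason a zero phandle works as a sentinel here is that the lookup path rejects a zero handle before consulting the cache or walking the tree, so a detached node whose phandle has been cleared can never match a caller-supplied value again. A paraphrased sketch of that guard (simplified, not the exact upstream code):

	struct device_node *of_find_node_by_phandle(phandle handle)
	{
		struct device_node *np = NULL;

		if (!handle)	/* 0 is the reserved "no phandle" value */
			return NULL;

		/* ... phandle_cache lookup, then a full tree walk under devtree_lock ... */

		return np;
	}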
* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem 2018-07-29 13:11 [PATCH] powerpc/mobility: Fix node detach/rename problem Michael Bringmann @ 2018-07-29 16:31 ` kbuild test robot 2018-07-30 6:31 ` Michael Ellerman 2018-07-31 0:59 ` [PATCH] powerpc/mobility: Fix node detach/rename problem Tyrel Datwyler 2 siblings, 0 replies; 17+ messages in thread From: kbuild test robot @ 2018-07-29 16:31 UTC (permalink / raw) To: Michael Bringmann Cc: kbuild-all, linuxppc-dev, Nathan Fontenot, Michael Bringmann, Thomas Falcon, Tyrel Datwyler, John Allen [-- Attachment #1: Type: text/plain, Size: 2459 bytes --] Hi Michael, Thank you for the patch! Yet something to improve: [auto build test ERROR on powerpc/next] [also build test ERROR on v4.18-rc6 next-20180727] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Michael-Bringmann/powerpc-mobility-Fix-node-detach-rename-problem/20180729-213517 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc-defconfig (attached as .config) compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=powerpc All errors (new ones prefixed by >>): arch/powerpc/platforms/pseries/dlpar.c: In function 'dlpar_detach_node': >> arch/powerpc/platforms/pseries/dlpar.c:276:9: error: passing argument 1 of 'memset' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] memset(dn->name, 0, strlen(dn->name)); ^~ In file included from include/linux/string.h:20:0, from arch/powerpc/include/asm/paca.h:19, from arch/powerpc/include/asm/current.h:16, from include/linux/mutex.h:14, from include/linux/notifier.h:14, from arch/powerpc/platforms/pseries/dlpar.c:16: arch/powerpc/include/asm/string.h:23:15: note: expected 'void *' but argument is of type 'const char *' extern void * memset(void *,int,__kernel_size_t); ^~~~~~ cc1: all warnings being treated as errors vim +276 arch/powerpc/platforms/pseries/dlpar.c 259 260 int dlpar_detach_node(struct device_node *dn) 261 { 262 struct device_node *child; 263 int rc; 264 265 child = of_get_next_child(dn, NULL); 266 while (child) { 267 dlpar_detach_node(child); 268 child = of_get_next_child(dn, child); 269 } 270 271 rc = of_detach_node(dn); 272 if (rc) 273 return rc; 274 275 dn->phandle = 0; > 276 memset(dn->name, 0, strlen(dn->name)); 277 278 return 0; 279 } 280 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 23683 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
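The error the robot reports comes from struct device_node declaring name as a pointer to const data, so memset() cannot take it directly under -Werror. A reduced, self-contained illustration of the diagnostic and of what a cast-based workaround would look like (the struct is simplified, not the kernel's full definition, and the cast is only defensible if the storage behind the string is actually writable):

	#include <string.h>

	struct device_node {
		const char *name;	/* typically points into property data */
		/* ... */
	};

	static void clear_name(struct device_node *dn)
	{
		/* memset(dn->name, 0, strlen(dn->name));   <-- -Werror=discarded-qualifiers */
		memset((void *)dn->name, 0, strlen(dn->name));
	}

Whether clearing the name in place is safe at all is a separate question from the compile error, since the string may live in read-only or shared property storage.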
* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem 2018-07-29 13:11 [PATCH] powerpc/mobility: Fix node detach/rename problem Michael Bringmann 2018-07-29 16:31 ` kbuild test robot @ 2018-07-30 6:31 ` Michael Ellerman 2018-07-30 21:02 ` Michael Bringmann 2018-07-31 0:59 ` [PATCH] powerpc/mobility: Fix node detach/rename problem Tyrel Datwyler 2 siblings, 1 reply; 17+ messages in thread From: Michael Ellerman @ 2018-07-30 6:31 UTC (permalink / raw) To: Michael Bringmann, linuxppc-dev Cc: Nathan Fontenot, Michael Bringmann, Thomas Falcon, Tyrel Datwyler, John Allen Michael Bringmann <mwb@linux.vnet.ibm.com> writes: > During LPAR migration, the content of the device tree/sysfs may > be updated including deletion and replacement of nodes in the > tree. When nodes are added to the internal node structures, they > are appended in FIFO order to a list of nodes maintained by the > OF code APIs. That hasn't been true for several years. The data structure is an n-ary tree. What kernel version are you working on? > When nodes are removed from the device tree, they > are marked OF_DETACHED, but not actually deleted from the system > to allow for pointers cached elsewhere in the kernel. The order > and content of the entries in the list of nodes is not altered, > though. Something is going wrong if this is actually happening. When the node is detached it should be *detached* from the tree of all nodes, so it should not be discoverable other than by having an existing pointer to it. That's what __of_detach_node() does: parent = np->parent; if (WARN_ON(!parent)) return; if (parent->child == np) parent->child = np->sibling; else { struct device_node *prevsib; for (prevsib = np->parent->child; prevsib->sibling != np; prevsib = prevsib->sibling) ; prevsib->sibling = np->sibling; } ie. the node must already have a NULL parent, and then it is spliced out of its parent's child list. Please give us more info so we can work out what's actually happening. cheers ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem 2018-07-30 6:31 ` Michael Ellerman @ 2018-07-30 21:02 ` Michael Bringmann 2018-07-31 6:34 ` phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) Michael Ellerman 0 siblings, 1 reply; 17+ messages in thread From: Michael Bringmann @ 2018-07-30 21:02 UTC (permalink / raw) To: linuxppc-dev See below. On 07/30/2018 01:31 AM, Michael Ellerman wrote: > Michael Bringmann <mwb@linux.vnet.ibm.com> writes: > >> During LPAR migration, the content of the device tree/sysfs may >> be updated including deletion and replacement of nodes in the >> tree. When nodes are added to the internal node structures, they >> are appended in FIFO order to a list of nodes maintained by the >> OF code APIs. > > That hasn't been true for several years. The data structure is an n-ary > tree. What kernel version are you working on? Sorry for an error in my description. I oversimplified based on the name of a search iterator. Let me try to provide a better explanation of the problem, here. This is the problem. The PPC mobility code receives RTAS requests to delete nodes with platform-/hardware-specific attributes when restarting the kernel after a migration. My example is for migration between a P8 Alpine and a P8 Brazos. Nodes to be deleted may include 'ibm,random-v1', 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', or others. The mobility.c code calls 'of_detach_node' for the nodes and their children. This makes calls to detach the properties and to try to remove the associated sysfs/kernfs files. Then new copies of the same nodes are next provided by the PHYP, local copies are built, and a pointer to the 'struct device_node' is passed to of_attach_node. Before the call to of_attach_node, the phandle is initialized to 0 when the data structure is alloced. During the call to of_attach_node, it calls __of_attach_node which pulls the actual name and phandle from just created sub-properties named something like 'name' and 'ibm,phandle'. This is all fine for the first migration. The problem occurs with the second and subsequent migrations when the PHYP on the new system wants to replace the same set of nodes again, referenced with the same names and phandle values. > >> When nodes are removed from the device tree, they >> are marked OF_DETACHED, but not actually deleted from the system >> to allow for pointers cached elsewhere in the kernel. The order >> and content of the entries in the list of nodes is not altered, >> though. > > Something is going wrong if this is actually happening. > > When the node is detached it should be *detached* from the tree of all > nodes, so it should not be discoverable other than by having an existing > pointer to it. On the second and subsequent migrations, the PHYP tells the system to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1', 'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these nodes by its known set of phandle values -- the same handles used by the PHYP on the source system are known on the target system. The mobility.c code calls of_find_node_by_phandle() with these values and ends up locating the first instance of each node that was added during the original boot, instead of the second instance of each node created after the first migration. 
The detach during the second migration fails with errors like, [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0 [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G W 4.18.0-rc1-wi107836-v05-120+ #201 [ 4565.030737] NIP: c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170 [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700 Tainted: G W (4.18.0-rc1-wi107836-v05-120+) [ 4565.030745] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22288822 XER: 0000000a [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1 [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68 [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843 [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000 [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001 [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930 [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0 [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0 [ 4565.030811] Call Trace: [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable) [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150 [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80 [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0 [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0 [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130 [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60 [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0 [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240 [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220 [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240 [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100 [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70 [ 4565.030884] Instruction dump: [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040 [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]--- The mobility.c code continues on during the second migration, accepts the definitions of the new nodes from the PHYP and ends up renaming the new properties e.g. [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1" I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within of_find_node_by_phandle to skip nodes that are detached, but still present due to caching or use count considerations. 
Another possibility to consider is that of_find_node_by_phandle also uses something called 'phandle_cache' which may have outdated data as of_detach_node() does not have access to that cache for the 'OF_DETACHED' nodes. > > That's what __of_detach_node() does: > > parent = np->parent; > if (WARN_ON(!parent)) > return; > > if (parent->child == np) > parent->child = np->sibling; > else { > struct device_node *prevsib; > for (prevsib = np->parent->child; > prevsib->sibling != np; > prevsib = prevsib->sibling) > ; > prevsib->sibling = np->sibling; > } > > ie. the node must already have a NULL parent, and then it is spliced out > of its parent's child list. > > > Please give us more info so we can work out what's actually happening. I duplicated the problem in the 4.18 raw kernel on the second and subsequent legs of migration of an LPAR back and forth between an Alpine to a Brazos. Again, entries like 'ibm,random-v1', 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', yield errors as described above. My patch was attempting to resolve the issue wholly within the powerpc code, at this time. > > cheers > Michael -- Michael W. Bringmann Linux Technology Center IBM Corporation Tie-Line 363-5196 External: (512) 286-5196 Cell: (512) 466-0650 mwb@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) 2018-07-30 21:02 ` Michael Bringmann @ 2018-07-31 6:34 ` Michael Ellerman 2018-07-31 14:17 ` Rob Herring 2018-07-31 19:11 ` Michael Bringmann 0 siblings, 2 replies; 17+ messages in thread From: Michael Ellerman @ 2018-07-31 6:34 UTC (permalink / raw) To: Michael Bringmann, linuxppc-dev, Rob Herring, Frank Rowand, devicetree Hi Rob/Frank, I think we might have a problem with the phandle_cache not interacting well with of_detach_node(): Michael Bringmann <mwb@linux.vnet.ibm.com> writes: > See below. > > On 07/30/2018 01:31 AM, Michael Ellerman wrote: >> Michael Bringmann <mwb@linux.vnet.ibm.com> writes: >> >>> During LPAR migration, the content of the device tree/sysfs may >>> be updated including deletion and replacement of nodes in the >>> tree. When nodes are added to the internal node structures, they >>> are appended in FIFO order to a list of nodes maintained by the >>> OF code APIs. >> >> That hasn't been true for several years. The data structure is an n-ary >> tree. What kernel version are you working on? > > Sorry for an error in my description. I oversimplified based on the > name of a search iterator. Let me try to provide a better explanation > of the problem, here. > > This is the problem. The PPC mobility code receives RTAS requests to > delete nodes with platform-/hardware-specific attributes when restarting > the kernel after a migration. My example is for migration between a > P8 Alpine and a P8 Brazos. Nodes to be deleted may include 'ibm,random-v1', > 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', > or others. > > The mobility.c code calls 'of_detach_node' for the nodes and their children. > This makes calls to detach the properties and to try to remove the associated > sysfs/kernfs files. > > Then new copies of the same nodes are next provided by the PHYP, local > copies are built, and a pointer to the 'struct device_node' is passed to > of_attach_node. Before the call to of_attach_node, the phandle is initialized > to 0 when the data structure is alloced. During the call to of_attach_node, > it calls __of_attach_node which pulls the actual name and phandle from just > created sub-properties named something like 'name' and 'ibm,phandle'. > > This is all fine for the first migration. The problem occurs with the > second and subsequent migrations when the PHYP on the new system wants to > replace the same set of nodes again, referenced with the same names and > phandle values. > >> >>> When nodes are removed from the device tree, they >>> are marked OF_DETACHED, but not actually deleted from the system >>> to allow for pointers cached elsewhere in the kernel. The order >>> and content of the entries in the list of nodes is not altered, >>> though. >> >> Something is going wrong if this is actually happening. >> >> When the node is detached it should be *detached* from the tree of all >> nodes, so it should not be discoverable other than by having an existing >> pointer to it. > On the second and subsequent migrations, the PHYP tells the system > to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1', > 'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these > nodes by its known set of phandle values -- the same handles used > by the PHYP on the source system are known on the target system. 
> The mobility.c code calls of_find_node_by_phandle() with these values > and ends up locating the first instance of each node that was added > during the original boot, instead of the second instance of each node > created after the first migration. The detach during the second > migration fails with errors like, > > [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0 > [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod > [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G W 4.18.0-rc1-wi107836-v05-120+ #201 > [ 4565.030737] NIP: c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170 > [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700 Tainted: G W (4.18.0-rc1-wi107836-v05-120+) > [ 4565.030745] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22288822 XER: 0000000a > [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1 > [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68 > [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff > [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843 > [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000 > [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff > [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001 > [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930 > [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0 > [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0 > [ 4565.030811] Call Trace: > [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable) > [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150 > [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80 > [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0 > [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0 > [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130 > [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60 > [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0 > [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240 > [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220 > [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240 > [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100 > [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70 > [ 4565.030884] Instruction dump: > [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 > [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040 > [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]--- > > The mobility.c code continues on during the second migration, accepts the > definitions of the new nodes from the PHYP and ends up renaming the new > properties e.g. 
> > [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1" > > I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within > of_find_node_by_phandle to skip nodes that are detached, but still present > due to caching or use count considerations. Another possibility to consider > is that of_find_node_by_phandle also uses something called 'phandle_cache' > which may have outdated data as of_detach_node() does not have access to > that cache for the 'OF_DETACHED' nodes. Yes the phandle_cache looks like it might be the problem. I saw of_free_phandle_cache() being called as late_initcall, but didn't realise that's only if MODULES is disabled. So I don't see anything that invalidates the phandle_cache when a node is removed. The right solution would be for __of_detach_node() to invalidate the phandle_cache for the node being detached. That's slightly complicated by the phandle_cache being static inside base.c To test the theory that it's the phandle_cache causing the problems can you try this patch: diff --git a/drivers/of/base.c b/drivers/of/base.c index 848f549164cd..60e219132e24 100644 --- a/drivers/of/base.c +++ b/drivers/of/base.c @@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle handle) if (phandle_cache[masked_handle] && handle == phandle_cache[masked_handle]->phandle) np = phandle_cache[masked_handle]; + + if (of_node_check_flag(np, OF_DETACHED)) + np = NULL; } if (!np) { cheers ^ permalink raw reply related [flat|nested] 17+ messages in thread
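One caveat with the test patch as written: np is initialised to NULL and only assigned on a cache hit (the existing "if (!np)" fallback just below the hunk suggests as much), so the new of_node_check_flag() call can be reached with a NULL pointer when the cache slot does not match. A slightly more defensive form of the same check, untested and only for illustration:

	if (phandle_cache[masked_handle] &&
	    handle == phandle_cache[masked_handle]->phandle)
		np = phandle_cache[masked_handle];

	/* ignore a cached node that has since been detached */
	if (np && of_node_check_flag(np, OF_DETACHED))
		np = NULL;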
* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) 2018-07-31 6:34 ` phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) Michael Ellerman @ 2018-07-31 14:17 ` Rob Herring 2018-07-31 19:18 ` Frank Rowand 0 siblings, 1 reply; 17+ messages in thread From: Rob Herring @ 2018-07-31 14:17 UTC (permalink / raw) To: Michael Ellerman, Frank Rowand; +Cc: mwb, linuxppc-dev, devicetree On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe@ellerman.id.au> wrote: > > Hi Rob/Frank, > > I think we might have a problem with the phandle_cache not interacting > well with of_detach_node(): Probably needs a similar fix as this commit did for overlays: commit b9952b5218added5577e4a3443969bc20884cea9 Author: Frank Rowand <frank.rowand@sony.com> Date: Thu Jul 12 14:00:07 2018 -0700 of: overlay: update phandle cache on overlay apply and remove A comment in the review of the patch adding the phandle cache said that the cache would have to be updated when modules are applied and removed. This patch implements the cache updates. Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()") Reported-by: Alan Tull <atull@kernel.org> Suggested-by: Alan Tull <atull@kernel.org> Signed-off-by: Frank Rowand <frank.rowand@sony.com> Signed-off-by: Rob Herring <robh@kernel.org> Really what we need here is an "invalidate phandle" function rather than free and re-allocate the whole damn cache. Rob > > Michael Bringmann <mwb@linux.vnet.ibm.com> writes: > > See below. > > > > On 07/30/2018 01:31 AM, Michael Ellerman wrote: > >> Michael Bringmann <mwb@linux.vnet.ibm.com> writes: > >> > >>> During LPAR migration, the content of the device tree/sysfs may > >>> be updated including deletion and replacement of nodes in the > >>> tree. When nodes are added to the internal node structures, they > >>> are appended in FIFO order to a list of nodes maintained by the > >>> OF code APIs. > >> > >> That hasn't been true for several years. The data structure is an n-ary > >> tree. What kernel version are you working on? > > > > Sorry for an error in my description. I oversimplified based on the > > name of a search iterator. Let me try to provide a better explanation > > of the problem, here. > > > > This is the problem. The PPC mobility code receives RTAS requests to > > delete nodes with platform-/hardware-specific attributes when restarting > > the kernel after a migration. My example is for migration between a > > P8 Alpine and a P8 Brazos. Nodes to be deleted may include 'ibm,random-v1', > > 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', > > or others. > > > > The mobility.c code calls 'of_detach_node' for the nodes and their children. > > This makes calls to detach the properties and to try to remove the associated > > sysfs/kernfs files. > > > > Then new copies of the same nodes are next provided by the PHYP, local > > copies are built, and a pointer to the 'struct device_node' is passed to > > of_attach_node. Before the call to of_attach_node, the phandle is initialized > > to 0 when the data structure is alloced. During the call to of_attach_node, > > it calls __of_attach_node which pulls the actual name and phandle from just > > created sub-properties named something like 'name' and 'ibm,phandle'. > > > > This is all fine for the first migration.
The problem occurs with the > > second and subsequent migrations when the PHYP on the new system wants to > > replace the same set of nodes again, referenced with the same names and > > phandle values. > > > >> > >>> When nodes are removed from the device tree, they > >>> are marked OF_DETACHED, but not actually deleted from the system > >>> to allow for pointers cached elsewhere in the kernel. The order > >>> and content of the entries in the list of nodes is not altered, > >>> though. > >> > >> Something is going wrong if this is actually happening. > >> > >> When the node is detached it should be *detached* from the tree of all > >> nodes, so it should not be discoverable other than by having an existing > >> pointer to it. > > On the second and subsequent migrations, the PHYP tells the system > > to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1', > > 'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these > > nodes by its known set of phandle values -- the same handles used > > by the PHYP on the source system are known on the target system. > > The mobility.c code calls of_find_node_by_phandle() with these values > > and ends up locating the first instance of each node that was added > > during the original boot, instead of the second instance of each node > > created after the first migration. The detach during the second > > migration fails with errors like, > > > > [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0 > > [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod > > [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G W 4.18.0-rc1-wi107836-v05-120+ #201 > > [ 4565.030737] NIP: c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170 > > [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700 Tainted: G W (4.18.0-rc1-wi107836-v05-120+) > > [ 4565.030745] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22288822 XER: 0000000a > > [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1 > > [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68 > > [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff > > [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843 > > [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000 > > [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff > > [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001 > > [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930 > > [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0 > > [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0 > > [ 4565.030811] Call Trace: > > [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable) > > [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150 > > [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80 > > [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0
> > [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0 > > [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130 > > [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60 > > [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0 > > [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240 > > [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220 > > [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240 > > [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100 > > [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70 > > [ 4565.030884] Instruction dump: > > [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 > > [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040 > > [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]--- > > > > The mobility.c code continues on during the second migration, accepts the > > definitions of the new nodes from the PHYP and ends up renaming the new > > properties e.g. > > > > [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1" > > > > I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within > > of_find_node_by_phandle to skip nodes that are detached, but still present > > due to caching or use count considerations. Another possibility to consider > > is that of_find_node_by_phandle also uses something called 'phandle_cache' > > which may have outdated data as of_detach_node() does not have access to > > that cache for the 'OF_DETACHED' nodes. > > Yes the phandle_cache looks like it might be the problem. > > I saw of_free_phandle_cache() being called as late_initcall, but didn't > realise that's only if MODULES is disabled. > > So I don't see anything that invalidates the phandle_cache when a node > is removed. > > The right solution would be for __of_detach_node() to invalidate the > phandle_cache for the node being detached. That's slightly complicated > by the phandle_cache being static inside base.c > > To test the theory that it's the phandle_cache causing the problems can > you try this patch: > > diff --git a/drivers/of/base.c b/drivers/of/base.c > index 848f549164cd..60e219132e24 100644 > --- a/drivers/of/base.c > +++ b/drivers/of/base.c > @@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle handle) > if (phandle_cache[masked_handle] && > handle == phandle_cache[masked_handle]->phandle) > np = phandle_cache[masked_handle]; > + > + if (of_node_check_flag(np, OF_DETACHED)) > + np = NULL; > } > > if (!np) { > > cheers ^ permalink raw reply [flat|nested] 17+ messages in thread
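Rob's "invalidate phandle" idea could be sketched as a small helper that lives next to the static phandle_cache in drivers/of/base.c and is called from __of_detach_node() while devtree_lock is held. The function name and placement below are assumptions for illustration, not an existing API:

	/* drivers/of/base.c (sketch): drop one stale entry from the phandle cache.
	 * Assumes the caller already holds devtree_lock, as __of_detach_node() does. */
	void __of_phandle_cache_inv_entry(phandle handle)
	{
		phandle masked_handle;
		struct device_node *np;

		if (!handle)
			return;

		masked_handle = handle & phandle_cache_mask;

		if (phandle_cache) {
			np = phandle_cache[masked_handle];
			if (np && handle == np->phandle)
				phandle_cache[masked_handle] = NULL;
		}
	}

__of_detach_node() would then call __of_phandle_cache_inv_entry(np->phandle) as part of detaching, so a later of_find_node_by_phandle() falls back to the tree walk instead of handing back the stale pointer.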
* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) 2018-07-31 14:17 ` Rob Herring @ 2018-07-31 19:18 ` Frank Rowand 2018-07-31 19:22 ` Frank Rowand ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Frank Rowand @ 2018-07-31 19:18 UTC (permalink / raw) To: Rob Herring, Michael Ellerman; +Cc: mwb, linuxppc-dev, devicetree On 07/31/18 07:17, Rob Herring wrote: > On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe@ellerman.id.au> wrote: >> >> Hi Rob/Frank, >> >> I think we might have a problem with the phandle_cache not interacting >> well with of_detach_node(): > > Probably needs a similar fix as this commit did for overlays: > > commit b9952b5218added5577e4a3443969bc20884cea9 > Author: Frank Rowand <frank.rowand@sony.com> > Date: Thu Jul 12 14:00:07 2018 -0700 > > of: overlay: update phandle cache on overlay apply and remove > > A comment in the review of the patch adding the phandle cache said that > the cache would have to be updated when modules are applied and removed. > This patch implements the cache updates. > > Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of > of_find_node_by_phandle()") > Reported-by: Alan Tull <atull@kernel.org> > Suggested-by: Alan Tull <atull@kernel.org> > Signed-off-by: Frank Rowand <frank.rowand@sony.com> > Signed-off-by: Rob Herring <robh@kernel.org> Agreed. Sorry about missing the of_detach_node() case. > Really what we need here is an "invalidate phandle" function rather > than free and re-allocate the whole damn cache. The big hammer approach was chosen to avoid the race conditions that would otherwise occur. OF does not have a locking strategy that would be able to protect against the races. We could maybe implement a slightly smaller hammer by (1) disabling the cache, (2) invalidate a phandle entry in the cache, (3) re-enable the cache. That is an off the cuff thought - I would have to look a little bit more carefully to make sure it would work. But I don't see a need to add the complexity of the smaller hammer or the bigger hammer of proper locking _unless_ we start seeing that the cache is being freed and re-allocated frequently. For overlays I don't expect the high frequency because it happens on a per overlay removal basis (not per node removal basis). For of_detach_node() the event _is_ on a per node removal basis. Michael, do you expect node removals to be a frequent event with low latency being important? If so, a rough guess on what the frequency would be? -Frank > Rob > >> >> Michael Bringmann <mwb@linux.vnet.ibm.com> writes: >>> See below. >>> >>> On 07/30/2018 01:31 AM, Michael Ellerman wrote: >>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes: >>>> >>>>> During LPAR migration, the content of the device tree/sysfs may >>>>> be updated including deletion and replacement of nodes in the >>>>> tree. When nodes are added to the internal node structures, they >>>>> are appended in FIFO order to a list of nodes maintained by the >>>>> OF code APIs. >>>> >>>> That hasn't been true for several years. The data structure is an n-ary >>>> tree. What kernel version are you working on? >>> >>> Sorry for an error in my description. I oversimplified based on the >>> name of a search iterator. Let me try to provide a better explanation >>> of the problem, here. >>> >>> This is the problem. The PPC mobility code receives RTAS requests to >>> delete nodes with platform-/hardware-specific attributes when restarting >>> the kernel after a migration. 
My example is for migration between a >>> P8 Alpine and a P8 Brazos. Nodes to be deleted may include 'ibm,random-v1', >>> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', >>> or others. >>> >>> The mobility.c code calls 'of_detach_node' for the nodes and their children. >>> This makes calls to detach the properties and to try to remove the associated >>> sysfs/kernfs files. >>> >>> Then new copies of the same nodes are next provided by the PHYP, local >>> copies are built, and a pointer to the 'struct device_node' is passed to >>> of_attach_node. Before the call to of_attach_node, the phandle is initialized >>> to 0 when the data structure is alloced. During the call to of_attach_node, >>> it calls __of_attach_node which pulls the actual name and phandle from just >>> created sub-properties named something like 'name' and 'ibm,phandle'. >>> >>> This is all fine for the first migration. The problem occurs with the >>> second and subsequent migrations when the PHYP on the new system wants to >>> replace the same set of nodes again, referenced with the same names and >>> phandle values. >>> >>>> >>>>> When nodes are removed from the device tree, they >>>>> are marked OF_DETACHED, but not actually deleted from the system >>>>> to allow for pointers cached elsewhere in the kernel. The order >>>>> and content of the entries in the list of nodes is not altered, >>>>> though. >>>> >>>> Something is going wrong if this is actually happening. >>>> >>>> When the node is detached it should be *detached* from the tree of all >>>> nodes, so it should not be discoverable other than by having an existing >>>> pointer to it. >>> On the second and subsequent migrations, the PHYP tells the system >>> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1', >>> 'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these >>> nodes by its known set of phandle values -- the same handles used >>> by the PHYP on the source system are known on the target system. >>> The mobility.c code calls of_find_node_by_phandle() with these values >>> and ends up locating the first instance of each node that was added >>> during the original boot, instead of the second instance of each node >>> created after the first migration. 
The detach during the second >>> migration fails with errors like, >>> >>> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0 >>> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod >>> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G W 4.18.0-rc1-wi107836-v05-120+ #201 >>> [ 4565.030737] NIP: c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170 >>> [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700 Tainted: G W (4.18.0-rc1-wi107836-v05-120+) >>> [ 4565.030745] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22288822 XER: 0000000a >>> [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1 >>> [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68 >>> [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff >>> [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843 >>> [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000 >>> [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff >>> [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001 >>> [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >>> [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930 >>> [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0 >>> [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0 >>> [ 4565.030811] Call Trace: >>> [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable) >>> [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150 >>> [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80 >>> [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0 >>> [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0 >>> [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130 >>> [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60 >>> [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0 >>> [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240 >>> [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220 >>> [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240 >>> [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100 >>> [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70 >>> [ 4565.030884] Instruction dump: >>> [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 >>> [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040 >>> [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]--- >>> >>> The mobility.c code continues on during the second migration, accepts the >>> definitions of the new nodes from the PHYP and ends up renaming the new >>> properties e.g. 
>>> >>> [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1" >>> >>> I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within >>> of_find_node_by_phandle to skip nodes that are detached, but still present >>> due to caching or use count considerations. Another possibility to consider >>> is that of_find_node_by_phandle also uses something called 'phandle_cache' >>> which may have outdated data as of_detach_node() does not have access to >>> that cache for the 'OF_DETACHED' nodes. >> >> Yes the phandle_cache looks like it might be the problem. >> >> I saw of_free_phandle_cache() being called as late_initcall, but didn't >> realise that's only if MODULES is disabled. >> >> So I don't see anything that invalidates the phandle_cache when a node >> is removed. >> >> The right solution would be for __of_detach_node() to invalidate the >> phandle_cache for the node being detached. That's slightly complicated >> by the phandle_cache being static inside base.c >> >> To test the theory that it's the phandle_cache causing the problems can >> you try this patch: >> >> diff --git a/drivers/of/base.c b/drivers/of/base.c >> index 848f549164cd..60e219132e24 100644 >> --- a/drivers/of/base.c >> +++ b/drivers/of/base.c >> @@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle handle) >> if (phandle_cache[masked_handle] && >> handle == phandle_cache[masked_handle]->phandle) >> np = phandle_cache[masked_handle]; >> + >> + if (of_node_check_flag(np, OF_DETACHED)) >> + np = NULL; >> } >> >> if (!np) { >> >> cheers > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) 2018-07-31 19:18 ` Frank Rowand @ 2018-07-31 19:22 ` Frank Rowand 2018-07-31 19:26 ` Michael Bringmann 2018-08-01 14:26 ` Michael Ellerman 2 siblings, 0 replies; 17+ messages in thread From: Frank Rowand @ 2018-07-31 19:22 UTC (permalink / raw) To: Rob Herring, Michael Ellerman; +Cc: mwb, linuxppc-dev, devicetree On 07/31/18 12:18, Frank Rowand wrote: > On 07/31/18 07:17, Rob Herring wrote: >> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe@ellerman.id.au> wrote: >>> >>> Hi Rob/Frank, >>> >>> I think we might have a problem with the phandle_cache not interacting >>> well with of_detach_node(): >> >> Probably needs a similar fix as this commit did for overlays: >> >> commit b9952b5218added5577e4a3443969bc20884cea9 >> Author: Frank Rowand <frank.rowand@sony.com> >> Date: Thu Jul 12 14:00:07 2018 -0700 >> >> of: overlay: update phandle cache on overlay apply and remove >> >> A comment in the review of the patch adding the phandle cache said that >> the cache would have to be updated when modules are applied and removed. >> This patch implements the cache updates. >> >> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of >> of_find_node_by_phandle()") >> Reported-by: Alan Tull <atull@kernel.org> >> Suggested-by: Alan Tull <atull@kernel.org> >> Signed-off-by: Frank Rowand <frank.rowand@sony.com> >> Signed-off-by: Rob Herring <robh@kernel.org> > > Agreed. Sorry about missing the of_detach_node() case. > > >> Really what we need here is an "invalidate phandle" function rather >> than free and re-allocate the whole damn cache. > > The big hammer approach was chosen to avoid the race conditions that > would otherwise occur. OF does not have a locking strategy that > would be able to protect against the races. > > We could maybe implement a slightly smaller hammer by (1) disabling > the cache, (2) invalidate a phandle entry in the cache, (3) re-enable > the cache. That is an off the cuff thought - I would have to look > a little bit more carefully to make sure it would work. > > But I don't see a need to add the complexity of the smaller hammer > or the bigger hammer of proper locking _unless_ we start seeing that > the cache is being freed and re-allocated frequently. For overlays > I don't expect the high frequency because it happens on a per overlay > removal basis (not per node removal basis). > For of_detach_node() the > event _is_ on a per node removal basis. Michael, do you expect node > removals to be a frequent event with low latency being important? If > so, a rough guess on what the frequency would be? I have not looked at how of_detach_node() is used, so it might not be very different that overlays. If a group of of_detach_node() calls are made from a common code location, the the sequence could possibly be: of_free_phandle_cache() multiple calls of of_detach_node() of_populate_phandle_cache() -Frank > > -Frank > > >> Rob >> >>> >>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes: >>>> See below. >>>> >>>> On 07/30/2018 01:31 AM, Michael Ellerman wrote: >>>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes: >>>>> >>>>>> During LPAR migration, the content of the device tree/sysfs may >>>>>> be updated including deletion and replacement of nodes in the >>>>>> tree. When nodes are added to the internal node structures, they >>>>>> are appended in FIFO order to a list of nodes maintained by the >>>>>> OF code APIs. 
>>>>> >>>>> That hasn't been true for several years. The data structure is an n-ary >>>>> tree. What kernel version are you working on? >>>> >>>> Sorry for an error in my description. I oversimplified based on the >>>> name of a search iterator. Let me try to provide a better explanation >>>> of the problem, here. >>>> >>>> This is the problem. The PPC mobility code receives RTAS requests to >>>> delete nodes with platform-/hardware-specific attributes when restarting >>>> the kernel after a migration. My example is for migration between a >>>> P8 Alpine and a P8 Brazos. Nodes to be deleted may include 'ibm,random-v1', >>>> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', >>>> or others. >>>> >>>> The mobility.c code calls 'of_detach_node' for the nodes and their children. >>>> This makes calls to detach the properties and to try to remove the associated >>>> sysfs/kernfs files. >>>> >>>> Then new copies of the same nodes are next provided by the PHYP, local >>>> copies are built, and a pointer to the 'struct device_node' is passed to >>>> of_attach_node. Before the call to of_attach_node, the phandle is initialized >>>> to 0 when the data structure is alloced. During the call to of_attach_node, >>>> it calls __of_attach_node which pulls the actual name and phandle from just >>>> created sub-properties named something like 'name' and 'ibm,phandle'. >>>> >>>> This is all fine for the first migration. The problem occurs with the >>>> second and subsequent migrations when the PHYP on the new system wants to >>>> replace the same set of nodes again, referenced with the same names and >>>> phandle values. >>>> >>>>> >>>>>> When nodes are removed from the device tree, they >>>>>> are marked OF_DETACHED, but not actually deleted from the system >>>>>> to allow for pointers cached elsewhere in the kernel. The order >>>>>> and content of the entries in the list of nodes is not altered, >>>>>> though. >>>>> >>>>> Something is going wrong if this is actually happening. >>>>> >>>>> When the node is detached it should be *detached* from the tree of all >>>>> nodes, so it should not be discoverable other than by having an existing >>>>> pointer to it. >>>> On the second and subsequent migrations, the PHYP tells the system >>>> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1', >>>> 'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these >>>> nodes by its known set of phandle values -- the same handles used >>>> by the PHYP on the source system are known on the target system. >>>> The mobility.c code calls of_find_node_by_phandle() with these values >>>> and ends up locating the first instance of each node that was added >>>> during the original boot, instead of the second instance of each node >>>> created after the first migration. 
The detach during the second >>>> migration fails with errors like, >>>> >>>> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0 >>>> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod >>>> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G W 4.18.0-rc1-wi107836-v05-120+ #201 >>>> [ 4565.030737] NIP: c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170 >>>> [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700 Tainted: G W (4.18.0-rc1-wi107836-v05-120+) >>>> [ 4565.030745] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22288822 XER: 0000000a >>>> [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1 >>>> [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68 >>>> [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff >>>> [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843 >>>> [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000 >>>> [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff >>>> [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001 >>>> [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >>>> [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930 >>>> [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0 >>>> [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0 >>>> [ 4565.030811] Call Trace: >>>> [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable) >>>> [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150 >>>> [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80 >>>> [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0 >>>> [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0 >>>> [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130 >>>> [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60 >>>> [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0 >>>> [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240 >>>> [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220 >>>> [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240 >>>> [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100 >>>> [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70 >>>> [ 4565.030884] Instruction dump: >>>> [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 >>>> [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040 >>>> [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]--- >>>> >>>> The mobility.c code continues on during the second migration, accepts the >>>> definitions of the new nodes from the PHYP and ends up renaming the new >>>> properties e.g. 
>>>> >>>> [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1" >>>> >>>> I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within >>>> of_find_node_by_phandle to skip nodes that are detached, but still present >>>> due to caching or use count considerations. Another possibility to consider >>>> is that of_find_node_by_phandle also uses something called 'phandle_cache' >>>> which may have outdated data as of_detach_node() does not have access to >>>> that cache for the 'OF_DETACHED' nodes. >>> >>> Yes the phandle_cache looks like it might be the problem. >>> >>> I saw of_free_phandle_cache() being called as late_initcall, but didn't >>> realise that's only if MODULES is disabled. >>> >>> So I don't see anything that invalidates the phandle_cache when a node >>> is removed. >>> >>> The right solution would be for __of_detach_node() to invalidate the >>> phandle_cache for the node being detached. That's slightly complicated >>> by the phandle_cache being static inside base.c >>> >>> To test the theory that it's the phandle_cache causing the problems can >>> you try this patch: >>> >>> diff --git a/drivers/of/base.c b/drivers/of/base.c >>> index 848f549164cd..60e219132e24 100644 >>> --- a/drivers/of/base.c >>> +++ b/drivers/of/base.c >>> @@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle handle) >>> if (phandle_cache[masked_handle] && >>> handle == phandle_cache[masked_handle]->phandle) >>> np = phandle_cache[masked_handle]; >>> + >>> + if (of_node_check_flag(np, OF_DETACHED)) >>> + np = NULL; >>> } >>> >>> if (!np) { >>> >>> cheers >> > > ^ permalink raw reply [flat|nested] 17+ messages in thread
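The batching Frank describes might look roughly like the sketch below in the pseries device-tree update path. It assumes of_free_phandle_cache() and of_populate_phandle_cache() were made callable from outside drivers/of/base.c, which is not the case today, and the helper name is hypothetical, so treat it as an outline of the idea rather than compilable code:

	/* sketch: bracket a burst of node removals so the phandle cache is
	 * dropped once and rebuilt once, rather than per detached node */
	static int delete_dt_nodes(struct device_node **nodes, int count)
	{
		int i, rc = 0;

		of_free_phandle_cache();

		for (i = 0; i < count; i++) {
			rc = dlpar_detach_node(nodes[i]);
			if (rc)
				break;
		}

		of_populate_phandle_cache();

		return rc;
	}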
* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) 2018-07-31 19:18 ` Frank Rowand 2018-07-31 19:22 ` Frank Rowand @ 2018-07-31 19:26 ` Michael Bringmann 2018-08-01 14:26 ` Michael Ellerman 2 siblings, 0 replies; 17+ messages in thread From: Michael Bringmann @ 2018-07-31 19:26 UTC (permalink / raw) To: linuxppc-dev On 07/31/2018 02:18 PM, Frank Rowand wrote: > On 07/31/18 07:17, Rob Herring wrote: >> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe@ellerman.id.au> wrote: >>> >>> Hi Rob/Frank, >>> >>> I think we might have a problem with the phandle_cache not interacting >>> well with of_detach_node(): >> >> Probably needs a similar fix as this commit did for overlays: >> >> commit b9952b5218added5577e4a3443969bc20884cea9 >> Author: Frank Rowand <frank.rowand@sony.com> >> Date: Thu Jul 12 14:00:07 2018 -0700 >> >> of: overlay: update phandle cache on overlay apply and remove >> >> A comment in the review of the patch adding the phandle cache said that >> the cache would have to be updated when modules are applied and removed. >> This patch implements the cache updates. >> >> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of >> of_find_node_by_phandle()") >> Reported-by: Alan Tull <atull@kernel.org> >> Suggested-by: Alan Tull <atull@kernel.org> >> Signed-off-by: Frank Rowand <frank.rowand@sony.com> >> Signed-off-by: Rob Herring <robh@kernel.org> > > Agreed. Sorry about missing the of_detach_node() case. > > >> Really what we need here is an "invalidate phandle" function rather >> than free and re-allocate the whole damn cache. > > The big hammer approach was chosen to avoid the race conditions that > would otherwise occur. OF does not have a locking strategy that > would be able to protect against the races. > > We could maybe implement a slightly smaller hammer by (1) disabling > the cache, (2) invalidate a phandle entry in the cache, (3) re-enable > the cache. That is an off the cuff thought - I would have to look > a little bit more carefully to make sure it would work. > > But I don't see a need to add the complexity of the smaller hammer > or the bigger hammer of proper locking _unless_ we start seeing that > the cache is being freed and re-allocated frequently. For overlays > I don't expect the high frequency because it happens on a per overlay > removal basis (not per node removal basis). For of_detach_node() the > event _is_ on a per node removal basis. Michael, do you expect node > removals to be a frequent event with low latency being important? If > so, a rough guess on what the frequency would be? I am only seeing node removals during startup of the kernel after an LPAR migration event. For the tests that I have run between a couple of P8 platforms, I see about 15 node removals split between, 8 l2-caches, 4 hardware-specific property clusters (e.g. ibm,platform-facilities), and 3 CPUs. > > -Frank Michael > > >> Rob >> >>> >>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes: >>>> See below. >>>> >>>> On 07/30/2018 01:31 AM, Michael Ellerman wrote: >>>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes: >>>>> >>>>>> During LPAR migration, the content of the device tree/sysfs may >>>>>> be updated including deletion and replacement of nodes in the >>>>>> tree. When nodes are added to the internal node structures, they >>>>>> are appended in FIFO order to a list of nodes maintained by the >>>>>> OF code APIs. >>>>> >>>>> That hasn't been true for several years. The data structure is an n-ary >>>>> tree. 
What kernel version are you working on? >>>> >>>> Sorry for an error in my description. I oversimplified based on the >>>> name of a search iterator. Let me try to provide a better explanation >>>> of the problem, here. >>>> >>>> This is the problem. The PPC mobility code receives RTAS requests to >>>> delete nodes with platform-/hardware-specific attributes when restarting >>>> the kernel after a migration. My example is for migration between a >>>> P8 Alpine and a P8 Brazos. Nodes to be deleted may include 'ibm,random-v1', >>>> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', >>>> or others. >>>> >>>> The mobility.c code calls 'of_detach_node' for the nodes and their children. >>>> This makes calls to detach the properties and to try to remove the associated >>>> sysfs/kernfs files. >>>> >>>> Then new copies of the same nodes are next provided by the PHYP, local >>>> copies are built, and a pointer to the 'struct device_node' is passed to >>>> of_attach_node. Before the call to of_attach_node, the phandle is initialized >>>> to 0 when the data structure is alloced. During the call to of_attach_node, >>>> it calls __of_attach_node which pulls the actual name and phandle from just >>>> created sub-properties named something like 'name' and 'ibm,phandle'. >>>> >>>> This is all fine for the first migration. The problem occurs with the >>>> second and subsequent migrations when the PHYP on the new system wants to >>>> replace the same set of nodes again, referenced with the same names and >>>> phandle values. >>>> >>>>> >>>>>> When nodes are removed from the device tree, they >>>>>> are marked OF_DETACHED, but not actually deleted from the system >>>>>> to allow for pointers cached elsewhere in the kernel. The order >>>>>> and content of the entries in the list of nodes is not altered, >>>>>> though. >>>>> >>>>> Something is going wrong if this is actually happening. >>>>> >>>>> When the node is detached it should be *detached* from the tree of all >>>>> nodes, so it should not be discoverable other than by having an existing >>>>> pointer to it. >>>> On the second and subsequent migrations, the PHYP tells the system >>>> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1', >>>> 'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these >>>> nodes by its known set of phandle values -- the same handles used >>>> by the PHYP on the source system are known on the target system. >>>> The mobility.c code calls of_find_node_by_phandle() with these values >>>> and ends up locating the first instance of each node that was added >>>> during the original boot, instead of the second instance of each node >>>> created after the first migration. 
The detach during the second >>>> migration fails with errors like, >>>> >>>> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0 >>>> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod >>>> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G W 4.18.0-rc1-wi107836-v05-120+ #201 >>>> [ 4565.030737] NIP: c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170 >>>> [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700 Tainted: G W (4.18.0-rc1-wi107836-v05-120+) >>>> [ 4565.030745] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22288822 XER: 0000000a >>>> [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1 >>>> [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68 >>>> [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff >>>> [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843 >>>> [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000 >>>> [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff >>>> [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001 >>>> [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >>>> [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930 >>>> [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0 >>>> [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0 >>>> [ 4565.030811] Call Trace: >>>> [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable) >>>> [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150 >>>> [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80 >>>> [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0 >>>> [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0 >>>> [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130 >>>> [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60 >>>> [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0 >>>> [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240 >>>> [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220 >>>> [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240 >>>> [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100 >>>> [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70 >>>> [ 4565.030884] Instruction dump: >>>> [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 >>>> [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040 >>>> [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]--- >>>> >>>> The mobility.c code continues on during the second migration, accepts the >>>> definitions of the new nodes from the PHYP and ends up renaming the new >>>> properties e.g. 
>>>> >>>> [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1" >>>> >>>> I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within >>>> of_find_node_by_phandle to skip nodes that are detached, but still present >>>> due to caching or use count considerations. Another possibility to consider >>>> is that of_find_node_by_phandle also uses something called 'phandle_cache' >>>> which may have outdated data as of_detach_node() does not have access to >>>> that cache for the 'OF_DETACHED' nodes. >>> >>> Yes the phandle_cache looks like it might be the problem. >>> >>> I saw of_free_phandle_cache() being called as late_initcall, but didn't >>> realise that's only if MODULES is disabled. >>> >>> So I don't see anything that invalidates the phandle_cache when a node >>> is removed. >>> >>> The right solution would be for __of_detach_node() to invalidate the >>> phandle_cache for the node being detached. That's slightly complicated >>> by the phandle_cache being static inside base.c >>> >>> To test the theory that it's the phandle_cache causing the problems can >>> you try this patch: >>> >>> diff --git a/drivers/of/base.c b/drivers/of/base.c >>> index 848f549164cd..60e219132e24 100644 >>> --- a/drivers/of/base.c >>> +++ b/drivers/of/base.c >>> @@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle handle) >>> if (phandle_cache[masked_handle] && >>> handle == phandle_cache[masked_handle]->phandle) >>> np = phandle_cache[masked_handle]; >>> + >>> + if (of_node_check_flag(np, OF_DETACHED)) >>> + np = NULL; >>> } >>> >>> if (!np) { >>> >>> cheers >> > > -- Michael W. Bringmann Linux Technology Center IBM Corporation Tie-Line 363-5196 External: (512) 286-5196 Cell: (512) 466-0650 mwb@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) 2018-07-31 19:18 ` Frank Rowand 2018-07-31 19:22 ` Frank Rowand 2018-07-31 19:26 ` Michael Bringmann @ 2018-08-01 14:26 ` Michael Ellerman 2018-08-01 17:26 ` Michael Bringmann 2 siblings, 1 reply; 17+ messages in thread From: Michael Ellerman @ 2018-08-01 14:26 UTC (permalink / raw) To: Frank Rowand, Rob Herring; +Cc: mwb, linuxppc-dev, devicetree Frank Rowand <frowand.list@gmail.com> writes: > On 07/31/18 07:17, Rob Herring wrote: >> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe@ellerman.id.au> wrote: >>> >>> Hi Rob/Frank, >>> >>> I think we might have a problem with the phandle_cache not interacting >>> well with of_detach_node(): >> >> Probably needs a similar fix as this commit did for overlays: >> >> commit b9952b5218added5577e4a3443969bc20884cea9 >> Author: Frank Rowand <frank.rowand@sony.com> >> Date: Thu Jul 12 14:00:07 2018 -0700 >> >> of: overlay: update phandle cache on overlay apply and remove >> >> A comment in the review of the patch adding the phandle cache said that >> the cache would have to be updated when modules are applied and removed. >> This patch implements the cache updates. >> >> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of >> of_find_node_by_phandle()") >> Reported-by: Alan Tull <atull@kernel.org> >> Suggested-by: Alan Tull <atull@kernel.org> >> Signed-off-by: Frank Rowand <frank.rowand@sony.com> >> Signed-off-by: Rob Herring <robh@kernel.org> > > Agreed. Sorry about missing the of_detach_node() case. > > >> Really what we need here is an "invalidate phandle" function rather >> than free and re-allocate the whole damn cache. > > The big hammer approach was chosen to avoid the race conditions that > would otherwise occur. OF does not have a locking strategy that > would be able to protect against the races. > > We could maybe implement a slightly smaller hammer by (1) disabling > the cache, (2) invalidate a phandle entry in the cache, (3) re-enable > the cache. That is an off the cuff thought - I would have to look > a little bit more carefully to make sure it would work. > > But I don't see a need to add the complexity of the smaller hammer > or the bigger hammer of proper locking _unless_ we start seeing that > the cache is being freed and re-allocated frequently. For overlays > I don't expect the high frequency because it happens on a per overlay > removal basis (not per node removal basis). For of_detach_node() the > event _is_ on a per node removal basis. Michael, do you expect node > removals to be a frequent event with low latency being important? If > so, a rough guess on what the frequency would be? No in practice we expect them to be fairly rare and in the thousands at most I think. TBH you could just disable the phandle_cache on the first detach and that would be fine by me. I don't think we've ever noticed phandle lookups being much of a problem for us on our hardware. cheers ^ permalink raw reply [flat|nested] 17+ messages in thread
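A rough sketch of the "disable it on the first detach" idea, assuming of_free_phandle_cache() were made visible to drivers/of/dynamic.c (its prototype currently lives in drivers/of/of_private.h). Freeing the cache simply leaves of_find_node_by_phandle() on its slower full-tree walk from then on, which is acceptable if detaches are rare:

/* drivers/of/dynamic.c -- sketch only; called at the top of
 * of_detach_node(), before the node is actually detached */
static void of_phandle_cache_disable_once(void)
{
	static bool dropped;

	if (!dropped) {
		of_free_phandle_cache();
		dropped = true;
	}
}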
* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) 2018-08-01 14:26 ` Michael Ellerman @ 2018-08-01 17:26 ` Michael Bringmann 0 siblings, 0 replies; 17+ messages in thread From: Michael Bringmann @ 2018-08-01 17:26 UTC (permalink / raw) To: linuxppc-dev On 08/01/2018 09:26 AM, Michael Ellerman wrote: > Frank Rowand <frowand.list@gmail.com> writes: >> On 07/31/18 07:17, Rob Herring wrote: >>> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe@ellerman.id.au> wrote: >>>> >>>> Hi Rob/Frank, >>>> >>>> I think we might have a problem with the phandle_cache not interacting >>>> well with of_detach_node(): >>> >>> Probably needs a similar fix as this commit did for overlays: >>> >>> commit b9952b5218added5577e4a3443969bc20884cea9 >>> Author: Frank Rowand <frank.rowand@sony.com> >>> Date: Thu Jul 12 14:00:07 2018 -0700 >>> >>> of: overlay: update phandle cache on overlay apply and remove >>> >>> A comment in the review of the patch adding the phandle cache said that >>> the cache would have to be updated when modules are applied and removed. >>> This patch implements the cache updates. >>> >>> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of >>> of_find_node_by_phandle()") >>> Reported-by: Alan Tull <atull@kernel.org> >>> Suggested-by: Alan Tull <atull@kernel.org> >>> Signed-off-by: Frank Rowand <frank.rowand@sony.com> >>> Signed-off-by: Rob Herring <robh@kernel.org> >> >> Agreed. Sorry about missing the of_detach_node() case. >> >> >>> Really what we need here is an "invalidate phandle" function rather >>> than free and re-allocate the whole damn cache. >> >> The big hammer approach was chosen to avoid the race conditions that >> would otherwise occur. OF does not have a locking strategy that >> would be able to protect against the races. >> >> We could maybe implement a slightly smaller hammer by (1) disabling >> the cache, (2) invalidate a phandle entry in the cache, (3) re-enable >> the cache. That is an off the cuff thought - I would have to look >> a little bit more carefully to make sure it would work. >> >> But I don't see a need to add the complexity of the smaller hammer >> or the bigger hammer of proper locking _unless_ we start seeing that >> the cache is being freed and re-allocated frequently. For overlays >> I don't expect the high frequency because it happens on a per overlay >> removal basis (not per node removal basis). For of_detach_node() the >> event _is_ on a per node removal basis. Michael, do you expect node >> removals to be a frequent event with low latency being important? If >> so, a rough guess on what the frequency would be? > > No in practice we expect them to be fairly rare and in the thousands at > most I think. > > TBH you could just disable the phandle_cache on the first detach and > that would be fine by me. I don't think we've ever noticed phandle > lookups being much of a problem for us on our hardware. I ran a couple of migrations with the calls to of_free_phandle_cache() ... 
of_populate_phandle_cache() bracketing the body of arch/powerpc/platforms/pseries/mobility.c:post_mobility_fixup() with a patch like, diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index e245a88..efc9442 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -22,6 +22,9 @@ #include <asm/rtas.h> #include "pseries.h" +extern int of_free_phandle_cache(void); +extern void of_populate_phandle_cache(void); + static struct kobject *mobility_kobj; struct update_props_workarea { @@ -343,6 +346,8 @@ void post_mobility_fixup(void) rc = rtas_call(activate_fw_token, 0, 1, NULL); } while (rtas_busy_delay(rc)); + of_free_phandle_cache(); + if (rc) printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc); @@ -354,6 +359,8 @@ void post_mobility_fixup(void) /* Possibly switch to a new RFI flush type */ pseries_setup_rfi_flush(); + of_populate_phandle_cache(); + return; } and did not observe the warnings/crashes that have been plaguing me. We will need to move their prototypes out of drivers/of/of_private.h to include/linux/of.h, though. > > cheers > Michael -- Michael W. Bringmann Linux Technology Center IBM Corporation Tie-Line 363-5196 External: (512) 286-5196 Cell: (512) 466-0650 mwb@linux.vnet.ibm.com ^ permalink raw reply related [flat|nested] 17+ messages in thread
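The prototype move mentioned above could be as small as the following sketch; the !CONFIG_OF stubs are an assumption, and the int return type follows the extern declarations in the patch:

/* include/linux/of.h -- sketch of exposing the two helpers so arch code
 * such as mobility.c can call them without local extern declarations */
#ifdef CONFIG_OF
extern int of_free_phandle_cache(void);
extern void of_populate_phandle_cache(void);
#else
static inline int of_free_phandle_cache(void)
{
	return 0;
}
static inline void of_populate_phandle_cache(void) { }
#endif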
* Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) 2018-07-31 6:34 ` phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) Michael Ellerman 2018-07-31 14:17 ` Rob Herring @ 2018-07-31 19:11 ` Michael Bringmann 1 sibling, 0 replies; 17+ messages in thread From: Michael Bringmann @ 2018-07-31 19:11 UTC (permalink / raw) To: linuxppc-dev I applied your suggestion with a couple of modifications, and it looks to have worked for the first 2 migration events. I am not seeing the errors from repeated migrations. The revised patch tested is, diff --git a/drivers/of/base.c b/drivers/of/base.c index 466e3c8..8bf64e5 100644 --- a/drivers/of/base.c +++ b/drivers/of/base.c @@ -1096,8 +1096,14 @@ struct device_node *of_find_node_by_phandle(phandle handle) if (phandle_cache) { if (phandle_cache[masked_handle] && - handle == phandle_cache[masked_handle]->phandle) + handle == phandle_cache[masked_handle]->phandle) { np = phandle_cache[masked_handle]; + + if (of_node_check_flag(np, OF_DETACHED)) { + np = NULL; + phandle_cache[masked_handle] = NULL; + } + } } if (!np) { During a conference call this morning, Tyrel expressed concerns about the use of the phandle_cache on powerpc, at all. I will try another build with that feature disabled, but without this patch. Michael On 07/31/2018 01:34 AM, Michael Ellerman wrote: > Hi Rob/Frank, > > I think we might have a problem with the phandle_cache not interacting > well with of_detach_node(): > > Michael Bringmann <mwb@linux.vnet.ibm.com> writes: >> See below. >> >> On 07/30/2018 01:31 AM, Michael Ellerman wrote: >>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes: >>> >>>> During LPAR migration, the content of the device tree/sysfs may >>>> be updated including deletion and replacement of nodes in the >>>> tree. When nodes are added to the internal node structures, they >>>> are appended in FIFO order to a list of nodes maintained by the >>>> OF code APIs. >>> >>> That hasn't been true for several years. The data structure is an n-ary >>> tree. What kernel version are you working on? >> >> Sorry for an error in my description. I oversimplified based on the >> name of a search iterator. Let me try to provide a better explanation >> of the problem, here. >> >> This is the problem. The PPC mobility code receives RTAS requests to >> delete nodes with platform-/hardware-specific attributes when restarting >> the kernel after a migration. My example is for migration between a >> P8 Alpine and a P8 Brazos. Nodes to be deleted may include 'ibm,random-v1', >> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', >> or others. >> >> The mobility.c code calls 'of_detach_node' for the nodes and their children. >> This makes calls to detach the properties and to try to remove the associated >> sysfs/kernfs files. >> >> Then new copies of the same nodes are next provided by the PHYP, local >> copies are built, and a pointer to the 'struct device_node' is passed to >> of_attach_node. Before the call to of_attach_node, the phandle is initialized >> to 0 when the data structure is alloced. During the call to of_attach_node, >> it calls __of_attach_node which pulls the actual name and phandle from just >> created sub-properties named something like 'name' and 'ibm,phandle'. >> >> This is all fine for the first migration. 
The problem occurs with the >> second and subsequent migrations when the PHYP on the new system wants to >> replace the same set of nodes again, referenced with the same names and >> phandle values. >> >>> >>>> When nodes are removed from the device tree, they >>>> are marked OF_DETACHED, but not actually deleted from the system >>>> to allow for pointers cached elsewhere in the kernel. The order >>>> and content of the entries in the list of nodes is not altered, >>>> though. >>> >>> Something is going wrong if this is actually happening. >>> >>> When the node is detached it should be *detached* from the tree of all >>> nodes, so it should not be discoverable other than by having an existing >>> pointer to it. >> On the second and subsequent migrations, the PHYP tells the system >> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1', >> 'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these >> nodes by its known set of phandle values -- the same handles used >> by the PHYP on the source system are known on the target system. >> The mobility.c code calls of_find_node_by_phandle() with these values >> and ends up locating the first instance of each node that was added >> during the original boot, instead of the second instance of each node >> created after the first migration. The detach during the second >> migration fails with errors like, >> >> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0 >> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod >> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G W 4.18.0-rc1-wi107836-v05-120+ #201 >> [ 4565.030737] NIP: c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170 >> [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700 Tainted: G W (4.18.0-rc1-wi107836-v05-120+) >> [ 4565.030745] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22288822 XER: 0000000a >> [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1 >> [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68 >> [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff >> [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843 >> [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000 >> [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff >> [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001 >> [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930 >> [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0 >> [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0 >> [ 4565.030811] Call Trace: >> [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable) >> [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150 >> [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80 >> [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0 >> [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] 
post_mobility_fixup+0x7c/0xf0 >> [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130 >> [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60 >> [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0 >> [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240 >> [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220 >> [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240 >> [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100 >> [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70 >> [ 4565.030884] Instruction dump: >> [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 >> [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040 >> [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]--- >> >> The mobility.c code continues on during the second migration, accepts the >> definitions of the new nodes from the PHYP and ends up renaming the new >> properties e.g. >> >> [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1" >> >> I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within >> of_find_node_by_phandle to skip nodes that are detached, but still present >> due to caching or use count considerations. Another possibility to consider >> is that of_find_node_by_phandle also uses something called 'phandle_cache' >> which may have outdated data as of_detach_node() does not have access to >> that cache for the 'OF_DETACHED' nodes. > > Yes the phandle_cache looks like it might be the problem. > > I saw of_free_phandle_cache() being called as late_initcall, but didn't > realise that's only if MODULES is disabled. > > So I don't see anything that invalidates the phandle_cache when a node > is removed. > > The right solution would be for __of_detach_node() to invalidate the > phandle_cache for the node being detached. That's slightly complicated > by the phandle_cache being static inside base.c > > To test the theory that it's the phandle_cache causing the problems can > you try this patch: > > diff --git a/drivers/of/base.c b/drivers/of/base.c > index 848f549164cd..60e219132e24 100644 > --- a/drivers/of/base.c > +++ b/drivers/of/base.c > @@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle handle) > if (phandle_cache[masked_handle] && > handle == phandle_cache[masked_handle]->phandle) > np = phandle_cache[masked_handle]; > + > + if (of_node_check_flag(np, OF_DETACHED)) > + np = NULL; > } > > if (!np) { > > cheers > > -- Michael W. Bringmann Linux Technology Center IBM Corporation Tie-Line 363-5196 External: (512) 286-5196 Cell: (512) 466-0650 mwb@linux.vnet.ibm.com ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem 2018-07-29 13:11 [PATCH] powerpc/mobility: Fix node detach/rename problem Michael Bringmann 2018-07-29 16:31 ` kbuild test robot 2018-07-30 6:31 ` Michael Ellerman @ 2018-07-31 0:59 ` Tyrel Datwyler 2018-07-31 1:46 ` Tyrel Datwyler 2018-07-31 6:42 ` Michael Ellerman 2 siblings, 2 replies; 17+ messages in thread From: Tyrel Datwyler @ 2018-07-31 0:59 UTC (permalink / raw) To: Michael Bringmann, linuxppc-dev Cc: Nathan Fontenot, Thomas Falcon, John Allen, mpe On 07/29/2018 06:11 AM, Michael Bringmann wrote: > During LPAR migration, the content of the device tree/sysfs may > be updated including deletion and replacement of nodes in the > tree. When nodes are added to the internal node structures, they > are appended in FIFO order to a list of nodes maintained by the > OF code APIs. When nodes are removed from the device tree, they > are marked OF_DETACHED, but not actually deleted from the system > to allow for pointers cached elsewhere in the kernel. The order > and content of the entries in the list of nodes is not altered, > though. > > During LPAR migration some common nodes are deleted and re-added > e.g. "ibm,platform-facilities". If a node is re-added to the OF > node lists, the of_attach_node function checks to make sure that > the name + ibm,phandle of the to-be-added data is unique. As the > previous copy of a re-added node is not modified beyond the addition > of a bit flag, the code (1) finds the old copy, (2) prints a WARNING > notice to the console, (3) renames the to-be-added node to avoid > filename collisions within a directory, and (3) adds entries to > the sysfs/kernfs. So, this patch actually just band aids over the real problem. This is a long standing problem with several PFO drivers leaking references. The issue here is that, during the device tree update that follows a migration. the update of the ibm,platform-facilities node and friends below are always deleted and re-added on the destination lpar and subsequently the leaked references prevent the devices nodes from every actually being properly cleaned up after detach. Thus, leading to the issue you are observing. As and example a quick look at nx-842-pseries.c reveals several issues. static int __init nx842_pseries_init(void) { struct nx842_devdata *new_devdata; int ret; if (!of_find_compatible_node(NULL, NULL, "ibm,compression")) return -ENODEV; This call to of_find_compatible_node() results in a node returned with refcount incremented and therefore immediately leaked. Further, the reconfig notifier logic makes the assumption that it only needs to deal with node updates, but as I outlined above the post migration device tree update always results in PFO nodes and properties being deleted and re-added. 
/** * nx842_OF_notifier - Process updates to OF properties for the device * * @np: notifier block * @action: notifier action * @update: struct pSeries_reconfig_prop_update pointer if action is * PSERIES_UPDATE_PROPERTY * * Returns: * NOTIFY_OK on success * NOTIFY_BAD encoded with error number on failure, use * notifier_to_errno() to decode this value */ static int nx842_OF_notifier(struct notifier_block *np, unsigned long action, void *data) { struct of_reconfig_data *upd = data; struct nx842_devdata *local_devdata; struct device_node *node = NULL; rcu_read_lock(); local_devdata = rcu_dereference(devdata); if (local_devdata) node = local_devdata->dev->of_node; if (local_devdata && action == OF_RECONFIG_UPDATE_PROPERTY && !strcmp(upd->dn->name, node->name)) { rcu_read_unlock(); nx842_OF_upd(upd->prop); } else rcu_read_unlock(); return NOTIFY_OK; } I expect to find the same problems in pseries-rng.c and nx.c. -Tyrel > > This patch fixes the 'migration add' problem by changing the > stored 'phandle' of the OF_DETACHed node to 0 (reserved value for > of_find_node_by_phandle), so that subsequent re-add operations, > such as those during migration, do not find the detached node, > do not observe duplicate names, do not rename them, and the > extra WARNING notices are removed from the console output. > > In addition, it erases the 'name' field of the OF_DETACHED node, > to prevent any future calls to of_find_node_by_name() or > of_find_node_by_path() from matching this node. > > Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> > --- > arch/powerpc/platforms/pseries/dlpar.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c > index 2de0f0d..9d82c28 100644 > --- a/arch/powerpc/platforms/pseries/dlpar.c > +++ b/arch/powerpc/platforms/pseries/dlpar.c > @@ -274,6 +274,9 @@ int dlpar_detach_node(struct device_node *dn) > if (rc) > return rc; > > + dn->phandle = 0; > + memset(dn->name, 0, strlen(dn->name)); > + > return 0; > } > > ^ permalink raw reply [flat|nested] 17+ messages in thread
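For the nx842_pseries_init() example above, the leak could be closed by dropping the reference once the presence check is done -- a sketch of the opening of the function, not the actual driver fix:

static int __init nx842_pseries_init(void)
{
	struct device_node *np;

	np = of_find_compatible_node(NULL, NULL, "ibm,compression");
	if (!np)
		return -ENODEV;
	/* the node was only needed for the presence check; put back the
	 * reference taken by of_find_compatible_node() so the node is not
	 * pinned after a later detach */
	of_node_put(np);

	/* ... remainder of the init sequence unchanged ... */
}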
* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem 2018-07-31 0:59 ` [PATCH] powerpc/mobility: Fix node detach/rename problem Tyrel Datwyler @ 2018-07-31 1:46 ` Tyrel Datwyler 2018-07-31 6:42 ` Michael Ellerman 1 sibling, 0 replies; 17+ messages in thread From: Tyrel Datwyler @ 2018-07-31 1:46 UTC (permalink / raw) To: Michael Bringmann, linuxppc-dev Cc: Nathan Fontenot, Thomas Falcon, John Allen On 07/30/2018 05:59 PM, Tyrel Datwyler wrote: > On 07/29/2018 06:11 AM, Michael Bringmann wrote: >> During LPAR migration, the content of the device tree/sysfs may >> be updated including deletion and replacement of nodes in the >> tree. When nodes are added to the internal node structures, they >> are appended in FIFO order to a list of nodes maintained by the >> OF code APIs. When nodes are removed from the device tree, they >> are marked OF_DETACHED, but not actually deleted from the system >> to allow for pointers cached elsewhere in the kernel. The order >> and content of the entries in the list of nodes is not altered, >> though. >> >> During LPAR migration some common nodes are deleted and re-added >> e.g. "ibm,platform-facilities". If a node is re-added to the OF >> node lists, the of_attach_node function checks to make sure that >> the name + ibm,phandle of the to-be-added data is unique. As the >> previous copy of a re-added node is not modified beyond the addition >> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING >> notice to the console, (3) renames the to-be-added node to avoid >> filename collisions within a directory, and (3) adds entries to >> the sysfs/kernfs. > > So, this patch actually just band aids over the real problem. This is a long standing problem with several PFO drivers leaking references. The issue here is that, during the device tree update that follows a migration. the update of the ibm,platform-facilities node and friends below are always deleted and re-added on the destination lpar and subsequently the leaked references prevent the devices nodes from every actually being properly cleaned up after detach. Thus, leading to the issue you are observing. > > As and example a quick look at nx-842-pseries.c reveals several issues. > > static int __init nx842_pseries_init(void) > { > struct nx842_devdata *new_devdata; > int ret; > > if (!of_find_compatible_node(NULL, NULL, "ibm,compression")) > return -ENODEV; > > This call to of_find_compatible_node() results in a node returned with refcount incremented and therefore immediately leaked. > > Further, the reconfig notifier logic makes the assumption that it only needs to deal with node updates, but as I outlined above the post migration device tree update always results in PFO nodes and properties being deleted and re-added. 
> > /** > * nx842_OF_notifier - Process updates to OF properties for the device > * > * @np: notifier block > * @action: notifier action > * @update: struct pSeries_reconfig_prop_update pointer if action is > * PSERIES_UPDATE_PROPERTY > * > * Returns: > * NOTIFY_OK on success > * NOTIFY_BAD encoded with error number on failure, use > * notifier_to_errno() to decode this value > */ > static int nx842_OF_notifier(struct notifier_block *np, unsigned long action, > void *data) > { > struct of_reconfig_data *upd = data; > struct nx842_devdata *local_devdata; > struct device_node *node = NULL; > > rcu_read_lock(); > local_devdata = rcu_dereference(devdata); > if (local_devdata) > node = local_devdata->dev->of_node; > > if (local_devdata && > action == OF_RECONFIG_UPDATE_PROPERTY && > !strcmp(upd->dn->name, node->name)) { > rcu_read_unlock(); > nx842_OF_upd(upd->prop); > } else > rcu_read_unlock(); > > return NOTIFY_OK; > } > > I expect to find the same problems in pseries-rng.c and nx.c. So, in actuality the main root of the problem is really in the vio core. A node reference for each PFO device under "ibm,platform-facilities" is taken during vio_register_device_node(). We need a reconfig notifier to call vio_unregister_device() for each PFO device on detach, and vio_bus_scan_register_devices("ibm,platform-facilities") on attach. This will make sure the PFO vio devices are released such that vio_dev_release() gets called to put the node reference that was taken at original registration time. /* vio_dev refcount hit 0 */ static void vio_dev_release(struct device *dev) { struct iommu_table *tbl = get_iommu_table_base(dev); if (tbl) iommu_tce_table_put(tbl); of_node_put(dev->of_node); kfree(to_vio_dev(dev)); } -Tyrel > > -Tyrel > >> >> This patch fixes the 'migration add' problem by changing the >> stored 'phandle' of the OF_DETACHed node to 0 (reserved value for >> of_find_node_by_phandle), so that subsequent re-add operations, >> such as those during migration, do not find the detached node, >> do not observe duplicate names, do not rename them, and the >> extra WARNING notices are removed from the console output. >> >> In addition, it erases the 'name' field of the OF_DETACHED node, >> to prevent any future calls to of_find_node_by_name() or >> of_find_node_by_path() from matching this node. >> >> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> >> --- >> arch/powerpc/platforms/pseries/dlpar.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c >> index 2de0f0d..9d82c28 100644 >> --- a/arch/powerpc/platforms/pseries/dlpar.c >> +++ b/arch/powerpc/platforms/pseries/dlpar.c >> @@ -274,6 +274,9 @@ int dlpar_detach_node(struct device_node *dn) >> if (rc) >> return rc; >> >> + dn->phandle = 0; >> + memset(dn->name, 0, strlen(dn->name)); >> + >> return 0; >> } >> >> > ^ permalink raw reply [flat|nested] 17+ messages in thread
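A rough sketch of the notifier Tyrel outlines, as it might sit in the vio bus code. vio_unregister_device() and vio_bus_scan_register_devices() are the routines he names; vio_find_node() is assumed to be the existing device_node-to-vio_dev lookup, and everything else here (the function and notifier names, and making the scan routine callable after init) is an assumption:

static int vio_pfo_reconfig_notifier(struct notifier_block *nb,
				     unsigned long action, void *data)
{
	struct of_reconfig_data *rd = data;
	struct vio_dev *vdev;

	switch (action) {
	case OF_RECONFIG_DETACH_NODE:
		/* a real implementation would restrict this to nodes under
		 * ibm,platform-facilities; unregistering lets
		 * vio_dev_release() finally put the node reference taken
		 * at registration time */
		vdev = vio_find_node(rd->dn);
		if (vdev)
			vio_unregister_device(vdev);
		break;
	case OF_RECONFIG_ATTACH_NODE:
		/* rescan the PFO subtree once the new nodes are attached */
		if (!strcmp(rd->dn->name, "ibm,platform-facilities"))
			vio_bus_scan_register_devices("ibm,platform-facilities");
		break;
	}

	return NOTIFY_OK;
}

static struct notifier_block vio_pfo_nb = {
	.notifier_call = vio_pfo_reconfig_notifier,
};

/* registered once during vio bus init:
 *	of_reconfig_notifier_register(&vio_pfo_nb);
 */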
* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem 2018-07-31 0:59 ` [PATCH] powerpc/mobility: Fix node detach/rename problem Tyrel Datwyler 2018-07-31 1:46 ` Tyrel Datwyler @ 2018-07-31 6:42 ` Michael Ellerman 2018-07-31 19:50 ` Tyrel Datwyler 1 sibling, 1 reply; 17+ messages in thread From: Michael Ellerman @ 2018-07-31 6:42 UTC (permalink / raw) To: Tyrel Datwyler, Michael Bringmann, linuxppc-dev Cc: Nathan Fontenot, Thomas Falcon, John Allen Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes: > On 07/29/2018 06:11 AM, Michael Bringmann wrote: >> During LPAR migration, the content of the device tree/sysfs may >> be updated including deletion and replacement of nodes in the >> tree. When nodes are added to the internal node structures, they >> are appended in FIFO order to a list of nodes maintained by the >> OF code APIs. When nodes are removed from the device tree, they >> are marked OF_DETACHED, but not actually deleted from the system >> to allow for pointers cached elsewhere in the kernel. The order >> and content of the entries in the list of nodes is not altered, >> though. >> >> During LPAR migration some common nodes are deleted and re-added >> e.g. "ibm,platform-facilities". If a node is re-added to the OF >> node lists, the of_attach_node function checks to make sure that >> the name + ibm,phandle of the to-be-added data is unique. As the >> previous copy of a re-added node is not modified beyond the addition >> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING >> notice to the console, (3) renames the to-be-added node to avoid >> filename collisions within a directory, and (3) adds entries to >> the sysfs/kernfs. > > So, this patch actually just band aids over the real problem. This is > a long standing problem with several PFO drivers leaking references. > The issue here is that, during the device tree update that follows a > migration. the update of the ibm,platform-facilities node and friends > below are always deleted and re-added on the destination lpar and > subsequently the leaked references prevent the devices nodes from > every actually being properly cleaned up after detach. Thus, leading > to the issue you are observing. Leaking references shouldn't affect the node being detached from the tree though. See of_detach_node() calling __of_detach_node(), none of that depends on the refcount. It's only the actual freeing of the node, in of_node_release() that is prevented by leaked reference counts. So I agree we need to do a better job with the reference counting, but I don't see how it is causing the problem here. cheers ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem 2018-07-31 6:42 ` Michael Ellerman @ 2018-07-31 19:50 ` Tyrel Datwyler 2018-08-01 14:35 ` Michael Ellerman 0 siblings, 1 reply; 17+ messages in thread From: Tyrel Datwyler @ 2018-07-31 19:50 UTC (permalink / raw) To: Michael Ellerman, Michael Bringmann, linuxppc-dev Cc: Nathan Fontenot, Thomas Falcon, John Allen
On 07/30/2018 11:42 PM, Michael Ellerman wrote:
> Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:
>> On 07/29/2018 06:11 AM, Michael Bringmann wrote:
>>> During LPAR migration, the content of the device tree/sysfs may
>>> be updated including deletion and replacement of nodes in the
>>> tree.  When nodes are added to the internal node structures, they
>>> are appended in FIFO order to a list of nodes maintained by the
>>> OF code APIs.  When nodes are removed from the device tree, they
>>> are marked OF_DETACHED, but not actually deleted from the system
>>> to allow for pointers cached elsewhere in the kernel.  The order
>>> and content of the entries in the list of nodes is not altered,
>>> though.
>>>
>>> During LPAR migration some common nodes are deleted and re-added
>>> e.g. "ibm,platform-facilities".  If a node is re-added to the OF
>>> node lists, the of_attach_node function checks to make sure that
>>> the name + ibm,phandle of the to-be-added data is unique.  As the
>>> previous copy of a re-added node is not modified beyond the addition
>>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
>>> notice to the console, (3) renames the to-be-added node to avoid
>>> filename collisions within a directory, and (3) adds entries to
>>> the sysfs/kernfs.
>>
>> So, this patch actually just band aids over the real problem. This is
>> a long standing problem with several PFO drivers leaking references.
>> The issue here is that, during the device tree update that follows a
>> migration. the update of the ibm,platform-facilities node and friends
>> below are always deleted and re-added on the destination lpar and
>> subsequently the leaked references prevent the devices nodes from
>> every actually being properly cleaned up after detach. Thus, leading
>> to the issue you are observing.

So, it was the end of the day, and I kind of glossed over the issue Michael was trying to address with an issue that I remembered that had been around for awhile.

> 
> Leaking references shouldn't affect the node being detached from the
> tree though.

No, but it does prevent it from being freed from sysfs which leads to the sysfs entry renaming that happens when another node with the same name is attached.

> 
> See of_detach_node() calling __of_detach_node(), none of that depends on
> the refcount.
> 
> It's only the actual freeing of the node, in of_node_release() that is
> prevented by leaked reference counts.

Right, but if we did refcount correctly there is the new problem that the node is freed and now the phandle cache points at freed memory in the case were no invalidation is done.

> 
> So I agree we need to do a better job with the reference counting, but I
> don't see how it is causing the problem here

Now in regards to the phandle caching somehow I missed that code going into OF earlier this year. That should have had at least some discussion from our side based on the fact that it is built on dtc compiler assumption that there are a set number of phandles that are allocated starting at 1..n such that they are monotonically increasing. That has a nice fixed size with O(1) lookup time. Phyp doesn't guarantee any sort of niceness with nicely ordered phandles. From what I've seen there are a subset of phandles that decrease from (-1) monotonically, and then there are a bunch of random values for cpus and IOAs. Thinking on it might not be that big of a deal as we just end up in the case where we have a phandle collision one makes it into the cache and is optimized while the other doesn't. On another note, they clearly hit a similar issue during overlay attach and remove, and as Rob pointed out their solution to handle it is a full cache invalidation followed by rescanning the whole tree to rebuild it. Seeing as our dynamic lifecycle is node by node this definitely adds some overhead.

-Tyrel

> 
> cheers
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread
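Tyrel's collision point can be seen with a couple of representative values. A standalone illustration (not kernel code), assuming the cache is sized to roundup_pow_of_two() of the number of live phandles and indexed by handle & mask as in the lookup path quoted earlier in the thread:

#include <stdio.h>

int main(void)
{
	unsigned int mask = 512 - 1;          /* e.g. a few hundred live phandles */
	unsigned int dtc_style  = 0x00000123; /* dtc-style handle, allocated 1..n */
	unsigned int phyp_style = 0xfffffd23; /* PHYP-style handle, counting down from -1 */

	/* both hash to slot 0x123, so only one of them can be cached at a
	 * time; lookups for the other fall back to the full tree walk */
	printf("%#010x -> slot %#x\n", dtc_style, dtc_style & mask);
	printf("%#010x -> slot %#x\n", phyp_style, phyp_style & mask);
	return 0;
}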
* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem 2018-07-31 19:50 ` Tyrel Datwyler @ 2018-08-01 14:35 ` Michael Ellerman 0 siblings, 0 replies; 17+ messages in thread From: Michael Ellerman @ 2018-08-01 14:35 UTC (permalink / raw) To: Tyrel Datwyler, Michael Bringmann, linuxppc-dev Cc: Nathan Fontenot, Thomas Falcon, John Allen Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes: > On 07/30/2018 11:42 PM, Michael Ellerman wrote: >> Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes: >>> On 07/29/2018 06:11 AM, Michael Bringmann wrote: >>>> During LPAR migration, the content of the device tree/sysfs may >>>> be updated including deletion and replacement of nodes in the >>>> tree. When nodes are added to the internal node structures, they >>>> are appended in FIFO order to a list of nodes maintained by the >>>> OF code APIs. When nodes are removed from the device tree, they >>>> are marked OF_DETACHED, but not actually deleted from the system >>>> to allow for pointers cached elsewhere in the kernel. The order >>>> and content of the entries in the list of nodes is not altered, >>>> though. >>>> >>>> During LPAR migration some common nodes are deleted and re-added >>>> e.g. "ibm,platform-facilities". If a node is re-added to the OF >>>> node lists, the of_attach_node function checks to make sure that >>>> the name + ibm,phandle of the to-be-added data is unique. As the >>>> previous copy of a re-added node is not modified beyond the addition >>>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING >>>> notice to the console, (3) renames the to-be-added node to avoid >>>> filename collisions within a directory, and (3) adds entries to >>>> the sysfs/kernfs. >>> >>> So, this patch actually just band aids over the real problem. This is >>> a long standing problem with several PFO drivers leaking references. >>> The issue here is that, during the device tree update that follows a >>> migration. the update of the ibm,platform-facilities node and friends >>> below are always deleted and re-added on the destination lpar and >>> subsequently the leaked references prevent the devices nodes from >>> every actually being properly cleaned up after detach. Thus, leading >>> to the issue you are observing. > > So, it was the end of the day, and I kind of glossed over the issue > Michael was trying to address with an issue that I remembered that had > been around for awhile. Sure, I know that feeling :) >> Leaking references shouldn't affect the node being detached from the >> tree though. > > No, but it does prevent it from being freed from sysfs which leads to > the sysfs entry renaming that happens when another node with the same > name is attached. OK. >> See of_detach_node() calling __of_detach_node(), none of that depends on >> the refcount. >> >> It's only the actual freeing of the node, in of_node_release() that is >> prevented by leaked reference counts. > > Right, but if we did refcount correctly there is the new problem that > the node is freed and now the phandle cache points at freed memory in > the case were no invalidation is done. Yeah that's bad. I guess no one is supposed to lookup that phandle anymore, but it's super fragile. >> So I agree we need to do a better job with the reference counting, but I >> don't see how it is causing the problem here > > Now in regards to the phandle caching somehow I missed that code going > into OF earlier this year. 
That should have had at least some > discussion from our side based on the fact that it is built on dtc > compiler assumption that there are a set number of phandles that are > allocated starting at 1..n such that they are monotonically > increasing. That has a nice fixed size with O(1) lookup time. Phyp > doesn't guarantee any sort of niceness with nicely ordered phandles. > From what I've seen there are a subset of phandles that decrease from > (-1) monotonically, and then there are a bunch of random values for > cpus and IOAs. Thinking on it might not be that big of a deal as we > just end up in the case where we have a phandle collision one makes it > into the cache and is optimized while the other doesn't. On another > note, they clearly hit a similar issue during overlay attach and > remove, and as Rob pointed out their solution to handle it is a full > cache invalidation followed by rescanning the whole tree to rebuild > it. Seeing as our dynamic lifecycle is node by node this definitely > adds some overhead. Yeah I also should have noticed it going in, but despite being subscribed to the devicetree list I didn't spot it in the flood. AIUI the 1..n assumption is not about correctness but if our phandles violate that we may not be getting any benefit. Anyway yeah lots of things to look at tomorrow :) cheers ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2018-08-01 17:26 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-07-29 13:11 [PATCH] powerpc/mobility: Fix node detach/rename problem Michael Bringmann 2018-07-29 16:31 ` kbuild test robot 2018-07-30 6:31 ` Michael Ellerman 2018-07-30 21:02 ` Michael Bringmann 2018-07-31 6:34 ` phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem) Michael Ellerman 2018-07-31 14:17 ` Rob Herring 2018-07-31 19:18 ` Frank Rowand 2018-07-31 19:22 ` Frank Rowand 2018-07-31 19:26 ` Michael Bringmann 2018-08-01 14:26 ` Michael Ellerman 2018-08-01 17:26 ` Michael Bringmann 2018-07-31 19:11 ` Michael Bringmann 2018-07-31 0:59 ` [PATCH] powerpc/mobility: Fix node detach/rename problem Tyrel Datwyler 2018-07-31 1:46 ` Tyrel Datwyler 2018-07-31 6:42 ` Michael Ellerman 2018-07-31 19:50 ` Tyrel Datwyler 2018-08-01 14:35 ` Michael Ellerman