* Ordering problems with 3ware controller
@ 2016-11-08 10:07 Paul Menzel
  2016-11-08 11:09 ` Paul Menzel
  2016-11-08 23:45 ` Martin K. Petersen
  0 siblings, 2 replies; 8+ messages in thread
From: Paul Menzel @ 2016-11-08 10:07 UTC
To: linux-scsi; +Cc: Adam Radford

Dear Linux SCSI folks,


After updating from Linux 4.4.X to Linux 4.8.4, we noticed that the 3ware
devices under `/dev` (`/dev/twa0`, `/dev/twa1`, …) map to the controllers
differently.

This unfortunately breaks quite a lot of our scripts, as they depend on the
first controller also coming first in the device numbering.

> $ dmesg | grep 3ware
> [   14.509238] 3ware 9000 Storage Controller device driver for Linux v2.26.02.014.
> [   14.824274] scsi host8: 3ware 9000 Storage Controller
> [   14.824537] 3w-9xxx: scsi8: Found a 3ware 9000 Storage Controller at 0xd0200000, IRQ: 17.
> [   15.508310] scsi host9: 3ware 9000 Storage Controller
> [   15.508569] 3w-9xxx: scsi9: Found a 3ware 9000 Storage Controller at 0xda100000, IRQ: 17.

Tracing `tw_cli`, it looks like the ordering of the entries in
`/sys/class/scsi_host` might have changed, as the order returned by
`getdents64` seems to determine the order in which the `/dev/twaX` devices
are assigned.

> $ find /sys/class/scsi_host/ -ls
>   6033    0 drwxr-xr-x   2 root     system          0 Nov  8 10:58 /sys/class/scsi_host/
>  23125    0 lrwxrwxrwx   1 root     system          0 Oct 27 17:41 /sys/class/scsi_host/host0 -> ../../devices/pci0000:00/0000:00:07.0/ata1/host0/scsi_host/host0
>  29893    0 lrwxrwxrwx   1 root     system          0 Oct 27 18:03 /sys/class/scsi_host/host9 -> ../../devices/pci0000:80/0000:80:0e.0/0000:90:00.0/host9/scsi_host/host9
>  23878    0 lrwxrwxrwx   1 root     system          0 Oct 27 17:41 /sys/class/scsi_host/host7 -> ../../devices/pci0000:80/0000:80:08.0/ata8/host7/scsi_host/host7
>  23640    0 lrwxrwxrwx   1 root     system          0 Oct 27 17:41 /sys/class/scsi_host/host5 -> ../../devices/pci0000:80/0000:80:07.0/ata6/host5/scsi_host/host5
>  23402    0 lrwxrwxrwx   1 root     system          0 Oct 27 17:41 /sys/class/scsi_host/host3 -> ../../devices/pci0000:00/0000:00:08.0/ata4/host3/scsi_host/host3
>  23164    0 lrwxrwxrwx   1 root     system          0 Oct 27 17:41 /sys/class/scsi_host/host1 -> ../../devices/pci0000:00/0000:00:07.0/ata2/host1/scsi_host/host1
>  29851    0 lrwxrwxrwx   1 root     system          0 Oct 27 18:03 /sys/class/scsi_host/host8 -> ../../devices/pci0000:00/0000:00:0e.0/0000:05:00.0/host8/scsi_host/host8
>  23839    0 lrwxrwxrwx   1 root     system          0 Oct 27 17:41 /sys/class/scsi_host/host6 -> ../../devices/pci0000:80/0000:80:08.0/ata7/host6/scsi_host/host6
>  23601    0 lrwxrwxrwx   1 root     system          0 Oct 27 17:41 /sys/class/scsi_host/host4 -> ../../devices/pci0000:80/0000:80:07.0/ata5/host4/scsi_host/host4
>  23363    0 lrwxrwxrwx   1 root     system          0 Oct 27 17:41 /sys/class/scsi_host/host2 -> ../../devices/pci0000:00/0000:00:08.0/ata3/host2/scsi_host/host2
> $ sudo -i tw_cli show
>
> Ctl   Model         (V)Ports  Drives  Units  NotOpt  RRate  VRate  BBU
> ------------------------------------------------------------------------
> c8    9650SE-8LPML  8         8       1      0       5      1      OK
> c9    9690SA-8E     0         0       0      0       5      1      OK
>
> Enclosure  Slots  Drives  Fans  TSUnits  PSUnits  Alarms
> --------------------------------------------------------------
> /c9/e0     16     0       3     1        2        1

So in this case `c8` is mapped to `/dev/twa1`, and `c9` to `/dev/twa0`.

As we do not know of a way to find the correct mapping with `tw_cli` or
anywhere else, we rely on the implicit ordering, which, according to my
colleagues, has worked for over 15 years [1].

Do you know of a way to get the mapping “over an API”, so that we don’t have
to rely on the implicit ordering?

Otherwise, do you know why the ordering has changed, and can this be
reverted?


Kind regards,

Paul Menzel


[1] https://www.thomas-krenn.com/de/wiki/Smartmontools_mit_3ware_RAID_Controller (German)

* Re: Ordering problems with 3ware controller 2016-11-08 10:07 Ordering problems with 3ware controller Paul Menzel @ 2016-11-08 11:09 ` Paul Menzel 2016-11-08 23:45 ` Martin K. Petersen 1 sibling, 0 replies; 8+ messages in thread From: Paul Menzel @ 2016-11-08 11:09 UTC (permalink / raw) To: linux-scsi; +Cc: Adam Radford, dvteam Dear Linux SCSI folks, On 11/08/16 11:07, Paul Menzel wrote: > Updating from Linux 4.4.X to Linux 4.8.4, we noticed that the 3ware > devices under `/dev` – `/dev/twa0`, `/dev/twa1`, … – map to the > controllers differently. > > This unfortunately breaks quite a lot of our scripts, as we depend on > the fact that the first controller is also in the front. > >> $ dmesg | grep 3ware >> [ 14.509238] 3ware 9000 Storage Controller device driver for Linux >> v2.26.02.014. >> [ 14.824274] scsi host8: 3ware 9000 Storage Controller >> [ 14.824537] 3w-9xxx: scsi8: Found a 3ware 9000 Storage Controller >> at 0xd0200000, IRQ: 17. >> [ 15.508310] scsi host9: 3ware 9000 Storage Controller >> [ 15.508569] 3w-9xxx: scsi9: Found a 3ware 9000 Storage Controller >> at 0xda100000, IRQ: 17. > > Tracing `twi_cli` it looks like the ordering of the devices in > `/sys/class/scsi_host` might have changed, as `getdents64` seems to be > used for the ordering of creating `/dev/twaX`. 
> >> $ find /sys/class/scsi_host/ -ls >> 6033 0 drwxr-xr-x 2 root system 0 Nov 8 >> 10:58 /sys/class/scsi_host/ >> 23125 0 lrwxrwxrwx 1 root system 0 Oct 27 >> 17:41 /sys/class/scsi_host/host0 -> >> ../../devices/pci0000:00/0000:00:07.0/ata1/host0/scsi_host/host0 >> 29893 0 lrwxrwxrwx 1 root system 0 Oct 27 >> 18:03 /sys/class/scsi_host/host9 -> >> ../../devices/pci0000:80/0000:80:0e.0/0000:90:00.0/host9/scsi_host/host9 >> 23878 0 lrwxrwxrwx 1 root system 0 Oct 27 >> 17:41 /sys/class/scsi_host/host7 -> >> ../../devices/pci0000:80/0000:80:08.0/ata8/host7/scsi_host/host7 >> 23640 0 lrwxrwxrwx 1 root system 0 Oct 27 >> 17:41 /sys/class/scsi_host/host5 -> >> ../../devices/pci0000:80/0000:80:07.0/ata6/host5/scsi_host/host5 >> 23402 0 lrwxrwxrwx 1 root system 0 Oct 27 >> 17:41 /sys/class/scsi_host/host3 -> >> ../../devices/pci0000:00/0000:00:08.0/ata4/host3/scsi_host/host3 >> 23164 0 lrwxrwxrwx 1 root system 0 Oct 27 >> 17:41 /sys/class/scsi_host/host1 -> >> ../../devices/pci0000:00/0000:00:07.0/ata2/host1/scsi_host/host1 >> 29851 0 lrwxrwxrwx 1 root system 0 Oct 27 >> 18:03 /sys/class/scsi_host/host8 -> >> ../../devices/pci0000:00/0000:00:0e.0/0000:05:00.0/host8/scsi_host/host8 >> 23839 0 lrwxrwxrwx 1 root system 0 Oct 27 >> 17:41 /sys/class/scsi_host/host6 -> >> ../../devices/pci0000:80/0000:80:08.0/ata7/host6/scsi_host/host6 >> 23601 0 lrwxrwxrwx 1 root system 0 Oct 27 >> 17:41 /sys/class/scsi_host/host4 -> >> ../../devices/pci0000:80/0000:80:07.0/ata5/host4/scsi_host/host4 >> 23363 0 lrwxrwxrwx 1 root system 0 Oct 27 >> 17:41 /sys/class/scsi_host/host2 -> >> ../../devices/pci0000:00/0000:00:08.0/ata3/host2/scsi_host/host2 >> $ sudo -i tw_cli show >> >> Ctl Model (V)Ports Drives Units NotOpt RRate VRate BBU >> ------------------------------------------------------------------------ >> c8 9650SE-8LPML 8 8 1 0 5 1 OK >> c9 9690SA-8E 0 0 0 0 5 1 OK >> >> Enclosure Slots Drives Fans TSUnits PSUnits Alarms >> -------------------------------------------------------------- 
>> /c9/e0 16 0 3 1 2 1 > > So in this case `c8` is mapped to `/dev/twa1`, and `c9` to `/dev/twa0`. > > As we do not know of a way, to use `tw_cli` to find the correct mapping, > or another place, we rely on the implicit ordering, which – according to > my colleagues – has worked for over 15 years [1]. Here is the excerpt from the manual page for smartctl [2]. > --- end of manual page excerpt --- > 3ware,N - [FreeBSD and Linux only] the device consists of one or more ATA disks con‐ > nected to a 3ware RAID controller. The non-negative integer N (in the range from 0 to > 127 inclusive) denotes which disk on the controller is monitored. Use syntax such as: > smartctl -a -d 3ware,2 /dev/sda > smartctl -a -d 3ware,0 /dev/twe0 > smartctl -a -d 3ware,1 /dev/twa0 > smartctl -a -d 3ware,1 /dev/twl0 > The first two forms, which refer to devices /dev/sda-z and /dev/twe0-15, may be used > with 3ware series 6000, 7000, and 8000 series controllers that use the 3x-xxxx driver. > Note that the /dev/sda-z form is deprecated starting with the Linux 2.6 kernel series > and may not be supported by the Linux kernel in the near future. The final form, which > refers to devices /dev/twa0-15, must be used with 3ware 9000 series controllers, which > use the 3w-9xxx driver. > > The devices /dev/twl0-15 must be used with the 3ware/LSI 9750 series controllers which > use the 3w-sas driver. > > Note that if the special character device nodes /dev/twl?, /dev/twa? and /dev/twe? do > not exist, or exist with the incorrect major or minor numbers, smartctl will recreate > them on the fly. Typically /dev/twa0 refers to the first 9000-series controller, > /dev/twa1 refers to the second 9000 series controller, and so on. The /dev/twl0 > devices refers to the first 9750 series controller, /dev/twl1 resfers to the second > 9750 series controller, and so on. Likewise /dev/twe0 refers to the first > 6/7/8000-series controller, /dev/twe1 refers to the second 6/7/8000 series controller, > and so on. 
> > Note that for the 6/7/8000 controllers, any of the physical disks can be queried or > examined using any of the 3ware's SCSI logical device /dev/sd? entries. Thus, if log‐ > ical device /dev/sda is made up of two physical disks (3ware ports zero and one) and > logical device /dev/sdb is made up of two other physical disks (3ware ports two and > three) then you can examine the SMART data on any of the four physical disks using > either SCSI device /dev/sda or /dev/sdb. If you need to know which logical SCSI device > a particular physical disk (3ware port) is associated with, use the dmesg or SYSLOG > output to show which SCSI ID corresponds to a particular 3ware unit, and then use the > 3ware CLI or 3dm tool to determine which ports (physical disks) correspond to particu‐ > lar 3ware units. > > If the value of N corresponds to a port that does not exist on the 3ware controller, or > to a port that does not physically have a disk attached to it, the behavior of smartctl > depends upon the specific controller model, firmware, Linux kernel and platform. In > some cases you will get a warning message that the device does not exist. In other > cases you will be presented with ´void´ data for a non-existent device. > > Note that if the /dev/sd? addressing form is used, then older 3w-xxxx drivers do not > pass the "Enable Autosave" (´-S on´) and "Enable Automatic Offline" (´-o on´) commands > to the disk, and produce these types of harmless syslog error messages instead: > "3w-xxxx: tw_ioctl(): Passthru size (123392) too big". This can be fixed by upgrading > to version 1.02.00.037 or later of the 3w-xxxx driver, or by applying a patch to older > versions. Alternatively, use the character device /dev/twe0-15 interface. > > The selective self-test functions (´-t select,A-B´) are only supported using the char‐ > acter device interface /dev/twl0-15, /dev/twa0-15 and /dev/twe0-15. The necessary > WRITE LOG commands can not be passed through the SCSI interface. 
> --- end of manual page excerpt --- > Do you know of a way, to either get the mapping “over an API” so we > don’t have to rely on the implicit ordering? > > Otherwise, do you know, why the ordering has changed, and can this be > reverted? Kind regards, Paul Menzel > [1] https://www.thomas-krenn.com/de/wiki/Smartmontools_mit_3ware_RAID_Controller > (German) [2] https://www.smartmontools.org/browser/trunk/smartmontools/smartctl.8.in ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Ordering problems with 3ware controller 2016-11-08 10:07 Ordering problems with 3ware controller Paul Menzel 2016-11-08 11:09 ` Paul Menzel @ 2016-11-08 23:45 ` Martin K. Petersen 2016-11-09 9:08 ` Paul Menzel 1 sibling, 1 reply; 8+ messages in thread From: Martin K. Petersen @ 2016-11-08 23:45 UTC (permalink / raw) To: Paul Menzel; +Cc: linux-scsi, Adam Radford >>>>> "Paul" == Paul Menzel <pmenzel@molgen.mpg.de> writes: Paul, Paul> Updating from Linux 4.4.X to Linux 4.8.4, we noticed that the Paul> 3ware devices under `/dev` – `/dev/twa0`, `/dev/twa1`, … – map to Paul> the controllers differently. Paul> This unfortunately breaks quite a lot of our scripts, as we depend Paul> on the fact that the first controller is also in the front. It's not the 3ware drivers since they have not been updated in a long time (since way before 4.4). Linux does not provide device discovery ordering guarantees. You need to fix your scripts to use UUIDs, filesystem labels, or DM devices to get stable naming. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Ordering problems with 3ware controller
  2016-11-08 23:45 ` Martin K. Petersen
@ 2016-11-09  9:08   ` Paul Menzel
  2016-11-10 13:59     ` Martin K. Petersen
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Menzel @ 2016-11-09 9:08 UTC
To: Martin K. Petersen; +Cc: linux-scsi, Adam Radford, dvteam

[-- Attachment #1: Type: text/plain, Size: 3169 bytes --]

Dear Martin,


On 11/09/16 00:45, Martin K. Petersen wrote:
>>>>>> "Paul" == Paul Menzel <pmenzel@molgen.mpg.de> writes:

> Paul> Updating from Linux 4.4.X to Linux 4.8.4, we noticed that the
> Paul> 3ware devices under `/dev` – `/dev/twa0`, `/dev/twa1`, … – map to
> Paul> the controllers differently.
>
> Paul> This unfortunately breaks quite a lot of our scripts, as we depend
> Paul> on the fact that the first controller is also in the front.
>
> It's not the 3ware drivers since they have not been updated in a long
> time (since way before 4.4).

Yes, that’s what made me wonder too.

> Linux does not provide device discovery ordering guarantees. You need to
> fix your scripts to use UUIDs, filesystem labels, or DM devices to get
> stable naming.

Indeed. But it worked for several years, so *something* must have changed
such that the order of the entries returned by `getdents64` is different
now.

Fixing the scripts is unfortunately not that easy, as `tw_cli` is a
proprietary tool [1], and we do not have the sources. It does a `readdir()`.

> open("/proc/scsi/3w-9xxx", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = -1 ENOENT (No such file or directory)
> open("/sys/class/scsi_host", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
> fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
> getdents64(3, /* 12 entries */, 4096)   = 368
> stat("/sys/class/scsi_host/host0/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
> stat("/sys/class/scsi_host/host9/stats", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
> open("/sys/class/scsi_host/host9/stats", O_RDONLY) = 4
> read(4, "3w-9xxx Driver v", 16)         = 16
> close(4)                                = 0
> open("/dev/twa0", O_RDWR)               = 4
> close(4)                                = 0
> stat("/sys/class/scsi_host/host7/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
> stat("/sys/class/scsi_host/host5/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
> stat("/sys/class/scsi_host/host3/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
> stat("/sys/class/scsi_host/host1/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
> stat("/sys/class/scsi_host/host8/stats", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
> open("/sys/class/scsi_host/host8/stats", O_RDONLY) = 4
> read(4, "3w-9xxx Driver v", 16)         = 16
> close(4)                                = 0
> open("/dev/twa1", O_RDWR)               = 4
> close(4)                                = 0
> stat("/sys/class/scsi_host/host6/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
> stat("/sys/class/scsi_host/host4/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
> stat("/sys/class/scsi_host/host2/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
> getdents64(3, /* 0 entries */, 4096)    = 0
> close(3)                                = 0
> open("/proc/devices", O_RDONLY)         = 3

Please find attached a wrapper from my colleague, which uses mount
namespaces to ensure the ordering that `tw_cli` expects.


Kind regards,

Paul


[1] https://wiki.hetzner.de/index.php/3Ware_RAID_Controller/en

[-- Attachment #2: tw_wrapcli --]
[-- Type: text/plain, Size: 912 bytes --]

#! /usr/bin/perl
use strict;
use warnings;

# compare "hostN" names numerically, anything else as strings
sub sort_host {
    my ($n1,$n2);
    ($n1)=$a=~/^host(\d+)$/ and ($n2)=$b=~/^host(\d+)$/ and return $n1 <=> $n2;
    return $a cmp $b;
}

our $SYS_unshare=272;     # /usr/include/asm/unistd_64.h
our $CLONE_NEWNS=0x20000; # /usr/include/linux/sched.h

my $pid=fork;
defined $pid or die "$!\n";
unless ($pid) {
    opendir my $d,"/sys/class/scsi_host";
    my @names=sort sort_host grep !/^\.\.?$/,readdir $d;
    # private mount namespace: the mounts below are only seen by this child
    syscall($SYS_unshare,$CLONE_NEWNS) and die "$!\n";
    -d '/tmp/sysfs' or mkdir("/tmp/sysfs") or die "/tmp/sysfs: $!\n";
    system 'mount','-tsysfs','BLA','/tmp/sysfs' and exit 1;
    system 'mount','-ttmpfs','BLA','/sys/class/scsi_host' and exit 1;
    # recreate the host entries as symlinks into the real sysfs,
    # created in reverse sorted order
    for my $name (reverse @names) {
        symlink("/tmp/sysfs/class/scsi_host/$name","/sys/class/scsi_host/$name") or die "/sys/class/scsi_host/$name: $!\n";
    }
    exec '/root/bin/tw_cli.exe',@ARGV;
    die "$!\n";
}
wait;
$? and exit 1;

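The ordering rule that the wrapper above imposes can also be written in C. The following is only a minimal sketch, not code from the thread: the function names `host_number`, `compare_host_names` and `sort_host_names` are made up for illustration. Unlike the Perl `sort_host`, it places names that are not of the form `hostN` after all host names, an assumption made here so that the comparison stays a consistent total order.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return N for a name of the form "hostN", or -1 for any other name. */
long host_number(const char *name)
{
    assert(name != NULL);
    if (strncmp(name, "host", 4) != 0 || name[4] == '\0')
        return -1;
    for (const char *p = name + 4; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return -1;
    }
    return strtol(name + 4, NULL, 10);
}

/* qsort comparator: hostN entries first, in numeric order;
 * all other names afterwards, ordered by strcmp. */
int compare_host_names(const void *a, const void *b)
{
    const char *na = *(const char *const *)a;
    const char *nb = *(const char *const *)b;
    long ha = host_number(na);
    long hb = host_number(nb);

    if (ha >= 0 && hb >= 0)
        return (ha > hb) - (ha < hb);
    if (ha >= 0)
        return -1;
    if (hb >= 0)
        return 1;
    return strcmp(na, nb);
}

/* Sort an array of directory entry names in place. */
void sort_host_names(const char **names, size_t count)
{
    qsort(names, count, sizeof(*names), compare_host_names);
}
```

Feeding it the names in the order seen in the trace (`host0`, `host9`, `host7`, …) should yield `host0` through `host9` in ascending order, so that `host8` comes before `host9` again. A tool that reads the names with `readdir()` could apply such a sort before assigning `/dev/twaN` numbers instead of relying on the directory order.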
* Re: Ordering problems with 3ware controller 2016-11-09 9:08 ` Paul Menzel @ 2016-11-10 13:59 ` Martin K. Petersen 2016-11-16 21:24 ` Donald Buczek 0 siblings, 1 reply; 8+ messages in thread From: Martin K. Petersen @ 2016-11-10 13:59 UTC (permalink / raw) To: Paul Menzel; +Cc: Martin K. Petersen, linux-scsi, dvteam >>>>> "Paul" == Paul Menzel <pmenzel@molgen.mpg.de> writes: Paul, >> Linux does not provide device discovery ordering guarantees. You need >> to fix your scripts to use UUIDs, filesystem labels, or DM devices to >> get stable naming. Paul> Indeed. But it worked for several years, so that *something* must Paul> have changed that the ordering of the result of `getdents64` is Paul> different now. Could be changes in the PCI or platform code that causes things to be enumerated differently. Whatever it is, it has nothing to do with the 3ware drivers themselves since they have been dormant for a long time. Paul> Fixing the scripts is unfortunately not that easy, as `tw_cli` is Paul> a proprietary tool [1], and we do not have the sources. It does a Paul> `readdir()`. That's beside the point. Whatever you are doing that needs to address a specific physical controller needs to use a different scheme than discovery order to do so. Don't know if these cards provide a device identification VPD or something with a serial number that you could key off of? -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 8+ messages in thread
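One candidate for such a key, which nobody in the thread proposes and which is therefore purely an assumption, is the PCI address that appears in the `/sys/class/scsi_host/hostN` symlink targets quoted earlier. The sketch below extracts the path component directly above `hostN` from such a target string. The function name `host_parent` is hypothetical, and whether the PCI address is stable enough for a given setup would need to be checked.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Copy the path component directly above the first "hostN" component of
 * a sysfs link target into out. Returns 0 on success, -1 if the target
 * has no such component or out is too small.
 */
int host_parent(const char *target, char *out, size_t outlen)
{
    assert(target != NULL && out != NULL);

    /* find the first "/host" that is followed by a digit */
    const char *p = target;
    while ((p = strstr(p, "/host")) != NULL) {
        if (isdigit((unsigned char)p[5]))
            break;
        p++;
    }
    if (p == NULL || p == target)
        return -1;

    /* walk back to the start of the preceding path component */
    const char *start = p;
    while (start > target && start[-1] != '/')
        start--;

    size_t len = (size_t)(p - start);
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

For the two 3ware hosts in the listing this gives `0000:05:00.0` for `host8` and `0000:90:00.0` for `host9`. In a real script the target string would come from `readlink(2)` on the `hostN` entry. For the ATA hosts the component above `hostN` is `ataN` rather than a PCI address.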
* Re: Ordering problems with 3ware controller
  2016-11-10 13:59 ` Martin K. Petersen
@ 2016-11-16 21:24   ` Donald Buczek
  2016-11-17 14:55     ` Paul Menzel
  0 siblings, 1 reply; 8+ messages in thread
From: Donald Buczek @ 2016-11-16 21:24 UTC
To: dvteam, Paul Menzel; +Cc: Martin K. Petersen, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 2950 bytes --]

On 10.11.2016 14:59, Martin K. Petersen wrote:
>>>>>> "Paul" == Paul Menzel <pmenzel@molgen.mpg.de> writes:
> Paul,
>
>>> Linux does not provide device discovery ordering guarantees. You need
>>> to fix your scripts to use UUIDs, filesystem labels, or DM devices to
>>> get stable naming.
> Paul> Indeed. But it worked for several years, so that *something* must
> Paul> have changed that the ordering of the result of `getdents64` is
> Paul> different now.
>
> Could be changes in the PCI or platform code that causes things to be
> enumerated differently. Whatever it is, it has nothing to do with the
> 3ware drivers themselves since they have been dormant for a long time.
>

Right. We tracked it down further. In fact, it's not a matter of driver
initialization order but of the way sysfs/kernfs hashes its object names,
which defines the order of names returned by getdents64 calls. In
fs/kernfs/dir.c the names are inserted into a red-black tree ordered by
the hashes over their names (and possibly the namespace pointer, which in
our case is zero).

I've walked the rbtrees of the kernfs_node structs from
/sys/class/scsi_host, showing their addresses, the hash values and the
names, on a 4.4.27 system:

    root:cu:/home/buczek/autofs/# ./peek-3w

    ffff88046d847640 : 11bf1ddd : host0
    ffff88046c56d3e8 : 11bf1e8d : host1
    ffff88046c571c58 : 11bf1f3d : host2
    ffff88046c572550 : 11bf1fed : host3
    ffff88046c577dc0 : 11bf209d : host4
    ffff88046a4bbaf0 : 11bf214d : host5

As can be seen, in 4.4 the hash algorithm happened to produce increasing
hash values for names like "host0","host1","host2",... In 4.8.6 the hash
values seem to be more random:

    root:gynaekophobie:/home/buczek/autofs/# ./peek-3w

    ffff88041df9a7f8 : 074af64b : host0
    ffff88081db40528 : 1009cd9b : host9
    ffff88041d3fba50 : 1c512bfb : host7
    ffff88181d19c000 : 28988a5b : host5
    ffff88041df5a780 : 34dfe8bb : host3
    ffff88041d3f5e10 : 4127471b : host1
    ffff88041ccbd258 : 562d7ccb : host8
    ffff88201cd5f960 : 6274db2b : host6
    ffff88141e2d0ca8 : 6ebc398b : host4
    ffff88041df599d8 : 7b0397eb : host2

The relevant commit is 703b5fa, which includes

     static inline unsigned long end_name_hash(unsigned long hash)
     {
    -       return (unsigned int)hash;
    +       return __hash_32((unsigned int)hash);
     }

__hash_32 is a multiplication by 0x61C88647 ( hash.h )

And this is exactly the difference between the hash values of "host0" on
the 4.4 and the 4.8 system:

      DB<2> x sprintf '%x',0x11bf1ddd*0x61C88647
    0  '6c750ef074af64b'

The low 32 bits of this product, 074af64b, are the hash value of "host0"
on the 4.8 system.

The bug, of course, is in the userspace tool tw_cli, which wrongly assumes
that getdents returns the names in the "right" order.

As a dirty workaround, I've created a new wrapper, which uses ptrace to
pause the program on return from SYS_getdents64 and sorts the values
returned by the system call in the memory of the target process.

I append the source of the wrapper.

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433

[-- Attachment #2: tw_cli.c --]
[-- Type: text/x-csrc, Size: 5722 bytes --]

#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <unistd.h>
#include <sys/syscall.h>   /* For SYS_xxx definitions */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <errno.h>
#include <signal.h>
#include <sys/user.h>
#include <syscall.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <stdarg.h>
#include <regex.h>

typedef uint64_t u64;
typedef int64_t s64;

/* from include/linux/dirent.h : */
struct linux_dirent64 {
    u64             d_ino;
    s64             d_off;
    unsigned short  d_reclen;
    unsigned char   d_type;
    char            d_name[0];
};

void die(char *fmt,...) {
    va_list ap;
    va_start(ap,fmt);
    vfprintf(stderr,fmt,ap);
    va_end(ap);
    exit(1);
}

void die_regerror(int status,regex_t *re) {
    char msg[80];
    regerror(status,re,msg,sizeof(msg));
    die("regex: %s\n",msg);
}

/* return N for a name of the form "hostN", -1 for any other name */
int hostnum(char *hostname) {
    static regex_t *re=NULL;
    int status;
    regmatch_t match[2];

    if (!re) {
        re=malloc(sizeof(*re));
        if (!re) die("out of memory\n");
        status=regcomp(re,"^host([0-9]+)$",REG_EXTENDED);
        if (status) die_regerror(status,re);
    }
    status=regexec(re,hostname,sizeof(match)/sizeof(*match),match,0);
    if (status==0) {
        /* the pattern is anchored with '$', so the digits already end
         * at the terminating NUL and atoi can be applied directly */
        return atoi(&hostname[match[1].rm_so]);
    } else if (status==REG_NOMATCH) {
        return -1;
    }
    die_regerror(status,re);
    return -1; /* not reached */
}

struct sortentry {
    struct linux_dirent64 *dirent;
    int hostnum;
};

/* hostN entries are compared numerically, everything else by name */
int compare_sortentry(const void *vp1,const void *vp2) {
    struct sortentry *p1=(struct sortentry *)vp1;
    struct sortentry *p2=(struct sortentry *)vp2;
    if (p1->hostnum!=-1 && p2->hostnum!=-1) {
        return p1->hostnum<p2->hostnum ? -1 : p1->hostnum>p2->hostnum ? 1 : 0;
    }
    return strcmp(p1->dirent->d_name,p2->dirent->d_name);
}

/* sort the dirents the child just received from getdents64, in place */
void fix_memory(pid_t pid,size_t count,void *dirp) {
    char *memfilename;
    int fd;
    char *dirents_unsorted,*dirents_sorted;
    struct sortentry *sort_array;
    struct sortentry *sort_entry;
    ssize_t s;
    int entry_count;
    size_t bpos;
    int i;
    struct linux_dirent64 *d;

    if (count==0) return;

    if (asprintf(&memfilename,"/proc/%d/mem",pid)==-1) die("%m\n");
    fd=open(memfilename,O_RDWR);
    if (fd==-1) die("%s: %m\n",memfilename);

    dirents_unsorted=malloc(count);
    if(!dirents_unsorted) die("out of memory\n");

    if (lseek(fd,(off_t)dirp,SEEK_SET)==(off_t)-1) die("%s: %m\n",memfilename);
    s=read(fd,dirents_unsorted,count);
    if (s == -1) die("%s: %m\n",memfilename);
    if ((size_t)s != count) die("short reads on childs memory not implemented\n");

    /* first pass: count the entries */
    entry_count=0;
    for (bpos=0;bpos<count;) {
        d = (struct linux_dirent64 *) (dirents_unsorted + bpos);
        entry_count++;
        bpos+=d->d_reclen;
    }

    sort_array=malloc(entry_count*sizeof (*sort_array));
    if (!sort_array) die("out of memory\n");

    /* second pass: remember each entry together with its host number */
    sort_entry=sort_array;
    for (bpos=0;bpos<count;) {
        d = (struct linux_dirent64 *) (dirents_unsorted + bpos);
        sort_entry->dirent=d;
        sort_entry->hostnum=hostnum(d->d_name);
        sort_entry++;
        bpos+=d->d_reclen;
    }

    // for (i=0;i<entry_count;i++) { printf("ary[%d] : %p : %s : %d\n",i,sort_array[i].dirent,sort_array[i].dirent->d_name,sort_array[i].hostnum); }
    qsort(sort_array,entry_count,sizeof(*sort_array),compare_sortentry);
    // for (i=0;i<entry_count;i++) { printf("ary[%d] : %p : %s : %d\n",i,sort_array[i].dirent,sort_array[i].dirent->d_name,sort_array[i].hostnum); }

    dirents_sorted=malloc(count);
    if(!dirents_sorted) die ("%m\n");

    /* rebuild the buffer with the entries in sorted order */
    bpos=0;
    for (i=0;i<entry_count;i++) {
        d = (struct linux_dirent64 *) (dirents_sorted + bpos);
        memcpy(d,sort_array[i].dirent,sort_array[i].dirent->d_reclen);
        bpos+=sort_array[i].dirent->d_reclen;
    }

    // for (bpos=0;bpos<count;) { d = (struct linux_dirent64 *) (dirents_sorted + bpos); printf(" --> %s\n",d->d_name); bpos+=d->d_reclen; }

    if (lseek(fd,(off_t)dirp,SEEK_SET)==(off_t)-1) die("%s: %m\n",memfilename);
    s=write(fd,dirents_sorted,count);
    if (s == -1) die("%s: %m\n",memfilename);
    if ((size_t)s != count) die("internal error: short write\n");

    close(fd);
    free(memfilename);
    free(dirents_unsorted);
    free(dirents_sorted);
    free(sort_array);
}

int main(int argc, char **argv) {
    pid_t pid;
    int status;
    int syscall_state=0;   /* 0: next stop is syscall entry, 1: syscall exit */
    struct user user;
    static const char *TW_CLI_ORIG="/root/bin/tw_cli.exe";

    pid=fork();
    if (pid==0) {
        if (ptrace(PTRACE_TRACEME,0,NULL,NULL)==-1) die("ptrace: %m\n");
        execv(TW_CLI_ORIG,argv);
        die("%s: %m\n",TW_CLI_ORIG);
    } else if (pid==-1) {
        die("fork: %m\n");
    }

    while(1) {
        pid=wait(&status);
        if (pid==-1) die("wait: %m\n");
        if (WIFSIGNALED(status)) {
            int signal=WTERMSIG(status);
            die("child got signal %d - exiting\n",signal);
        } else if (WIFSTOPPED(status)) {
            int signal=WSTOPSIG(status);
            if (signal==SIGTRAP) {
                if(ptrace(PTRACE_SETOPTIONS,pid,NULL,PTRACE_O_TRACESYSGOOD)==-1) die("ptrace: %m\n");
                if(ptrace(PTRACE_SYSCALL,pid,NULL,NULL)==-1) die ("ptrace: %m\n");
            } else if (signal==(SIGTRAP|0x80)) {
                /* parenthesized: == binds tighter than |, so the unparenthesized
                 * test was true for every stop signal */
                if (syscall_state==1) {
                    if(ptrace(PTRACE_GETREGS,pid,NULL,&user)==-1) die("ptrace: %m\n");
                    /* orig_rax holds the syscall number; the 0xFF mask is kept
                     * from the first version and is probably not needed. The
                     * 0xffffffffffffffda seen in rax on syscall entry is -ENOSYS,
                     * not the syscall number (__NR_getdents64 is 217). */
                    if ((unsigned char)(user.regs.orig_rax & 0xFF) == SYS_getdents64) {
                        /* rax is the return value: bytes written, 0 or -errno */
                        if ((long long)user.regs.rax>0)
                            fix_memory(pid,(size_t)user.regs.rax,(void *)user.regs.rsi);
                    }
                }
                syscall_state=1-syscall_state;
                if(ptrace(PTRACE_SYSCALL,pid,NULL,NULL)==-1) die("ptrace: %m\n");
            } else {
                die("child stopped by signal %d - exiting\n",signal);
            }
        } else if (WIFEXITED(status)) {
            if (WEXITSTATUS(status)) {
                exit(1);
            } else {
                exit(0);
            }
        } else {
            die("unexpected return from wait. status=%08x - exiting\n",status);
        }
    }
}

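The effect of the multiplication on the directory order can be reproduced with a few lines of C. This is a sketch under stated assumptions rather than kernel code: `mixed_hash`, `old_hashes` and `print_mixed_hashes` are illustrative names, the multiplier is the one quoted above, and the final 31-bit mask is an assumption about what kernfs does to the hash that is not spelled out in the thread, although it is needed to reproduce the listed values for `host3` and `host4`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define GOLDEN_RATIO_32 0x61C88647u  /* multiplier quoted in the thread */

/* Hash values observed on the 4.4.27 system for host0 .. host5. */
static const uint32_t old_hashes[6] = {
    0x11bf1dddu, 0x11bf1e8du, 0x11bf1f3du,
    0x11bf1fedu, 0x11bf209du, 0x11bf214du,
};

/* Apply the extra multiplication; unsigned arithmetic wraps modulo 2^32. */
uint32_t mixed_hash(uint32_t old_hash)
{
    uint32_t h = old_hash * GOLDEN_RATIO_32;
    return h & 0x7fffffffu;  /* assumption: kernfs keeps only 31 bits */
}

/* Print old and new hash for each host name. */
void print_mixed_hashes(void)
{
    for (int i = 0; i < 6; i++) {
        assert(old_hashes[i] != 0);
        printf("host%d : %08" PRIx32 " -> %08" PRIx32 "\n",
               i, old_hashes[i], mixed_hash(old_hashes[i]));
    }
}
```

The old values increase by 0xb0 from one host to the next, so sorting by them keeps the hosts in numeric order. After the multiplication each step becomes 0xb0 * 0x61C88647 modulo 2^32, which wraps around quickly, and the same six names sort as host0, host5, host3, host1, host4, host2, matching their relative order in the 4.8.6 listing.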
* Re: Ordering problems with 3ware controller 2016-11-16 21:24 ` Donald Buczek @ 2016-11-17 14:55 ` Paul Menzel 2016-11-17 19:55 ` Donald Buczek 0 siblings, 1 reply; 8+ messages in thread From: Paul Menzel @ 2016-11-17 14:55 UTC (permalink / raw) To: Donald Buczek, dvteam, George Spelvin Cc: Martin K. Petersen, linux-scsi, linux-kernel Dear Linux folks, On 11/16/16 22:24, Donald Buczek wrote: > On 10.11.2016 14:59, Martin K. Petersen wrote: >>>>>>> "Paul" == Paul Menzel <pmenzel@molgen.mpg.de> writes: >>>> Linux does not provide device discovery ordering guarantees. You need >>>> to fix your scripts to use UUIDs, filesystem labels, or DM devices to >>>> get stable naming. >> Paul> Indeed. But it worked for several years, so that *something* must >> Paul> have changed that the ordering of the result of `getdents64` is >> Paul> different now. >> >> Could be changes in the PCI or platform code that causes things to be >> enumerated differently. Whatever it is, it has nothing to do with the >> 3ware drivers themselves since they have been dormant for a long time. >> > > Right. We further tracked it down. In fact its not a matter of driver > initialization order but of the way sysfs/kernfs hashes its object names > and thereby defines the order of names returned by getdents64 calls. In > fs/kernfs/dir.h the names are inserted into a red-black tree ordered by > the hashes over their names (and possibly namespace pointer, which in > our case is zero). 
> > I've walked the rbtrees of the kernfs_node structs from > /sys/class/scsi_host showing their addresses, the hash values and the > names in a 4.4.27 system: > > root:cu:/home/buczek/autofs/# ./peek-3w > > ffff88046d847640 : 11bf1ddd : host0 > ffff88046c56d3e8 : 11bf1e8d : host1 > ffff88046c571c58 : 11bf1f3d : host2 > ffff88046c572550 : 11bf1fed : host3 > ffff88046c577dc0 : 11bf209d : host4 > ffff88046a4bbaf0 : 11bf214d : host5 > > As can be seen, in 4.4 the hash algorithm happened to produce increasing > hash values for names like "host0","host1","host2",... In 4.8.6 the hash > values seem to be more random: > > root:gynaekophobie:/home/buczek/autofs/# ./peek-3w > > ffff88041df9a7f8 : 074af64b : host0 > ffff88081db40528 : 1009cd9b : host9 > ffff88041d3fba50 : 1c512bfb : host7 > ffff88181d19c000 : 28988a5b : host5 > ffff88041df5a780 : 34dfe8bb : host3 > ffff88041d3f5e10 : 4127471b : host1 > ffff88041ccbd258 : 562d7ccb : host8 > ffff88201cd5f960 : 6274db2b : host6 > ffff88141e2d0ca8 : 6ebc398b : host4 > ffff88041df599d8 : 7b0397eb : host2 > > The relevant commit is 703b5fa which includes The commit message summary is *fs/dcache.c: Save one 32-bit multiply in dcache lookup*. > static inline unsigned long end_name_hash(unsigned long hash) > { > - return (unsigned int)hash; > + return __hash_32((unsigned int)hash); > } > > __hash_32 is a multiplication by 0x61C88647 ( hash.h ) > > And this exactly is the difference between the hash value of "host0" on > the 4.4 and the 4.8 system: > > DB<2> x sprintf '%x',0x11bf1ddd*0x61C88647 > 0 '6c750ef074af64b' > > The bug, of course, is in the userspace tool tw_cli which wrongly > assumes that the names would be returned in the "right" order by getdents. Nice analysis. Unfortunately, I don’t find the discussion of the patch on the Linux kernel mailing list. Searching for the summary only brings up *screen rotation flipped in 4.8-rc* [1]. 
> As a dirty workaround, I've created a new wrapper, which uses ptrace to > pause the program on return from SYS_getdents64 and sorts the values > returned from the system call in the memory of the target process. > > I append the source of the wrapper. Kind regards, Paul [1] https://lkml.org/lkml/2016/8/30/739 "screen rotation flipped in 4.8-rc" ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Ordering problems with 3ware controller 2016-11-17 14:55 ` Paul Menzel @ 2016-11-17 19:55 ` Donald Buczek 0 siblings, 0 replies; 8+ messages in thread From: Donald Buczek @ 2016-11-17 19:55 UTC (permalink / raw) To: Paul Menzel, dvteam, George Spelvin Cc: Martin K. Petersen, linux-scsi, linux-kernel On 17.11.2016 15:55, Paul Menzel wrote: > Dear Linux folks, > > > On 11/16/16 22:24, Donald Buczek wrote: >> >> The relevant commit is 703b5fa which includes > > The commit message summary is *fs/dcache.c: Save one 32-bit multiply > in dcache lookup*. > >> static inline unsigned long end_name_hash(unsigned long hash) >> { >> - return (unsigned int)hash; >> + return __hash_32((unsigned int)hash); >> } >> >> __hash_32 is a multiplication by 0x61C88647 ( hash.h ) >> >> And this exactly is the difference between the hash value of "host0" on >> the 4.4 and the 4.8 system: >> >> DB<2> x sprintf '%x',0x11bf1ddd*0x61C88647 >> 0 '6c750ef074af64b' >> >> The bug, of course, is in the userspace tool tw_cli which wrongly >> assumes that the names would be returned in the "right" order by >> getdents. > > Nice analysis. > > Unfortunately, I don’t find the discussion of the patch on the Linux > kernel mailing list. 703b5fa sits on top of 8387ff2 from Linus Torvalds. Maybe he didn't send his own suggestion to the lists but to the three people named in that commit only. Maybe George Spelvin replied with his patch as an improvement and Linus just accepted it on his own branch and merged (554828e). Donald -- Donald Buczek buczek@molgen.mpg.de Tel: +49 30 8413 1433 ^ permalink raw reply [flat|nested] 8+ messages in thread