From mboxrd@z Thu Jan 1 00:00:00 1970
From: John Stanley
Date: Tue, 18 Jan 2011 03:38:35 +0000
Subject: Re: Early-boot kernel panics from udev-165/extras/ata_id/ata_id.c
Message-Id: <4D350B3B.8070006@verizon.net>
List-Id:
References: <4D263BF6.6050305@verizon.net>
In-Reply-To: <4D263BF6.6050305@verizon.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-hotplug@vger.kernel.org

Some additional info.

If I modify test-identify-packet.c to align on _SC_PAGESIZE (4096 byte)
rather than 512, as shown below, then run the loop:

offset=${1:-0}
increm=${2:-1}
while [ $((offset+=$increm)) -lt $((4096-511)) ]; do
	echo -e "+++ ./test-identify-packetpage /dev/sr0 $offset\n"
	./test-identify-packet-page /dev/sr0 $offset
	sleep 0.5
done

no panics occur, for every offset.

512 -> pagesize modification:

--- test-identify-packet.c	2011-01-17 13:47:25.000000000 -0500
+++ test-identify-packet-page.c	2011-01-17 22:27:11.293999984 -0500
@@ -99,7 +99,8 @@
 
 int main(int argc, char *argv[])
 {
-	char buf[2048];
+	int pgsz = sysconf(_SC_PAGESIZE);
+	char buf[pgsz<<1];
 	char *id;
 	char *path;
 	int offset = 0;
@@ -116,7 +117,7 @@
 	if (argc > 2)
 		offset = atoi(argv[2]);
 
-	if (offset < 0 || offset > 512) {
+	if (offset < 0 || offset > pgsz-512) {
 		fprintf(stderr, "offset out of range\n");
 		return 1;
 	}
@@ -133,7 +134,7 @@
 		return 1;
 	}
 
-	id = (void *)((((unsigned long)buf + 511) & ~511) + offset);
+	id = (void *)((((unsigned long)buf + pgsz-1) & ~(pgsz-1)) + offset);
 	printf("id buffer=%p\n", id);
 
 	disk_identify_packet_device_command(fd, id, 512);

John

On 01/17/2011 10:27 AM, Tejun Heo wrote:
> Hello,
>
> On Sun, Jan 16, 2011 at 11:03:06PM -0500, John Stanley wrote:
>> The kernel-panic, which occurs at boot-time in udev/ata_id.c when
>> issuing an ioctl SG_IO sg3 SCSI ATA Pass-through Identify command,
>> appears to arise from DMA'ing into an incorrectly aligned user data
>> buffer pointed to by sg_io_hdr.dxferp .
> The problem is that nobody is DMA'ing in this case.  The driver in
> question is ata_piix and the IO path taken is an actual PIO where the
> CPU reads from the IO space and writes to the memory itself.
>
>> My guess is that in the past, use of sg3 would not involve DMA by
>> default, but now, with libata ATA Pass-Through commands, it does (I
>> also may be totally wrong about that, just a thought).
> No DMA in progress here.  The only (somewhat) recent related change
> would be libata PIO path now using 32bit IO commands when supported by
> the controller, but I fail to see how that would trigger this type of
> failure.
>
>> I recall documentation somewhere which emphasized that if direct I/O
>> (DMA) is to be used in sg, one should page-align the SCSI response data
>> buffer.  With sg using indirect I/O this wouldn't be necessary, of
>> course, but perhaps now with libata, it is.  Just guessing here.
> If the buffer is not aligned, the kernel would just create a bounce
> buffer and bounce the data, so it shouldn't be a problem either.  It
> looks like we have an obscure bug in buffer mapping code for SG_IO.
>
> I tried several things but can't reproduce the problem here.  Can you
> please try the attached minimal test case?  It issues IDENTIFY_PACKET
> and you can specify the alignment offset.  By default the buffer would
> be 512-byte aligned but you can offset it, i.e. specifying 1 would make
> the buffer misaligned by 1 byte and so on.
>
> Can you please see whether the problem can be reliably triggered with
> it?  Also, please,
>
> * Attach full kernel log (including boot messages) and the program
>   output after triggering the problem.
>
> * Make sure the kernel is built with debug info and frame pointer.
>
> * Please reverse map the reported oops address to the source line.
>
> Thanks.
>
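
A side note on the pointer arithmetic in the test program: the expression
that rounds buf up to an alignment boundary and then adds the offset can be
pulled out into a small helper. The following is only a minimal sketch, not
code from udev or from the attached test case. The names align_up and
aligned_with_offset are hypothetical, and it assumes the alignment is a
power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * Assumption: align is a non-zero power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Return a pointer inside buf that is align-aligned and then shifted by
 * offset bytes, or NULL if len bytes starting there would run past the
 * end of buf (hypothetical helper, for illustration only).
 */
static void *aligned_with_offset(char *buf, size_t bufsz, size_t align,
				 size_t offset, size_t len)
{
	uintptr_t start = (uintptr_t)buf;
	uintptr_t p = align_up(start, align) + offset;

	if (p + len > start + bufsz)
		return NULL;
	return (void *)p;
}
```

With a buffer of 2*pgsz bytes, an alignment of pgsz and a length of 512, any
offset from 0 to pgsz-512 fits, which matches the range check in the
modified test program.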