* linux-next end of partition problems?
From: Luck, Tony @ 2009-06-02 23:42 UTC
To: linux-kernel

I noticed an odd message in the kernel log while booting
linux-next (tag: next-20090602[*])

  attempt to access beyond end of device
  sdb3: rw=0, want=31664120, limit=31664115

sdb3 is an ext3 filesystem mounted as /home

  $ grep sdb3 /proc/partitions
  8 19 15832057 sdb3

I think /proc/partitions is in KBytes ... so the block (512 Byte)
count for this partition is 2*15832057 = 31664114 ... so the
"limit" in the console log looks reasonable, and since the "want"
is a bigger number, it does seem that we are trying to access
beyond the device.

BUT ... I don't get this message when booting a kernel built
from Linus' tree.

I see other weird stuff too.  Running a Linus kernel I get:

  $ dd if=/dev/sdb3 of=/dev/null bs=1024
  15832057+1 records in
  15832057+1 records out

this neatly matches the reported size in /proc/partitions, and
no messages on the console.

With linux-next I see:

  dd if=/dev/sdb3 of=/dev/null bs=1024
  dd: reading `/dev/sdb3': Input/output error
  15831936+0 records in
  15831936+0 records out

and the same

  attempt to access beyond end of device
  sdb3: rw=0, want=31664120, limit=31664115

Note that on linux-next the "dd" gets the I/O error 121 blocks
earlier than we see end-of-file for this partition on the Linus
kernel.

-Tony

[*] grepping through boot logs I see this message appears in earlier
tags too.  All builds since next-20090428 show this problem (but I
wasn't being very diligent in April ... the previous build for which
I have a recorded console log was next-20090417).
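The arithmetic in this report can be re-checked with a short sketch. All numbers are copied from the kernel message and the dd output above; the variable names are illustrative, and the 1 KiB unit for /proc/partitions is the same assumption the report makes.

```python
# Values copied from the report; names are illustrative only.
SECTOR_BYTES = 512
partition_kib = 15832057          # /proc/partitions, assumed to be 1 KiB units
want, limit = 31664120, 31664115  # from the "beyond end of device" message

# Convert the KiB count into 512-byte sectors.
sectors_from_kib = partition_kib * 1024 // SECTOR_BYTES

# How far past the end of the partition the request reached.
overrun_sectors = want - limit

good_records = 15832057           # full 1 KiB records read on Linus' tree
bad_records = 15831936            # full 1 KiB records read on linux-next
short_by = good_records - bad_records

print(sectors_from_kib)   # 31664114, one less than the logged limit
print(overrun_sectors)    # 5
print(short_by)           # 121
```

The one-sector gap between 31664114 and the logged limit is consistent with the "+1" partial record that dd reports on Linus' tree: the partition appears to end with a single 512-byte sector that a whole-KiB count cannot express.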
* Re: linux-next end of partition problems?
From: Robert Hancock @ 2009-06-03  4:03 UTC
To: Luck, Tony; +Cc: linux-kernel

Luck, Tony wrote:
> I noticed an odd message in the kernel log while booting
> linux-next (tag: next-20090602[*])
>
>   attempt to access beyond end of device
>   sdb3: rw=0, want=31664120, limit=31664115
>
> sdb3 is an ext3 filesystem mounted as /home
>
>   $ grep sdb3 /proc/partitions
>   8 19 15832057 sdb3
>
> I think /proc/partitions is in KBytes ... so the block (512 Byte)
> count for this partition is 2*15832057 = 31664114 ... so the
> "limit" in the console log looks reasonable, and since the "want"
> is a bigger number, it does seem that we are trying to access
> beyond the device.
>
> BUT ... I don't get this message when booting a kernel built
> from Linus' tree.

What kind of controller/drive is this? Full dmesg output would be useful.
* Re: linux-next end of partition problems?
From: Luck, Tony @ 2009-06-03 21:43 UTC
To: Robert Hancock; +Cc: linux-kernel, Jeff Moyer

> What kind of controller/drive is this?

lspci says the controller is:
06:02.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)

console log says drive is:
scsi 0:0:1:0: Direct-Access SEAGATE ST318406LC 010A PQ: 0 ANSI: 3
 target0:0:1: Beginning Domain Validation
 target0:0:1: Ending Domain Validation
 target0:0:1: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 63)
sd 0:0:1:0: [sdb] 35843670 512-byte hardware sectors: (18.3 GB/17.0 GiB)
sd 0:0:1:0: [sdb] Write Protect is off
sd 0:0:1:0: [sdb] Mode Sense: 9f 00 10 08
scsi 0:0:6:0: Processor ESG-SHV SCA HSBP M17 1.0D PQ: 0 ANSI: 2
 target0:0:6: Beginning Domain Validation
sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
 target0:0:6: Ending Domain Validation
 target0:0:6: asynchronous
 sdb: sdb1 sdb2 sdb3
sd 0:0:1:0: [sdb] Attached SCSI disk

A git bisect between v2.6.30-rc7(good) and next-20090602(bad) points
the finger at this commit (and reverting this change from next-20090602
confirms it introduces this problem):


commit db2dbb12dc47a50c7a4c5678f526014063e486f6
Author: Jeff Moyer <jmoyer@redhat.com>
Date:   Wed Apr 22 14:08:13 2009 +0200

    block: implement blkdev_readpages

    Doing a proper block dev ->readpages() speeds up the crazy dump(8)
    approach of using interleaved process IO.

    Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
    Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

diff --git a/fs/block_dev.c b/fs/block_dev.c
index f45dbc1..a85fe31 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -331,6 +331,12 @@ static int blkdev_readpage(struct file * file, struct page * page)
         return block_read_full_page(page, blkdev_get_block);
 }
 
+static int blkdev_readpages(struct file *file, struct address_space *mapping,
+                        struct list_head *pages, unsigned nr_pages)
+{
+        return mpage_readpages(mapping, pages, nr_pages, blkdev_get_block);
+}
+
 static int blkdev_write_begin(struct file *file, struct address_space *mapping,
                         loff_t pos, unsigned len, unsigned flags,
                         struct page **pagep, void **fsdata)
@@ -1399,6 +1405,7 @@ static int blkdev_releasepage(struct page *page, gfp_t wait)
 
 static const struct address_space_operations def_blk_aops = {
         .readpage       = blkdev_readpage,
+        .readpages      = blkdev_readpages,
         .writepage      = blkdev_writepage,
         .sync_page      = block_sync_page,
         .write_begin    = blkdev_write_begin,


On a random hunch, I wondered whether this error message was connected to
the fact that ia64 kernel has a 64K page size.  I re-built using a 4k
pagesize ... and this also makes the partition overrun message go away.

So is it plausible that the blkdev_readpages() code is resulting in some
readahead of a page that overlaps the partition end?  The partition size
(15832057 * 1K block according to /proc/partitions) is not a multiple of
the 64K page size ... but then it isn't a multiple of 4K either :-(

-Tony
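The page-size hunch can be explored numerically. The sketch below uses only figures that appear in this thread and shows how the partition size relates to 4 KiB and 64 KiB boundaries. The helper name `round_up` is hypothetical, and comparing "want" against the limit rounded up to 4 KiB is an illustrative assumption, not something the thread establishes as the cause.

```python
def round_up(value, multiple):
    """Round value up to the next multiple (hypothetical helper)."""
    return -(-value // multiple) * multiple

partition_kib = 15832057       # from /proc/partitions
limit_sectors = 31664115       # "limit" in the kernel message
want_sectors = 31664120        # "want" in the kernel message
dd_stop_kib = 15831936         # 1 KiB records read by dd on linux-next

tail_4k = partition_kib % 4    # KiB left over after the last full 4 KiB page
tail_64k = partition_kib % 64  # KiB left over after the last full 64 KiB page

# 8 sectors of 512 bytes make one 4 KiB block.
limit_rounded_4k = round_up(limit_sectors, 8)

print(tail_4k, tail_64k)                 # 1 57
print(limit_rounded_4k == want_sectors)  # True
print(dd_stop_kib % 64)                  # 0
```

That "want" equals the limit rounded up to a 4 KiB boundary, and that dd stopped exactly on a 64 KiB boundary, are both consistent with a read sized in whole blocks or pages near the end of the partition. The thread itself does not confirm the mechanism.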
* Re: linux-next end of partition problems?
From: Jeff Moyer @ 2009-06-04 13:11 UTC
To: Luck, Tony; +Cc: Robert Hancock, linux-kernel, Jens Axboe

"Luck, Tony" <tony.luck@intel.com> writes:

>> What kind of controller/drive is this?
>
> lspci says the controller is:
> 06:02.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>
> console log says drive is:
> scsi 0:0:1:0: Direct-Access SEAGATE ST318406LC 010A PQ: 0 ANSI: 3
>  target0:0:1: Beginning Domain Validation
>  target0:0:1: Ending Domain Validation
>  target0:0:1: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 63)
> sd 0:0:1:0: [sdb] 35843670 512-byte hardware sectors: (18.3 GB/17.0 GiB)
> sd 0:0:1:0: [sdb] Write Protect is off
> sd 0:0:1:0: [sdb] Mode Sense: 9f 00 10 08
> scsi 0:0:6:0: Processor ESG-SHV SCA HSBP M17 1.0D PQ: 0 ANSI: 2
>  target0:0:6: Beginning Domain Validation
> sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
>  target0:0:6: Ending Domain Validation
>  target0:0:6: asynchronous
>  sdb: sdb1 sdb2 sdb3
> sd 0:0:1:0: [sdb] Attached SCSI disk
>
> A git bisect between v2.6.30-rc7(good) and next-20090602(bad) points
> the finger at this commit (and reverting this change from next-20090602
> confirms it introduces this problem):
>
>
> commit db2dbb12dc47a50c7a4c5678f526014063e486f6
> Author: Jeff Moyer <jmoyer@redhat.com>
> Date:   Wed Apr 22 14:08:13 2009 +0200
>
>     block: implement blkdev_readpages
>
>     Doing a proper block dev ->readpages() speeds up the crazy dump(8)
>     approach of using interleaved process IO.
>
>     Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
>     Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index f45dbc1..a85fe31 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -331,6 +331,12 @@ static int blkdev_readpage(struct file * file, struct page * page)
>          return block_read_full_page(page, blkdev_get_block);
>  }
>
> +static int blkdev_readpages(struct file *file, struct address_space *mapping,
> +                        struct list_head *pages, unsigned nr_pages)
> +{
> +        return mpage_readpages(mapping, pages, nr_pages, blkdev_get_block);
> +}
> +
>  static int blkdev_write_begin(struct file *file, struct address_space *mapping,
>                          loff_t pos, unsigned len, unsigned flags,
>                          struct page **pagep, void **fsdata)
> @@ -1399,6 +1405,7 @@ static int blkdev_releasepage(struct page *page, gfp_t wait)
>
>  static const struct address_space_operations def_blk_aops = {
>          .readpage       = blkdev_readpage,
> +        .readpages      = blkdev_readpages,
>          .writepage      = blkdev_writepage,
>          .sync_page      = block_sync_page,
>          .write_begin    = blkdev_write_begin,
>
>
> On a random hunch, I wondered whether this error message was connected to
> the fact that ia64 kernel has a 64K page size.  I re-built using a 4k
> pagesize ... and this also makes the partition overrun message go away.
>
> So is it plausible that the blkdev_readpages() code is resulting in some
> readahead of a page that overlaps the partition end?  The partition size
> (15832057 * 1K block according to /proc/partitions) is not a multiple of
> the 64K page size ... but then it isn't a multiple of 4K either :-(

Thanks for digging into this, Tony.  I'll take a look at it today.
Jens, you can feel free to pull this for now.  I never did get you real
data showing the improvement anyway, so I'll try to do that as well.

Cheers,
Jeff
* Re: linux-next end of partition problems?
From: Jens Axboe @ 2009-06-04 20:33 UTC
To: Jeff Moyer; +Cc: Luck, Tony, Robert Hancock, linux-kernel

On Thu, Jun 04 2009, Jeff Moyer wrote:
> "Luck, Tony" <tony.luck@intel.com> writes:
>
> >> What kind of controller/drive is this?
> >
> > lspci says the controller is:
> > 06:02.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
> >
> > console log says drive is:
> > scsi 0:0:1:0: Direct-Access SEAGATE ST318406LC 010A PQ: 0 ANSI: 3
> >  target0:0:1: Beginning Domain Validation
> >  target0:0:1: Ending Domain Validation
> >  target0:0:1: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 63)
> > sd 0:0:1:0: [sdb] 35843670 512-byte hardware sectors: (18.3 GB/17.0 GiB)
> > sd 0:0:1:0: [sdb] Write Protect is off
> > sd 0:0:1:0: [sdb] Mode Sense: 9f 00 10 08
> > scsi 0:0:6:0: Processor ESG-SHV SCA HSBP M17 1.0D PQ: 0 ANSI: 2
> >  target0:0:6: Beginning Domain Validation
> > sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
> >  target0:0:6: Ending Domain Validation
> >  target0:0:6: asynchronous
> >  sdb: sdb1 sdb2 sdb3
> > sd 0:0:1:0: [sdb] Attached SCSI disk
> >
> > A git bisect between v2.6.30-rc7(good) and next-20090602(bad) points
> > the finger at this commit (and reverting this change from next-20090602
> > confirms it introduces this problem):
> >
> >
> > commit db2dbb12dc47a50c7a4c5678f526014063e486f6
> > Author: Jeff Moyer <jmoyer@redhat.com>
> > Date:   Wed Apr 22 14:08:13 2009 +0200
> >
> >     block: implement blkdev_readpages
> >
> >     Doing a proper block dev ->readpages() speeds up the crazy dump(8)
> >     approach of using interleaved process IO.
> >
> >     Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
> >     Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
> >
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index f45dbc1..a85fe31 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -331,6 +331,12 @@ static int blkdev_readpage(struct file * file, struct page * page)
> >          return block_read_full_page(page, blkdev_get_block);
> >  }
> >
> > +static int blkdev_readpages(struct file *file, struct address_space *mapping,
> > +                        struct list_head *pages, unsigned nr_pages)
> > +{
> > +        return mpage_readpages(mapping, pages, nr_pages, blkdev_get_block);
> > +}
> > +
> >  static int blkdev_write_begin(struct file *file, struct address_space *mapping,
> >                          loff_t pos, unsigned len, unsigned flags,
> >                          struct page **pagep, void **fsdata)
> > @@ -1399,6 +1405,7 @@ static int blkdev_releasepage(struct page *page, gfp_t wait)
> >
> >  static const struct address_space_operations def_blk_aops = {
> >          .readpage       = blkdev_readpage,
> > +        .readpages      = blkdev_readpages,
> >          .writepage      = blkdev_writepage,
> >          .sync_page      = block_sync_page,
> >          .write_begin    = blkdev_write_begin,
> >
> >
> > On a random hunch, I wondered whether this error message was connected to
> > the fact that ia64 kernel has a 64K page size.  I re-built using a 4k
> > pagesize ... and this also makes the partition overrun message go away.
> >
> > So is it plausible that the blkdev_readpages() code is resulting in some
> > readahead of a page that overlaps the partition end?  The partition size
> > (15832057 * 1K block according to /proc/partitions) is not a multiple of
> > the 64K page size ... but then it isn't a multiple of 4K either :-(
>
> Thanks for digging into this, Tony.  I'll take a look at it today.
> Jens, you can feel free to pull this for now.  I never did get you real
> data showing the improvement anyway, so I'll try to do that as well.

OK, I'll revert it for now.

-- 
Jens Axboe
* Re: linux-next end of partition problems?
From: Jeff Moyer @ 2009-06-04 21:20 UTC
To: Jens Axboe; +Cc: Luck, Tony, Robert Hancock, linux-kernel

Jens Axboe <jens.axboe@oracle.com> writes:

> On Thu, Jun 04 2009, Jeff Moyer wrote:
>> "Luck, Tony" <tony.luck@intel.com> writes:
>>
>> >> What kind of controller/drive is this?
>> >
>> > lspci says the controller is:
>> > 06:02.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>> >
>> > console log says drive is:
>> > scsi 0:0:1:0: Direct-Access SEAGATE ST318406LC 010A PQ: 0 ANSI: 3
>> >  target0:0:1: Beginning Domain Validation
>> >  target0:0:1: Ending Domain Validation
>> >  target0:0:1: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 63)
>> > sd 0:0:1:0: [sdb] 35843670 512-byte hardware sectors: (18.3 GB/17.0 GiB)
>> > sd 0:0:1:0: [sdb] Write Protect is off
>> > sd 0:0:1:0: [sdb] Mode Sense: 9f 00 10 08
>> > scsi 0:0:6:0: Processor ESG-SHV SCA HSBP M17 1.0D PQ: 0 ANSI: 2
>> >  target0:0:6: Beginning Domain Validation
>> > sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
>> >  target0:0:6: Ending Domain Validation
>> >  target0:0:6: asynchronous
>> >  sdb: sdb1 sdb2 sdb3
>> > sd 0:0:1:0: [sdb] Attached SCSI disk
>> >
>> > A git bisect between v2.6.30-rc7(good) and next-20090602(bad) points
>> > the finger at this commit (and reverting this change from next-20090602
>> > confirms it introduces this problem):
>> >
>> >
>> > commit db2dbb12dc47a50c7a4c5678f526014063e486f6
>> > Author: Jeff Moyer <jmoyer@redhat.com>
>> > Date:   Wed Apr 22 14:08:13 2009 +0200
>> >
>> >     block: implement blkdev_readpages
>> >
>> >     Doing a proper block dev ->readpages() speeds up the crazy dump(8)
>> >     approach of using interleaved process IO.
>> >
>> >     Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
>> >     Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
>> >
>> > diff --git a/fs/block_dev.c b/fs/block_dev.c
>> > index f45dbc1..a85fe31 100644
>> > --- a/fs/block_dev.c
>> > +++ b/fs/block_dev.c
>> > @@ -331,6 +331,12 @@ static int blkdev_readpage(struct file * file, struct page * page)
>> >          return block_read_full_page(page, blkdev_get_block);
>> >  }
>> >
>> > +static int blkdev_readpages(struct file *file, struct address_space *mapping,
>> > +                        struct list_head *pages, unsigned nr_pages)
>> > +{
>> > +        return mpage_readpages(mapping, pages, nr_pages, blkdev_get_block);
>> > +}
>> > +
>> >  static int blkdev_write_begin(struct file *file, struct address_space *mapping,
>> >                          loff_t pos, unsigned len, unsigned flags,
>> >                          struct page **pagep, void **fsdata)
>> > @@ -1399,6 +1405,7 @@ static int blkdev_releasepage(struct page *page, gfp_t wait)
>> >
>> >  static const struct address_space_operations def_blk_aops = {
>> >          .readpage       = blkdev_readpage,
>> > +        .readpages      = blkdev_readpages,
>> >          .writepage      = blkdev_writepage,
>> >          .sync_page      = block_sync_page,
>> >          .write_begin    = blkdev_write_begin,
>> >
>> >
>> > On a random hunch, I wondered whether this error message was connected to
>> > the fact that ia64 kernel has a 64K page size.  I re-built using a 4k
>> > pagesize ... and this also makes the partition overrun message go away.
>> >
>> > So is it plausible that the blkdev_readpages() code is resulting in some
>> > readahead of a page that overlaps the partition end?  The partition size
>> > (15832057 * 1K block according to /proc/partitions) is not a multiple of
>> > the 64K page size ... but then it isn't a multiple of 4K either :-(
>>
>> Thanks for digging into this, Tony.  I'll take a look at it today.
>> Jens, you can feel free to pull this for now.  I never did get you real
>> data showing the improvement anyway, so I'll try to do that as well.
>
> OK, I'll revert it for now.

You can keep it reverted... forever and ever.  ;-)  I'm certain this
patch didn't have a *negative* impact when I sent it to you, but it sure
causes problems now!  (That's my story and I'm sticking to it!)

Dump is ~48% slower with the patch applied when using deadline, ~25%
slower when using cfq.  This testing was done using a 4 disk stripe off
of a CCISS controller.  This doesn't make a whole lot of sense to me,
though I don't have the bandwidth to go digging on this just now.

Sorry for the headaches, and thanks for the report, Tony!

Cheers,
Jeff

Dump average transfer rate for 32GB of data:

        |  deadline  |    cfq
--------+------------+------------
Vanilla | 87353 kB/s | 46132 kB/s
Patched | 45756 kB/s | 34564 kB/s
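The quoted slowdown figures can be reproduced from the transfer-rate table. This is a minimal sketch that assumes "slower" means throughput lost relative to the vanilla kernel; the function name `slowdown_percent` and the dictionary layout are illustrative, not taken from the thread.

```python
# Average dump transfer rates in kB/s, copied from the table in the message.
rates = {
    "deadline": {"vanilla": 87353, "patched": 45756},
    "cfq": {"vanilla": 46132, "patched": 34564},
}

def slowdown_percent(vanilla, patched):
    """Throughput lost with the patch, as a percentage of the vanilla rate."""
    return 100.0 * (vanilla - patched) / vanilla

for scheduler, r in rates.items():
    pct = slowdown_percent(r["vanilla"], r["patched"])
    print(f"{scheduler}: {pct:.1f}% slower")
```

Under that definition the results round to the ~48% (deadline) and ~25% (cfq) stated in the message. Measuring the increase in elapsed time instead would give larger percentages, so the throughput-based reading seems to be the one intended.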