* Merging fails reading /dev/uba1
@ 2005-02-21  4:00 Pete Zaitcev
  2005-02-21  7:51 ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Pete Zaitcev @ 2005-02-21  4:00 UTC
  To: axboe; +Cc: zaitcev, jgarzik, linux-kernel

Hi, Jens:

I think this question belongs to your domain, but please let me know
if I'm mistaken, so I can pursue this elsewhere.

I encountered a strange performance anomaly. I do the following:

<----- Plug USB key
[root@lembas ~]# time dd if=/dev/uba of=/dev/null bs=10k count=10240
10240+0 records in
10240+0 records out

real    0m22.731s
user    0m0.004s
sys     0m0.345s
[root@lembas ~]#

<----- Remove and replug the USB key
[root@lembas ~]# time dd if=/dev/uba1 of=/dev/null bs=10k count=10240
10240+0 records in
10240+0 records out

real    1m42.622s
user    0m0.005s
sys     0m1.518s
[root@lembas ~]#

So, reading from a partition of the same device is 5 times slower than
reading from the device itself. The question is, why?

To the best of my knowledge, this does not occur with SCSI (usb-storage
and sd or sr). This hints strongly that the ub is not doing something
right, but what can that be?

The ub takes the request processing machinery from Carmel exactly. I am
wondering if Carmel (sx8) exhibits any similar performance anomalies
(cc-ing to Jeff)

Additional information:

[root@lembas ~]# cat /proc/version
Linux version 2.6.11-rc4-lem (zaitcev@lembas) (gcc version 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)) #1 Tue Feb 15 23:06:39 PST 2005
[root@lembas ~]# cat /proc/partitions
major minor  #blocks  name

   3     0   39070080 hda
   3     1    5935986 hda1
   3     2    5936017 hda2
   3     3     554242 hda3
   3     4          1 hda4
   3     5   26643771 hda5
 180     0    1024000 uba
 180     1    1023983 uba1
[root@lembas ~]#

Thanks,
-- Pete


* Re: Merging fails reading /dev/uba1
  2005-02-21  4:00 Merging fails reading /dev/uba1 Pete Zaitcev
@ 2005-02-21  7:51 ` Jens Axboe
  2005-02-21 18:24   ` Pete Zaitcev
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2005-02-21  7:51 UTC
  To: Pete Zaitcev; +Cc: jgarzik, linux-kernel

On Sun, Feb 20 2005, Pete Zaitcev wrote:
> Hi, Jens:
> 
> I think this question belongs to your domain, but please let me know
> if I'm mistaken, so I can pursue this elsewhere.
> 
> I encountered a strange performance anomaly. I do the following:
> 
> <----- Plug USB key
> [root@lembas ~]# time dd if=/dev/uba of=/dev/null bs=10k count=10240
> 10240+0 records in
> 10240+0 records out
> 
> real    0m22.731s
> user    0m0.004s
> sys     0m0.345s
> [root@lembas ~]#
> 
> <----- Remove and replug the USB key
> [root@lembas ~]# time dd if=/dev/uba1 of=/dev/null bs=10k count=10240
> 10240+0 records in
> 10240+0 records out
> 
> real    1m42.622s
> user    0m0.005s
> sys     0m1.518s
> [root@lembas ~]#
> 
> So, reading from a partition of the same device is 5 times slower than
> reading from the device itself. The question is, why?
> 
> To the best of my knowledge, this does not occur with SCSI (usb-storage
> and sd or sr). This hints strongly that the ub is not doing something
> right, but what can that be?
> 
> The ub takes the request processing machinery from Carmel exactly. I am
> wondering if Carmel (sx8) exhibits any similar performance anomalies
> (cc-ing to Jeff)

I can't explain why the replugging slows it down; maybe you were lucky
to get contiguous pages in the first case? As far as I can see, ub
effectively disables merging by setting a max hw/phys segment limit of 1.

-- 
Jens Axboe



* Re: Merging fails reading /dev/uba1
  2005-02-21  7:51 ` Jens Axboe
@ 2005-02-21 18:24   ` Pete Zaitcev
  2005-02-21 18:31     ` Jeff Garzik
  2005-02-21 20:00     ` Linus Torvalds
  0 siblings, 2 replies; 7+ messages in thread
From: Pete Zaitcev @ 2005-02-21 18:24 UTC
  To: Jens Axboe; +Cc: linux-kernel, zaitcev

On Mon, 21 Feb 2005 08:51:32 +0100, Jens Axboe <axboe@suse.de> wrote:

> > [root@lembas ~]# time dd if=/dev/uba of=/dev/null bs=10k count=10240
> > real    0m22.731s

> > [root@lembas ~]# time dd if=/dev/uba1 of=/dev/null bs=10k count=10240
> > real    1m42.622s

> > So, reading from a partition of the same device is 5 times slower than
> > reading from the device itself. The question is, why?

> I can't explain why the replugging slows it down; maybe you were lucky
> to get contiguous pages in the first case? As far as I can see, ub
> effectively disables merging by setting a max hw/phys segment limit of 1.

If you mean physical replugging, it has nothing to do with the issue.
I only mentioned it to show that old pages were purged.

Contiguous pages have nothing to do with it either. I forgot to mention
that in the first case (whole device), all reads are done with length of
4KB, while in the second case (partition), all reads are 512 bytes long.

Basically, the key is reading from a partition or not. It causes the
sub-page sized merging to fail.
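
To put rough numbers on the two runs, here is a small calculation. It
assumes that every read reaches the device as one request of the stated
size (4KB or 512 bytes) and that dd's "10k" means 10240 bytes. The function
names are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Total bytes moved by "dd bs=10k count=10240" (10k taken as 10240 bytes). */
unsigned long long dd_total_bytes(void)
{
	return 10240ULL * 10240ULL;
}

/* Number of device requests if every request carries req_bytes. */
unsigned long long request_count(unsigned long long total, unsigned int req_bytes)
{
	assert(req_bytes != 0);
	return total / req_bytes;
}

/* Average wall-clock milliseconds per request. */
double ms_per_request(double seconds, unsigned long long requests)
{
	assert(requests != 0);
	return seconds * 1000.0 / (double)requests;
}

/* Print the comparison for the two "real" timings quoted above. */
void print_comparison(void)
{
	unsigned long long total = dd_total_bytes();
	unsigned long long whole = request_count(total, 4096);
	unsigned long long part = request_count(total, 512);

	printf("whole device: %llu requests, %.3f ms each\n",
	       whole, ms_per_request(22.731, whole));
	printf("partition:    %llu requests, %.3f ms each\n",
	       part, ms_per_request(102.622, part));
}
```

Under these assumptions the partition run issues eight times as many
requests (204800 versus 25600), at roughly 0.50 ms versus 0.89 ms per
request. That would suggest the slowdown comes mostly from per-request
overhead rather than from the amount of data moved.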

This is how the partitioning looks:

[root@lembas zaitcev]# fdisk /dev/uba

Command (m for help): p

Disk /dev/uba: 1048 MB, 1048576000 bytes
64 heads, 32 sectors/track, 1000 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/uba1   *           1        1000     1023983+   6  FAT16
Partition 1 has different physical/logical endings:
     phys=(998, 63, 32) logical=(999, 63, 31)

Command (m for help):

It does not look to me as if the partition started from an odd number
of sectors. In fact, it starts from a full number of pages.

The segment number hint was a good one. I can implement a fake s/g
capability easily within the driver, if this is suggested. But before
hacking on that, I'd like to note that I'm surprised how the block
layer is unable to coalesce sector-sized reads within a page. Also,
why does this depend on partitioning? Something is fishy here.

-- Pete


* Re: Merging fails reading /dev/uba1
  2005-02-21 18:24   ` Pete Zaitcev
@ 2005-02-21 18:31     ` Jeff Garzik
  2005-02-21 20:00     ` Linus Torvalds
  1 sibling, 0 replies; 7+ messages in thread
From: Jeff Garzik @ 2005-02-21 18:31 UTC
  To: Pete Zaitcev; +Cc: Jens Axboe, linux-kernel

On Mon, Feb 21, 2005 at 10:24:31AM -0800, Pete Zaitcev wrote:
> On Mon, 21 Feb 2005 08:51:32 +0100, Jens Axboe <axboe@suse.de> wrote:
> 
> > > [root@lembas ~]# time dd if=/dev/uba of=/dev/null bs=10k count=10240
> > > real    0m22.731s
> 
> > > [root@lembas ~]# time dd if=/dev/uba1 of=/dev/null bs=10k count=10240
> > > real    1m42.622s
> 
> > > So, reading from a partition of the same device is 5 times slower than
> > > reading from the device itself. The question is, why?
> 
> > I can't explain why the replugging slows it down; maybe you were lucky
> > to get contiguous pages in the first case? As far as I can see, ub
> > effectively disables merging by setting a max hw/phys segment limit of 1.
> 
> If you mean physical replugging, it has nothing to do with the issue.
> I only mentioned it to show that old pages were purged.
> 
> Contiguous pages have nothing to do with it either. I forgot to mention
> that in the first case (whole device), all reads are done with length of
> 4KB, while in the second case (partition), all reads are 512 bytes long.
> 
> Basically, the key is reading from a partition or not. It causes the
> sub-page sized merging to fail.

Does setting the blkdev's block size change things?

	Jeff





* Re: Merging fails reading /dev/uba1
  2005-02-21 18:24   ` Pete Zaitcev
  2005-02-21 18:31     ` Jeff Garzik
@ 2005-02-21 20:00     ` Linus Torvalds
  2005-02-22  0:41       ` Pete Zaitcev
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2005-02-21 20:00 UTC
  To: Pete Zaitcev; +Cc: Jens Axboe, linux-kernel



On Mon, 21 Feb 2005, Pete Zaitcev wrote:
> 
> Contiguous pages have nothing to do with it either. I forgot to mention
> that in the first case (whole device), all reads are done with length of
> 4KB, while in the second case (partition), all reads are 512 bytes long.

That's because your partition isn't a multiple of 4kB in size.

So the kernel falls back to 512-byte reads, just because they are the only 
kind that _can_ read the last sector.

> Disk /dev/uba: 1048 MB, 1048576000 bytes

Note: this is a nice multiple of 4kB.

> 64 heads, 32 sectors/track, 1000 cylinders
> Units = cylinders of 2048 * 512 = 1048576 bytes
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/uba1   *           1        1000     1023983+   6  FAT16

And note how this is _not_ (see the "+" at the end), you've got a 
1023983.5 kB partition.

> It does not look to me as if the partition started from an odd number
> of sectors. In fact, it starts from a full number of pages.

But it seems to end in an odd number of sectors.
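
The arithmetic can be sketched as below. It assumes that fdisk's "Blocks"
column counts 1kB units with "+" marking an extra 512-byte sector, and that
pages are 4096 bytes. The loop only illustrates picking the largest
power-of-two block size that divides the device size; it is not a copy of
the kernel's code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u
#define PAGE_BYTES  4096u	/* assumed page size */

/* Convert fdisk's "Blocks" value (1kB units, optional "+") to 512-byte sectors. */
uint64_t sectors_from_fdisk(uint64_t blocks_1k, int has_plus)
{
	return blocks_1k * 2 + (has_plus ? 1 : 0);
}

/*
 * Largest power-of-two block size, from 512 up to PAGE_BYTES, that
 * evenly divides size_bytes. A set bit at the current block size means
 * the next larger size would leave a partial block at the end.
 */
unsigned int largest_block_size(uint64_t size_bytes)
{
	unsigned int bsize = SECTOR_SIZE;

	assert(size_bytes % SECTOR_SIZE == 0);
	while (bsize < PAGE_BYTES) {
		if (size_bytes & bsize)
			break;
		bsize <<= 1;
	}
	return bsize;
}
```

With the numbers from the fdisk output, sectors_from_fdisk(1023983, 1)
gives 2047967 sectors, which is odd, so largest_block_size() returns 512
for the partition, while the whole disk at 1048576000 bytes gets 4096.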

That said, I'm surprised that the difference in performance is _that_ 
large. Regardless of whether the disk blocksize is 512 bytes or 4096 
bytes, you should be getting IO merging - it might use more CPU time, but 
the actual IO should still be done in much larger blocks.

You should be able to try the BLKBSZSET ioctl to set the blocksize by hand 
if you want to try it out:

	int size = 4096;
	ioctl(fd, BLKBSZSET, &size);

or similar. Of course, mounting a filesystem on the device tends to do 
that (or undo it) for you, ie it will set the blocksize to whatever 
blocksize the filesystem wants.

		Linus


* Re: Merging fails reading /dev/uba1
  2005-02-21 20:00     ` Linus Torvalds
@ 2005-02-22  0:41       ` Pete Zaitcev
  2005-02-22  1:48         ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Pete Zaitcev @ 2005-02-22  0:41 UTC
  To: Linus Torvalds; +Cc: Jens Axboe, linux-kernel, zaitcev

On Mon, 21 Feb 2005 12:00:48 -0800 (PST), Linus Torvalds <torvalds@osdl.org> wrote:

> That said, I'm surprised that the difference in performance is _that_ 
> large. Regardless of whether the disk blocksize is 512 bytes or 4096 
> bytes, you should be getting IO merging - it might use more CPU time, but 
> the actual IO should still be done in much larger blocks.

I am surprised too. Jens says "ub effectively disables merging by setting
max hw/phys segment limit of 1." But surely this ought not to be a problem
for reads within the same page.

> 	int size = 4096;
> 	ioctl(fd, BLKBSZSET, &size);

Thank you for the tip. This works fine, 4KB I/O is restored for dd.
However, I still have this problem with people who use ub to read CF sticks
from their cameras, mounted as FAT or VFAT. I verified that the effect of
this ioctl disappears at mount time, just as you said.

I'll think about what I can do about it.

-- Pete


* Re: Merging fails reading /dev/uba1
  2005-02-22  0:41       ` Pete Zaitcev
@ 2005-02-22  1:48         ` Linus Torvalds
  0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2005-02-22  1:48 UTC
  To: Pete Zaitcev; +Cc: Jens Axboe, linux-kernel



On Mon, 21 Feb 2005, Pete Zaitcev wrote:
> 
> I am surprised too. Jens says "ub effectively disables merging by setting
> max hw/phys segment limit of 1." But surely this ought not to be a problem
> for reads within the same page.

Hmm.. Why does it do that anyway? Jens - will merging take place at all
with that setting, even for physically contiguous segments? It appears 
not, from the timings.

Anyway, I _think_ the bug is in the BIOVEC_VIRT_MERGEABLE() usage, which
doesn't seem to make much sense. In particular, look at 
"ll_merge_requests_fn()", and notice how it first checks whether something 
is physically mergeable, but even if it _is_ able to merge physically, it 
will still check virtual mergeability too - which makes no sense at all.

If it was physically mergeable, there _is_ no virtual merge. In 
particular, a device (or system) that doesn't support virtual merges, or 
only supports them on a page boundary, will always _fail_ to virtually 
merge within the same page, so it's guaranteed to never merge 512-byte 
entries.

Jens, that just _has_ to be wrong. If a physical merge was possible, we 
shouldn't check the virtual merge, we should just return 1.
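
The argument can be illustrated with a toy model. This is NOT the kernel's
code: the function names, the scalar arguments and the "boundary" parameter
are all invented for the illustration, and it assumes both segment limits
are 1, so that a merge goes through only if no extra segment of either
kind is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Physically mergeable: the second fragment starts where the first ends. */
bool phys_mergeable(uint64_t a_end, uint64_t b_start)
{
	return a_end == b_start;
}

/*
 * Virtually mergeable in this toy model: both sides of the seam sit on
 * a power-of-two boundary (e.g. a page). boundary == 0 models a system
 * without virtual merging.
 */
bool virt_mergeable(uint64_t a_end, uint64_t b_start, uint64_t boundary)
{
	if (boundary == 0)
		return false;
	assert((boundary & (boundary - 1)) == 0);
	return ((a_end | b_start) & (boundary - 1)) == 0;
}

/* The questioned behavior: the virtual check still applies after a physical merge. */
bool merge_requiring_both(uint64_t a_end, uint64_t b_start, uint64_t boundary)
{
	return phys_mergeable(a_end, b_start) &&
	       virt_mergeable(a_end, b_start, boundary);
}

/* The proposed behavior: a physical merge is sufficient on its own. */
bool merge_phys_first(uint64_t a_end, uint64_t b_start, uint64_t boundary)
{
	return phys_mergeable(a_end, b_start) ||
	       virt_mergeable(a_end, b_start, boundary);
}
```

Take two adjacent 512-byte fragments in one page, the first ending at
0x10200 and the second starting there, with a 4096-byte boundary. They are
physically mergeable, but the seam is not on a boundary, so
merge_requiring_both() refuses while merge_phys_first() accepts.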

> > 	int size = 4096;
> > 	ioctl(fd, BLKBSZSET, &size);
> 
> Thank you for the tip. This works fine, 4KB I/O is restored for dd.
> However, I still have this problem with people who use ub to read CF sticks
> from their cameras, mounted as FAT or VFAT. I verified that the effect of
> this ioctl disappears at mount time, just as you said.

Yes. The FAT filesystem needs to set the buffer size to 512 bytes, since 
it will actually act in 512-byte blocks.

> I'll think about what I can do about it.

Enabling merging is the thing to do. Why does UB have any merging limits at 
all, since USB has to scatter-gather the fragments anyway? 

Anyway, I think you can work around the above virtual merge bug (assuming 
I'm right, and it _is_ a bug, which Jens may or may not be able to correct me 
on, depending on just how deep into baby-diapers he is), by just saying 
that UB supports only _one_ physical segment, but can take any number of 
virtual segments.

Ie do

	blk_queue_max_hw_segments(q, 100);
	blk_queue_max_phys_segments(q, 1);

which tells the block layer that you don't care about how hard it is to 
merge things virtually, but you only ever want _one_ physical segment. (At 
which point you will also only really ever get one virtual segment, of 
course, but the point is that you'll avoid the bug that says "I can't 
merge these two things virtually" when you don't care).

Maybe that works, maybe it doesn't. Give it a try.

		Linus

