* bounce buffer usage
@ 2001-12-22 1:02 Randy.Dunlap
2001-12-23 14:09 ` Jens Axboe
0 siblings, 1 reply; 15+ messages in thread
From: Randy.Dunlap @ 2001-12-22 1:02 UTC (permalink / raw)
To: linux-kernel
Hi-
I added bounce in/out counters (for all bounce I/O) and
bounce swap i/o counters to /proc/stat (sample output below,
before and after running fillmem).
I also removed ipackets, opackets, ierrors, oerrors, and
collisions from kernel_stat, but this isn't critical.
I also added a warning message (every 1000th occurrence) if/when
highmem is used for swap io (in 2.4.x, highmem => bounce).
(small sample also below; my log file is full of them,
and this part of the patch is overkill on messages. :)
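The "every 1000th occurrence" throttling described above can be sketched in plain userspace C. This is only an illustration, not kernel code: `struct nth_limiter` and `nth_limiter_hit()` are hypothetical names, and the test mirrors the `(bmsg_count % 1000) == 1` check used in the patch further down. Like the patch's static counter, it makes no attempt to be SMP or thread safe.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of the patch: reports the 1st,
 * (N+1)th, (2N+1)th ... occurrence and stays quiet otherwise. */
struct nth_limiter {
    unsigned long count;   /* occurrences seen so far */
    unsigned long every;   /* report one occurrence out of this many */
};

static bool nth_limiter_hit(struct nth_limiter *lim)
{
    lim->count++;
    if (lim->every <= 1)
        return true;       /* no throttling requested */
    /* Same shape as the patch: (bmsg_count % 1000) == 1 */
    return (lim->count % lim->every) == 1;
}
```

With `every` set to 1000, 2500 occurrences produce three messages (the 1st, 1001st and 2001st).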
Are there any drivers in 2.4.x that support highmem directly,
or is all of that being done in 2.5.x (BIO patches)?
I ask so that I can run the same tests without using highmem bounce
for swapping...
Would it be useful to try this with a 2.5.1 kernel?
Here's a comparison on 2.4.16, using 6 instances of 'fillmem 700'
(from Quintela's memtest files) on a 4 GiB (!) 4-proc ix86 system.
a. 2.4.16 with bounce stats:
Elapsed run time: 2 min:58 sec.
=============== /proc/stat BEFORE fillmem ====================
=========== with 2 new lines [4 new counters] (at end) ==============
cpu 292 0 3248 630512
cpu0 23 0 422 158068
cpu1 116 0 873 157524
cpu2 132 0 877 157504
cpu3 21 0 1076 157416
page 12847 816
swap 3 0
intr 168971 158513 2 0 0 6 0 3 1 0 0 0 0 0 0 4 0 0 0 0 310 6305 0 0 0 2063 942 0 807 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
disk_io: (8,0):(1924,1556,25550,368,1632) (8,1):(3,3,24,0,0) (8,2):(1,1,8,0,0) (8,3):(2,2,16,0,0) (8,4):(2,2,16,0,0) (8,5):(2,2,16,0,0) (8,6):(2,2,16,0,0) (8,7):(2,2,16,0,0) (8,8):(2,2,16,0,0) (8,9):(2,2,16,0,0)
ctxt 11542
btime 1008974326
processes 387
bounce io 23966 476
bounce swap io 0 0
=============== /proc/stat AFTER fillmem ====================
=========== with 2 new lines [4 new counters] (at end) ==============
cpu 3320 0 20454 681522
cpu0 876 0 4860 170588
cpu1 850 0 5257 170217
cpu2 982 0 4975 170367
cpu3 612 0 5362 170350
page 686457 725766
swap 168354 181146
intr 216713 176324 2 0 0 6 0 3 1 0 0 0 0 0 0 4 0 0 0 0 310 7313 0 0 0 30804 1033 0 898 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
disk_io: (8,0):(35376,22879,1372770,12497,1452132) (8,1):(3,3,24,0,0) (8,2):(1,1,8,0,0) (8,3):(2,2,16,0,0) (8,4):(2,2,16,0,0) (8,5):(2,2,16,0,0) (8,6):(2,2,16,0,0) (8,7):(2,2,16,0,0) (8,8):(2,2,16,0,0) (8,9):(2,2,16,0,0)
ctxt 4110183
btime 1008974326
processes 644
bounce io 1311468 195826
bounce swap io 160892 24385
----------------------------------------------------------------------
b. 2.4.16 with bounce stats and linux/mm/swap_state.c::read_swap_cache_async()
only allocating swap pages in ZONE_NORMAL (i.e., not in HIGHMEM):
Elapsed run time: 3 min:11 sec.
(This part of the patch was done just to reduce Highmem bouncing,
although it doesn't help in elapsed run time.)
=============== /proc/stat BEFORE fillmem ====================
=========== with 2 new lines [4 new counters] (at end) ==============
cpu 253 0 3344 845231
cpu0 27 0 376 211804
cpu1 17 0 797 211393
cpu2 20 0 844 211343
cpu3 189 0 1327 210691
page 12899 815
swap 3 0
intr 225242 212207 2 0 0 6 0 3 1 0 0 0 0 0 0 4 0 0 0 0 310 8300 0 0 0 2095 1218 0 1081 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
disk_io: (8,0):(1945,1563,25654,382,1630) (8,1):(3,3,24,0,0) (8,2):(1,1,8,0,0) (8,3):(2,2,16,0,0) (8,4):(2,2,16,0,0) (8,5):(2,2,16,0,0) (8,6):(2,2,16,0,0) (8,7):(2,2,16,0,0) (8,8):(2,2,16,0,0) (8,9):(2,2,16,0,0)
ctxt 12951
btime 1008978227
processes 396
bounce io 24046 476
bounce swap io 0 0
=============== /proc/stat AFTER fillmem ====================
=========== with 2 new lines [4 new counters] (at end) ==============
cpu 3047 0 17262 904935
cpu0 715 0 3838 226758
cpu1 826 0 4374 226111
cpu2 458 0 4425 226428
cpu3 1048 0 4625 225638
page 685718 854715
swap 168157 213360
intr 275576 231311 2 0 0 6 0 3 1 0 0 0 0 0 0 4 0 0 0 0 310 9454 0 0 0 31975 1316 0 1179 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
disk_io: (8,0):(37156,22904,1371300,14252,1709926) (8,1):(3,3,24,0,0) (8,2):(1,1,8,0,0) (8,3):(2,2,16,0,0) (8,4):(2,2,16,0,0) (8,5):(2,2,16,0,0) (8,6):(2,2,16,0,0) (8,7):(2,2,16,0,0) (8,8):(2,2,16,0,0) (8,9):(2,2,16,0,0)
ctxt 675273
btime 1008978227
processes 665
bounce io 24410 164072
bounce swap io 0 20415
Thanks,
~Randy [see ya next year]
"What's that bright yellow object out the window?"
============================= sample dmesg ==============================
bounce io (R) for 8:17
bounce io (R) for 8:17
bounce io (R) for 8:8
bounce io (R) for 8:6
bounce io (R) for 8:18
bounce io (R) for 8:17
bounce io (R) for 8:18
bounce io (R) for 8:8
bounce io (R) for 8:18
bounce io (W) for 8:18
bounce io (W) for 8:6
bounce io (R) for 8:18
bounce io (R) for 8:17
bounce io (R) for 8:6
bounce io (R) for 8:17
bounce io (R) for 8:18
bounce io (W) for 8:8
bounce io (W) for 8:8
bounce io (W) for 8:8
bounce io (R) for 8:18
bounce io (R) for 8:17
bounce io (R) for 8:18
bounce io (R) for 8:18
bounce io (R) for 8:17
bounce io (W) for 8:8
bounce io (W) for 8:8
bounce io (R) for 8:18
bounce io (R) for 8:17
bounce io (W) for 8:8
bounce io (R) for 8:8
bounce io (R) for 8:18
bounce io (R) for 8:8
Bounce-stats patch:
======================
--- linux/include/linux/kernel_stat.h.org Mon Nov 26 10:19:29 2001
+++ linux/include/linux/kernel_stat.h Thu Dec 20 13:26:50 2001
@@ -26,12 +26,14 @@
unsigned int dk_drive_wblk[DK_MAX_MAJOR][DK_MAX_DISK];
unsigned int pgpgin, pgpgout;
unsigned int pswpin, pswpout;
+ unsigned int bouncein, bounceout;
+ unsigned int bounceswapin, bounceswapout;
#if !defined(CONFIG_ARCH_S390)
unsigned int irqs[NR_CPUS][NR_IRQS];
#endif
- unsigned int ipackets, opackets;
- unsigned int ierrors, oerrors;
- unsigned int collisions;
+/// unsigned int ipackets, opackets;
+/// unsigned int ierrors, oerrors;
+/// unsigned int collisions;
unsigned int context_swtch;
};
--- linux/fs/proc/proc_misc.c.org Tue Nov 20 21:29:09 2001
+++ linux/fs/proc/proc_misc.c Thu Dec 20 13:34:44 2001
@@ -310,6 +310,12 @@
xtime.tv_sec - jif / HZ,
total_forks);
+ len += sprintf(page + len,
+ "bounce io %u %u\n"
+ "bounce swap io %u %u\n",
+ kstat.bouncein, kstat.bounceout,
+ kstat.bounceswapin, kstat.bounceswapout);
+
return proc_calc_metrics(page, start, off, count, eof, len);
}
--- linux/mm/page_io.c.org Mon Nov 19 15:19:42 2001
+++ linux/mm/page_io.c Thu Dec 20 15:59:41 2001
@@ -10,6 +10,7 @@
* Always use brw_page, life becomes simpler. 12 May 1998 Eric Biederman
*/
+#include <linux/config.h>
#include <linux/mm.h>
#include <linux/kernel_stat.h>
#include <linux/swap.h>
@@ -68,6 +69,13 @@
dev = swapf->i_dev;
} else {
return 0;
+ }
+
+ if (PageHighMem(page)) {
+ if (rw == WRITE)
+ kstat.bounceswapout++;
+ else
+ kstat.bounceswapin++;
}
/* block_size == PAGE_SIZE/zones_used */
--- linux/drivers/block/ll_rw_blk.c.org Mon Oct 29 12:11:17 2001
+++ linux/drivers/block/ll_rw_blk.c Thu Dec 20 17:45:19 2001
@@ -873,6 +873,7 @@
} while (q->make_request_fn(q, rw, bh));
}
+static int bmsg_count = 0;
/**
* submit_bh: submit a buffer_head to the block device later for I/O
@@ -890,6 +891,7 @@
void submit_bh(int rw, struct buffer_head * bh)
{
int count = bh->b_size >> 9;
+ int bounce = PageHighMem(bh->b_page);
if (!test_bit(BH_Lock, &bh->b_state))
BUG();
@@ -908,10 +910,19 @@
switch (rw) {
case WRITE:
kstat.pgpgout += count;
+ if (bounce) kstat.bounceout += count;
break;
default:
kstat.pgpgin += count;
+ if (bounce) kstat.bouncein += count;
break;
+ }
+ if (bounce) {
+ bmsg_count++;
+ if ((bmsg_count % 1000) == 1)
+ printk ("bounce io (%c) for %d:%d\n",
+ (rw == WRITE) ? 'W' : 'R',
+ MAJOR(bh->b_rdev), MINOR(bh->b_rdev));
}
}
Debug patch to partially disable highmem swapping:
====================================================
--- linux/mm/swap_state.c.org Wed Oct 31 15:31:03 2001
+++ linux/mm/swap_state.c Fri Dec 21 10:32:02 2001
@@ -180,6 +180,7 @@
* A failure return means that either the page allocation failed or that
* the swap entry is no longer in use.
*/
+#define GFP_SWAP_LOW ( __GFP_WAIT | __GFP_IO | __GFP_FS )
struct page * read_swap_cache_async(swp_entry_t entry)
{
struct page *found_page, *new_page = NULL;
@@ -200,7 +201,7 @@
* Get a new page to read into from swap.
*/
if (!new_page) {
- new_page = alloc_page(GFP_HIGHUSER);
+ new_page = alloc_page(GFP_SWAP_LOW);
if (!new_page)
break; /* Out of memory */
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: bounce buffer usage
2001-12-22 1:02 bounce buffer usage Randy.Dunlap
@ 2001-12-23 14:09 ` Jens Axboe
2002-01-08 1:44 ` Randy.Dunlap
0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2001-12-23 14:09 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: linux-kernel
On Fri, Dec 21 2001, Randy.Dunlap wrote:
> Are there any drivers in 2.4.x that support highmem directly,
> or is all of that being done in 2.5.x (BIO patches)?
2.4 + my block-highmem patches support direct highmem I/O.
> Would it be useful to try this with a 2.5.1 kernel?
Sure
--
Jens Axboe
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: bounce buffer usage
2001-12-23 14:09 ` Jens Axboe
@ 2002-01-08 1:44 ` Randy.Dunlap
2002-01-08 7:42 ` Jens Axboe
0 siblings, 1 reply; 15+ messages in thread
From: Randy.Dunlap @ 2002-01-08 1:44 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel
On Sun, 23 Dec 2001, Jens Axboe wrote:
| On Fri, Dec 21 2001, Randy.Dunlap wrote:
| > Are there any drivers in 2.4.x that support highmem directly,
| > or is all of that being done in 2.5.x (BIO patches)?
|
| 2.4 + my block-highmem patches support direct highmem I/O.
|
| > Would it be useful to try this with a 2.5.1 kernel?
|
| Sure
OK, here's 'fillmem 700' run against 5 kernels as described below,
with my bounce io/swap statistics patch added.
All tests are 6 instances of "fillmem 700" (700 MB) on a 4-way 4 GB
x86 VA 4450 server.
I'm including a reduced version of /proc/stat -- before and after the
fillmem test in each case.
Let me know if you'd like to see other variations.
=====bounce-stats-2416.txt
Linux 2.4.16 with bounce io/swap stats added:
elapsed run time: 2 min:58 sec.
BEFORE fillmem:
page 12847 816
swap 3 0
ctxt 11542
bounce io 23966 476
bounce swap io 0 0
AFTER fillmem:
page 686457 725766
swap 168354 181146
ctxt 4110183
bounce io 1311468 195826
bounce swap io 160892 24385
=====bounce-stats-low-2416.txt
Linux 2.4.16 with bounce io/swap stats added and some swap forced to low memory:
elapsed run time: 3 min:11 sec.
BEFORE fillmem:
page 12899 815
swap 3 0
ctxt 12951
bounce io 24046 476
bounce swap io 0 0
AFTER fillmem:
page 685718 854715
swap 168157 213360
ctxt 675273
bounce io 24410 164072
bounce swap io 0 20415
=====bounce-stats-2416-highio.txt
Linux 2.4.16 plus Jens's 2.4.x block-highmem patches, with CONFIG_HIGHIO=y:
elapsed run time: 2 min:45 sec.
BEFORE fillmem:
page 103940 1198
swap 3 0
ctxt 24409
bounce io 23978 576
bounce swap io 0 0
AFTER fillmem:
page 735853 651670
swap 157926 162541
ctxt 2328418
bounce io 1284928 186422
bounce swap io 157568 23194
=====bounce-stats-2416-nohighio.txt
Linux 2.4.16 plus Jens's 2.4.x block-highmem patches, with CONFIG_HIGHIO=n:
elapsed run time: 3 min:00 sec.
BEFORE fillmem:
page 13955 3177
swap 3 0
ctxt 20528
bounce io 26068 4052
bounce swap io 0 0
AFTER fillmem:
page 707172 741396
swap 173251 184456
ctxt 1013516
bounce io 1346488 156778
bounce swap io 165001 19066
=====bounce-stats-252-pre6.txt
Linux 2.5.2-pre6 with bounce io/swap stats added:
elapsed run time: 2 min:00 sec.
BEFORE fillmem:
page 16569 3318
swap 3 0
ctxt 46747
bounce io 30830 2080
bounce swap io 0 0
AFTER fillmem:
page 479343 546956
swap 115688 136051
ctxt 111294
bounce io 955046 427172
bounce swap io 115519 53094
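Since each run is reported as a BEFORE/AFTER pair, the interesting number is the difference between the two. The following is a minimal userspace sketch of that arithmetic, assuming the two-line "bounce io" / "bounce swap io" format shown above; `struct bounce_sample`, `parse_bounce_sample()` and `bounce_delta()` are hypothetical names, not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/* The four counters the patch appends to /proc/stat. */
struct bounce_sample {
    unsigned int in, out;            /* "bounce io" line */
    unsigned int swap_in, swap_out;  /* "bounce swap io" line */
};

/* Hypothetical helper: pull both bounce lines out of a /proc/stat dump.
 * Returns 0 on success, -1 if either line is missing or malformed. */
static int parse_bounce_sample(const char *text, struct bounce_sample *s)
{
    const char *io = strstr(text, "bounce io ");
    const char *swap = strstr(text, "bounce swap io ");

    if (!io || !swap)
        return -1;
    if (sscanf(io, "bounce io %u %u", &s->in, &s->out) != 2)
        return -1;
    if (sscanf(swap, "bounce swap io %u %u", &s->swap_in, &s->swap_out) != 2)
        return -1;
    return 0;
}

/* AFTER minus BEFORE. Unsigned subtraction stays correct across a
 * single wrap of the 32-bit kernel counters. */
static struct bounce_sample bounce_delta(struct bounce_sample before,
                                         struct bounce_sample after)
{
    struct bounce_sample d;

    d.in = after.in - before.in;
    d.out = after.out - before.out;
    d.swap_in = after.swap_in - before.swap_in;
    d.swap_out = after.swap_out - before.swap_out;
    return d;
}
```

For the plain 2.4.16 run above, this gives 1287502 in / 195350 out for "bounce io" and 160892 in / 24385 out for "bounce swap io".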
--
~Randy
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: bounce buffer usage
2002-01-08 1:44 ` Randy.Dunlap
@ 2002-01-08 7:42 ` Jens Axboe
2002-01-09 17:23 ` Randy.Dunlap
0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2002-01-08 7:42 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: linux-kernel
On Mon, Jan 07 2002, Randy.Dunlap wrote:
> On Sun, 23 Dec 2001, Jens Axboe wrote:
>
> | On Fri, Dec 21 2001, Randy.Dunlap wrote:
> | > Are there any drivers in 2.4.x that support highmem directly,
> | > or is all of that being done in 2.5.x (BIO patches)?
> |
> | 2.4 + my block-highmem patches support direct highmem I/O.
> |
> | > Would it be useful to try this with a 2.5.1 kernel?
> |
> | Sure
>
>
> OK, here's 'fillmem 700' run against 5 kernels as described below,
> with my bounce io/swap statistics patch added.
>
> All tests are 6 instances of "fillmem 700" (700 MB) on a 4-way 4 GB
> x86 VA 4450 server.
>
> I'm including a reduced version of /proc/stat -- before and after the
> fillmem test in each case.
>
> Let me know if you'd like to see other variations.
The results look very promising, although I'm a bit surprised that 2.5
is actually that much quicker :-)
The bounce counts you are doing don't make too much sense to me though,
how come 2.4 + block-high and 2.5 show any bounced numbers at all? Maybe
you are not doing the accounting correctly? To get the right counts do
something ala:
+++ mm/highmem.c
@@ -409,7 +409,9 @@
vfrom = kmap(from->bv_page) + from->bv_offset;
memcpy(vto, vfrom, to->bv_len);
kunmap(from->bv_page);
- }
+ bounced_write++;
+ } else
+ bounced_read++;
}
Of course those are all bounces, not just (or only) swap bounces. Also
note that the above is not SMP safe.
--
Jens Axboe
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: bounce buffer usage
2002-01-08 7:42 ` Jens Axboe
@ 2002-01-09 17:23 ` Randy.Dunlap
2002-01-09 18:10 ` Jens Axboe
2002-01-09 22:33 ` Rik van Riel
0 siblings, 2 replies; 15+ messages in thread
From: Randy.Dunlap @ 2002-01-09 17:23 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel
On Tue, 8 Jan 2002, Jens Axboe wrote:
| On Mon, Jan 07 2002, Randy.Dunlap wrote:
| >
| > OK, here's 'fillmem 700' run against 5 kernels as described below,
| > with my bounce io/swap statistics patch added.
| >
| > All tests are 6 instances of "fillmem 700" (700 MB) on a 4-way 4 GB
| > x86 VA 4450 server.
| >
| > I'm including a reduced version of /proc/stat -- before and after the
| > fillmem test in each case.
| >
| > Let me know if you'd like to see other variations.
|
| The results look very promising, although I'm a bit surprised that 2.5
| is actually that much quicker :-)
I was too. When I have the bounce accounting straightened out,
I'll run each test multiple times.
| The bounce counts you are doing don't make too much sense to me though,
| how come 2.4 + block-high and 2.5 show any bounced numbers at all? Maybe
| you are not doing the accounting correctly? To get the right counts do
| something ala:
Clearly I mucked that up. Thanks for pointing it out.
The patch below makes sense, but I also want to count
"bounced swap IOs" separately. I'll retest and report that
when I have it done.
| +++ mm/highmem.c
| @@ -409,7 +409,9 @@
| vfrom = kmap(from->bv_page) + from->bv_offset;
| memcpy(vto, vfrom, to->bv_len);
| kunmap(from->bv_page);
| - }
| + bounced_write++;
| + } else
| + bounced_read++;
| }
|
| Of course those are all bounces, not just (or only) swap bounces. Also
| note that the above is not SMP safe.
Is this the only place that kstat (kernel_stat) counters
are not SMP safe...?
--
~Randy
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: bounce buffer usage
2002-01-09 17:23 ` Randy.Dunlap
@ 2002-01-09 18:10 ` Jens Axboe
2002-01-09 22:33 ` Rik van Riel
1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2002-01-09 18:10 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: linux-kernel
On Wed, Jan 09 2002, Randy.Dunlap wrote:
> | The results look very promising, although I'm a bit surprised that 2.5
> | is actually that much quicker :-)
>
> I was too. When I have the bounce accounting straightened out,
> I'll run each test multiple times.
Good
> | +++ mm/highmem.c
> | @@ -409,7 +409,9 @@
> | vfrom = kmap(from->bv_page) + from->bv_offset;
> | memcpy(vto, vfrom, to->bv_len);
> | kunmap(from->bv_page);
> | - }
> | + bounced_write++;
> | + } else
> | + bounced_read++;
> | }
> |
> | Of course those are all bounces, not just (or only) swap bounces. Also
> | note that the above is not SMP safe.
>
> Is this the only place that kstat (kernel_stat) counters
> are not SMP safe...?
Haven't looked at the other stats, the i/o stats are protected by the
queue_lock though.
--
Jens Axboe
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: bounce buffer usage
2002-01-09 17:23 ` Randy.Dunlap
2002-01-09 18:10 ` Jens Axboe
@ 2002-01-09 22:33 ` Rik van Riel
1 sibling, 0 replies; 15+ messages in thread
From: Rik van Riel @ 2002-01-09 22:33 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: Jens Axboe, linux-kernel
On Wed, 9 Jan 2002, Randy.Dunlap wrote:
> | + bounced_write++;
> | + } else
> | + bounced_read++;
> | }
>
> Is this the only place that kstat (kernel_stat) counters
> are not SMP safe...?
No, but we don't particularly care about that. If you care
about making the statistics SMP safe, you probably also want
to make them CPU local so the cacheline isn't bounced around
all the time ;)
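The CPU-local idea can be sketched in userspace C11 as one counter slot per CPU, each padded to its own cacheline, with the reader summing the slots. This is an illustration under stated assumptions, not the kernel's mechanism: the 4-CPU count, the 64-byte line size and all `sketch_`/`SKETCH_` names are made up for the example.

```c
#include <stdalign.h>

#define SKETCH_NR_CPUS   4    /* assumption: 4-way box, as in the tests */
#define SKETCH_CACHELINE 64   /* assumption: 64-byte cachelines */

/* One slot per CPU; the alignment pads each slot to a full cacheline
 * so that two CPUs never write to the same line. */
struct sketch_percpu_slot {
    alignas(SKETCH_CACHELINE) unsigned long value;
};

struct sketch_bounce_counter {
    struct sketch_percpu_slot slot[SKETCH_NR_CPUS];
};

/* Each CPU increments only its own slot, so no lock is needed as long
 * as the caller really is running on (and stays on) that CPU. */
static void sketch_counter_inc(struct sketch_bounce_counter *c, unsigned int cpu)
{
    c->slot[cpu % SKETCH_NR_CPUS].value++;
}

/* The reader sums all slots; the total may be slightly stale while
 * other CPUs keep counting, which is acceptable for statistics. */
static unsigned long sketch_counter_read(const struct sketch_bounce_counter *c)
{
    unsigned long sum = 0;

    for (unsigned int i = 0; i < SKETCH_NR_CPUS; i++)
        sum += c->slot[i].value;
    return sum;
}
```

The trade-off is a slightly more expensive read in exchange for increments that never touch a shared cacheline.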
cheers,
Rik
--
"Linux holds advantages over the single-vendor commercial OS"
-- Microsoft's "Competing with Linux" document
http://www.surriel.com/ http://distro.conectiva.com/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: 2.4.9 2.4.18 diff
@ 2002-10-30 2:04 Douglas Gilbert
2002-11-01 0:16 ` Bounce buffer usage Mark Lobo
0 siblings, 1 reply; 15+ messages in thread
From: Douglas Gilbert @ 2002-10-30 2:04 UTC (permalink / raw)
To: Mark Lobo; +Cc: linux-scsi
Mark Lobo wrote:
> Hello!
>
> How can I get a list of differences in 2.4.9 and
> 2.4.18?
If you are talking about the scsi subsystem, then
look at this url:
http://www.tldp.org/HOWTO/SCSI-2.4-HOWTO/chg24.html
I need to update it for 2.4.19 .
It's probably a good time to start thinking about a
2.6 version of the above document ...
Doug Gilbert
^ permalink raw reply [flat|nested] 15+ messages in thread
* Bounce buffer usage
2002-10-30 2:04 2.4.9 2.4.18 diff Douglas Gilbert
@ 2002-11-01 0:16 ` Mark Lobo
2002-11-01 7:48 ` Jens Axboe
0 siblings, 1 reply; 15+ messages in thread
From: Mark Lobo @ 2002-11-01 0:16 UTC (permalink / raw)
To: linux-scsi
Guys,
Simple question on bounce buffer usage.
As I understand it right now, a bounce buffer will
be used if
1) We say we don't support over a fixed number of
address bits, for example, ISA devices.
2) If the address of the buffer is not in a space
directly addressable by the kernel ( not in kernel
logical address space )
Now my question is: what happens in the case where an
application sends an IO down? are bounce buffers used
in that case? I guess I am still confused on kernel
virtual v/s kernel logical addresses. As I understand,
a kernel logical address is one that is directly
addressable by the kernel, and is limited to 1GB. So
if we have a system with 2GB, does it mean some of the
physical memory ( probably 1GB ) has a kernel logical
address assigned to it permanently and the other 1GB
does not, which means if a user happens to get a page
in that space, there will be no logical address ( and
therefore bounce buffers WILL be used? ).
Or is any user address not mapped in the kernel
"logical" space at all?
I'm confused on this one; I would appreciate it if someone
can clear it up for me.
Thanks,
Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Bounce buffer usage
2002-11-01 0:16 ` Bounce buffer usage Mark Lobo
@ 2002-11-01 7:48 ` Jens Axboe
2002-11-01 18:55 ` Mark Lobo
0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2002-11-01 7:48 UTC (permalink / raw)
To: Mark Lobo; +Cc: linux-scsi
On Thu, Oct 31 2002, Mark Lobo wrote:
> Guys,
> Simple question on bounce buffer usage.
> As I understand it right now, a bounce buffer will
> be used if
> 1) We say we dont support over a fixed number of
> address bits, for example, ISA devices.
Correct, if unchecked_isa_dma is set for instance.
> 2) If the address of the buffer is not in a space
> directly addressable by the kernel ( not in kernel
> logical address space )
Not so true anymore in 2.4.20-pre (and hasn't been true in 2.5 since
2.5.1). If you set host highmem_io flag, it will be happy to pass you
pages that have no kernel virtual mapping.
> Now my question is: what happens in the case where an
> application sends an IO down? are bounce buffers used
> in that case? I guess I am still confused on kernel
You just outlined the bounce scenarios above yourself :-). If the
buffer sent down resides at a higher address than what the adapter can
handle, then it is bounced. This may not necessarily have anything to do
with kernel virtual mapping or not.
> virtual v/s kernel logical addresses. As I understand,
> a kernel logical address is one that is directly
> addressable by the kernel, and is limited to 1GB. So
For the standard kernel split, you are looking at 896MiB of memory. So
a little under a gig.
> if we have a system with 2GB, does it mean some of the
> physical memory ( probably 1GB ) has a kernel logical
> address assigned to it permanently and the other 1GB
> does not, which means if a user happens to get a page
> in that space, there will be no logical address ( and
> therefore bounce buffers WILL be used? ).
> Or is any user address not mapped in the kernel
> "logical" space at all?
Yes this is what happens per default, in 2.4.19 and below. As mentioned
above, in 2.4.20 the virtual mapping of the page has nothing to do with
the ability to read/write to it. It requires a driver that uses the pci
dma api properly - if it does, it can set host_template->highmem_io to
tell the kernel it doesn't need to have anything below 4GB bounced (or
64GB, or more, depends on the pci device).
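The rule described here (bounce only when the buffer lies above what the adapter can address, regardless of kernel virtual mapping) can be sketched as a simple comparison. This is a simplified userspace illustration, not the kernel's actual check: `sketch_needs_bounce()` and the mask values in the usage note are assumptions chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does a buffer at physical address `phys` of
 * `len` bytes reach beyond the highest address the adapter can DMA to?
 * `dma_mask` is that highest reachable address, e.g. 0xFFFFFFFF for a
 * 32-bit PCI device. */
static bool sketch_needs_bounce(uint64_t phys, uint64_t len, uint64_t dma_mask)
{
    uint64_t last;

    if (len == 0)
        return false;          /* nothing to transfer */
    last = phys + len - 1;     /* last byte touched by the transfer */
    if (last < phys)
        return true;           /* address arithmetic wrapped around */
    return last > dma_mask;
}
```

With a 32-bit mask, a page at 1GB (above the 896MiB directly mapped region) does not need bouncing, while a page at 4GB does; with a 24-bit ISA-style mask, anything at or above 16MiB does.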
--
Jens Axboe
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Bounce buffer usage
2002-11-01 7:48 ` Jens Axboe
@ 2002-11-01 18:55 ` Mark Lobo
2002-11-02 9:12 ` Jens Axboe
0 siblings, 1 reply; 15+ messages in thread
From: Mark Lobo @ 2002-11-01 18:55 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-scsi
> Not so true anymore in 2.4.20-pre (and hasn't been
> true in 2.5 since
> 2.5.1). If you set host highmem_io flag,>
So if I do set this flag in the host template, do I
need the bounce buffer patch also that is floating
around? Or is it just enough to set that flag and use
the DMA API? I'm using 2.4.18.
> it will be
> happy to pass you
> pages that have no kernel virtual mapping.
Pages that have no kernel virtual mapping? Do you mean
pages with no kernel "logical" mapping? We ultimately
need a kernel mapping, don't we? Say it is data in a
single page (no scatter gather): what kind of
address is passed down to the driver? Is it the pure
user address that is just passed down, and the DMA API
then gives the kernel address? Because we do need to do
a virt_to_page in this case before we can use the DMA
API, so what kind of virtual address is passed down to
the initiator?
>If the
> buffer sent down resides at a higher address than
> what the adapter can
> handle, then it is bounced. This may not necessarily
> have anything to do
> with kernel virtual mapping or not.
But if a buffer resides in high memory, and has no
kernel logical mapping, doesn't it mean that bounce
buffers will be used, especially in 2.4.18, without
the highmem_io bit set?
Thanks a lot!
Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Bounce buffer usage
2002-11-01 18:55 ` Mark Lobo
@ 2002-11-02 9:12 ` Jens Axboe
2002-11-03 22:10 ` Mark Lobo
0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2002-11-02 9:12 UTC (permalink / raw)
To: Mark Lobo; +Cc: linux-scsi
On Fri, Nov 01 2002, Mark Lobo wrote:
>
> > Not so true anymore in 2.4.20-pre (and hasn't been
> > true in 2.5 since
> > 2.5.1). If you set host highmem_io flag,>
>
> So if I do set this flag in the host template, do I
> need the bounce buffer patch also that is floating
> around? Or is it just enough to set that flag and use
> the DMA API? Im using 2.4.18.
You need the patch, otherwise that template bool is not even there. Or
you just need 2.4.20-pre, the block-highmem patch has been integrated
since -pre2/3
> it will be
> > happy to pass you
> > pages that have no kernel virtual mapping.
>
> Pages that have no kernel virtual mapping? Do u mean
> pages with no kernel "logical" mapping? We finally
Yes
> need a kernel mapping, dont we? say if it is data in a
> single page ( no scatter gather ), what kind of
We don't need the page mapped into the kernel virtual address space. For
scatter-gather setup, it's fine to know the physical address of such a
page. We can do clustering based on that. If the hardware (the platform,
not the controller) can further do iommu tricks on the resulting sg
table, fine, but that's not our worry.
> address is passed down to the driver? Is it the pure
> user address that is just passed down, and the DMA API
> then gives the kernel address? cause we do need to do
> a virt_to_page in this case before we can use the DMA
> API, so what kind of virtual address is passed down to
> the initiator?
I think this is your problem. So you want to write out a page of memory.
You basically want to do
user address -> page -> virtual address -> page = virt_to_page(va) ->
dma map page
and this is a problem because for some pages we just don't have a
virtual kernel mapping. Instead you want to do
user address -> page -> dma map page
which doesn't require a virtual mapping at all. You want to pass down
page/length/offset tuples to the dma mapping api, not virtual
addresses. Please read the document I pointed you at, it explains this
for you. You seem to be confused about several things in this area.
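To make the "page/length/offset tuples" idea concrete, here is a minimal userspace sketch that cuts a byte range into per-page segments. It treats the address as a flat number purely to show the shape of the tuples; in a real driver the page would come from the kernel's page lookup, not from address arithmetic. `struct sketch_seg`, `sketch_split_segments()` and the 4KiB page size are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SEG_PAGE_SHIFT 12                        /* assumption: 4KiB pages */
#define SEG_PAGE_SIZE  ((uint64_t)1 << SEG_PAGE_SHIFT)

/* One page/offset/length tuple, the unit handed to a DMA mapping call. */
struct sketch_seg {
    uint64_t pfn;      /* page frame number: which page */
    uint32_t offset;   /* byte offset inside that page */
    uint32_t length;   /* bytes of this segment, never crossing the page */
};

/* Split [addr, addr + len) into per-page segments.
 * Returns the number of segments written, at most `max`. */
static size_t sketch_split_segments(uint64_t addr, uint64_t len,
                                    struct sketch_seg *out, size_t max)
{
    size_t n = 0;

    while (len > 0 && n < max) {
        uint64_t off = addr & (SEG_PAGE_SIZE - 1);
        uint64_t chunk = SEG_PAGE_SIZE - off;   /* room left in this page */

        if (chunk > len)
            chunk = len;
        out[n].pfn = addr >> SEG_PAGE_SHIFT;
        out[n].offset = (uint32_t)off;
        out[n].length = (uint32_t)chunk;
        addr += chunk;
        len -= chunk;
        n++;
    }
    return n;
}
```

Nothing in the tuple is a kernel virtual address, which is why this form also works for pages that have no such mapping.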
> >If the
> > buffer sent down resides at a higher address than
> > what the adapter can
> > handle, then it is bounced. This may not necessarily
> > have anything to do
> > with kernel virtual mapping or not.
>
> But if a buffer resides in high memory, and has no
> kernel logical mapping, doesnt it mean that bounce
> buffers will be used, especially in 2.4.18, without
> the highmem_io bit set?
2.4.18 without any patches will always bounce, there's nothing you can
do about it.
--
Jens Axboe
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Bounce buffer usage
2002-11-02 9:12 ` Jens Axboe
@ 2002-11-03 22:10 ` Mark Lobo
2002-11-04 8:21 ` Jens Axboe
0 siblings, 1 reply; 15+ messages in thread
From: Mark Lobo @ 2002-11-03 22:10 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-scsi
> I think this is your problem. So you want to write
> out a page of memory.
> You basically want to do
>
> user address -> page -> virtual address -> page =
> virt_to_page(va) ->
> dma map page
>
> and this is a problem because for some pages we just
> don't have a
> virtual kernel mapping. Instead you want to do
>
> user address -> page -> dma map page
>
> which doesn't require a virtual mapping at all. You
> want to pass down
> page/length/offset tuplets to the dma mapping api,
> not virtual
> addresses. Please read the document I pointed you
> at, it explains this
> for you. You seem to be confused about several
> things in this area.
>
Ok! So now I guess I understand how this DMA API
works. I did read this document along with the bounce
buffer patch doc. If I get a buffer in the SCSI
initiator driver for a single page (which means no
scatter gather), all I get in the SCSI initiator is
the address of the request buffer. To get the page
associated with the buffer, I need to call
virt_to_page (as mentioned in the bounce buffer patch
docs). And the DMA mapping doc says the following
about transferring a single page:
"
struct pci_dev *pdev = mydev->pdev;
dma_addr_t dma_handle;
struct page *page = buffer->page;
unsigned long offset = buffer->offset;
size_t size = buffer->len;
dma_handle = pci_map_page(pdev, page, offset, size, direction);
...
pci_unmap_page(pdev, dma_handle, size, direction);
"
Now in the initiator driver, all I get is a
request buffer address. To actually use the above API,
I need the page address, and that I get from
virt_to_page (as described in the bounce buffer patch
doc). So I don't need a virt_to_page call to get the
page address? I don't understand how I can use
this --> "struct page *page = buffer->page;"? In that
case, what gets passed down to the initiator in the
"no scatter gather" case is some kind of structure?
Don't I need to do "struct page *page =
virt_to_page(buffer);"?
You've been a great help, Jens. I really appreciate all
your help!
Thanks,
Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Bounce buffer usage
2002-11-03 22:10 ` Mark Lobo
@ 2002-11-04 8:21 ` Jens Axboe
0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2002-11-04 8:21 UTC (permalink / raw)
To: Mark Lobo; +Cc: linux-scsi
On Sun, Nov 03 2002, Mark Lobo wrote:
> Now in the initiator driver, all what I get is a
> request buffer address. To actually use the above API,
> I need the page address, and that I get from
> virt_to_page ( as described in the bounce buffer patch
> doc) So I dont need a virt_to_page call to get the
> page address? I dont understand how I can use
> this --> "struct page *page = buffer->page;"? In that
> case, what gets passed down to the initiator in the
> "no scatter gather" case is some kind of structure?
> Dont I need to do "struct page *page =
> virt_to_page(buffer);"?
For drivers that deal with highmem, you are getting sg setup even for a
single segment. It would indeed be possible to send down a virtual
address for non-sg, if the original page didn't reside in highmem. In
that case, yes all you would have to do is
page = virt_to_page(buf);
offset = ((unsigned long) buf) & ~PAGE_MASK;
to get the remaining data you need. However, most drivers typically do
if (command->use_sg)
nr_entries = pci_map_page() sg list
else
pci_map_single() virtual address
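The `& ~PAGE_MASK` step in the non-sg case may be easier to follow with the masks spelled out. In the kernel, PAGE_MASK clears the low bits of an address, so its complement keeps only the offset within the page. The sketch below reproduces that arithmetic in userspace with hypothetical `SKETCH_` macros and a 4KiB page size assumed for illustration; it does not call virt_to_page().

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                              /* assumption: 4KiB pages */
#define SKETCH_PAGE_SIZE  ((uintptr_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))       /* clears the offset bits */

/* Start of the page containing buf. */
static uintptr_t sketch_page_base(const void *buf)
{
    return (uintptr_t)buf & SKETCH_PAGE_MASK;
}

/* Offset of buf inside its page, i.e. ((unsigned long) buf) & ~PAGE_MASK. */
static uintptr_t sketch_page_offset(const void *buf)
{
    return (uintptr_t)buf & ~SKETCH_PAGE_MASK;
}
```

The base and the offset always add back up to the original address, and the offset is the value that would accompany the page in a page-based mapping call.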
--
Jens Axboe
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Bounce buffer usage
@ 2002-11-01 15:40 Infante, Jon
2002-11-01 16:01 ` Jens Axboe
0 siblings, 1 reply; 15+ messages in thread
From: Infante, Jon @ 2002-11-01 15:40 UTC (permalink / raw)
To: 'Jens Axboe'; +Cc: linux-scsi
> It requires a driver that uses the pci dma api properly.
Where can I find information on this api and its proper usage?
Jon Infante
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2002-11-04 8:21 UTC | newest]
Thread overview: 15+ messages
2001-12-22 1:02 bounce buffer usage Randy.Dunlap
2001-12-23 14:09 ` Jens Axboe
2002-01-08 1:44 ` Randy.Dunlap
2002-01-08 7:42 ` Jens Axboe
2002-01-09 17:23 ` Randy.Dunlap
2002-01-09 18:10 ` Jens Axboe
2002-01-09 22:33 ` Rik van Riel
-- strict thread matches above, loose matches on Subject: below --
2002-10-30 2:04 2.4.9 2.4.18 diff Douglas Gilbert
2002-11-01 0:16 ` Bounce buffer usage Mark Lobo
2002-11-01 7:48 ` Jens Axboe
2002-11-01 18:55 ` Mark Lobo
2002-11-02 9:12 ` Jens Axboe
2002-11-03 22:10 ` Mark Lobo
2002-11-04 8:21 ` Jens Axboe
2002-11-01 15:40 Infante, Jon
2002-11-01 16:01 ` Jens Axboe