* Re: what is the current meaning of blk_size?
From: Peter T. Breuer @ 2001-11-13 15:08 UTC
To: ptb; +Cc: dalecki, linux kernel

"ptb wrote:"
> "Martin Dalecki wrote:"
> > "Peter T. Breuer" wrote:
> > > Is blk_size[][] supposed to contain the size in KB or blocks?
> > There is no rumor it's in blocks.

Nevertheless, experiments on 2.4.3 appear to show it is still in KB there.

> Uh, thanks! I was looking at fs/block_dev.c.
>
>     if (blk_size[MAJOR(dev)])
>         size = ((loff_t) blk_size[MAJOR(dev)][MINOR(dev)] << BLOCK_SIZE_BITS) >> blocksize_bits;
>
> which sets the size to the entered blk_size << (10 - blksize_bits).
>
> I missed that BLOCK_SIZE_BITS was constant but blksize_bits is variable.
> Amongst other things.

Thing is, in my driver I have now changed from setting blk_size in KB to
setting it in blocks instead (while keeping blksize the same), and the
result is that, measured with lseek, the device appears to be 1/4 of the
size it really is. This is in kernel 2.4.3.

If I look in ll_rw_blk.c, I see, for example:

    if (blk_size[major]) {
        unsigned long maxsector = (blk_size[major][MINOR(bh->b_rdev)] << 1) + 1;
        // (ptb) 1ST SECTOR BEYOND END OF DISK

which implies to me that blk_size is still in KB there. BTW, I don't know
why there should be a +1 at the end. The code goes on to say:

        unsigned long sector = bh->b_rsector;   // (ptb) 1ST SECTOR ON DISK
        unsigned int count = bh->b_size >> 9;   // (ptb) SECTORS IN BUFFER

        if (maxsector < count || maxsector - count < sector) {
            bh->b_state &= (1 << BH_Lock) | (1 << BH_Mapped);
            ... good stuff ...

So we look for the number of sectors in the buffer to be _greater_than_
the number of sectors in the device _plus_1_. It should be
_greater_than_or_equal_to_ ... _plus_1_. But even so it's meaningless.
What we want is to check whether the buffer contents will overflow the
disk. I'm not too sure about the other half of the condition either.
Surely it is what I mentioned above: sector + count > maxsector? But
again it should be >=. If we are on sector 0 of a 2-sector disk and we
try to write 3 sectors, then sector=0, count=3 and maxsector=3, and 0+3
is not > 3, so the condition does not trigger, although we want it to.
So it should be >=, not >. I believe the 1st check is merely a faster
calculation and is backed up by the second check. However, the second
check must be right!

> > OK I was too fast to figure it out:
> >
> > /*
> >  * blk_size contains the size of all block-devices in units of 1024 byte
> >  * sectors:
>
> But this is not so .. it is the default, not the rule. And it is only
> the default if the block size is the default value.
>
> > int * blk_size[MAX_BLKDEV];
> >
> > /*
> >  * blksize_size contains the size of all block-devices:
>
> Err .... they mean the BLOCK SIZE of all ...
> If you knew if the meaning of blk_size had ever changed, and when in
> terms of kernel version, that would also be very very helpful.

Peter
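Peter's off-by-one argument can be checked with a small user-space sketch. The function names are hypothetical; `original_rejects()` copies the condition quoted from the 2.4 `generic_make_request()`, and `fixed_rejects()` is the `>=` variant Peter proposes. One detail worth noting: because the values are unsigned, the first half of the test (`maxsector < count`) also stops `maxsector - count` from wrapping around, so it does more than save time.

```c
#include <stdbool.h>

/*
 * Sketch of the 2.4 end-of-device check (user space, hypothetical names).
 * blk_size_kb: device size from blk_size[][] (units of 1 KB)
 * sector:      first 512-byte sector of the request (bh->b_rsector)
 * count:       request length in 512-byte sectors (bh->b_size >> 9)
 */
unsigned long maxsector_from_kb(unsigned long blk_size_kb)
{
    /* 1 KB = 2 sectors, plus the "+1" found in the kernel source */
    return (blk_size_kb << 1) + 1;
}

/* The condition exactly as quoted from ll_rw_blk.c. */
bool original_rejects(unsigned long blk_size_kb,
                      unsigned long sector, unsigned int count)
{
    unsigned long maxsector = maxsector_from_kb(blk_size_kb);
    return maxsector < count || maxsector - count < sector;
}

/* Peter's proposal: >= instead of >, which compensates for the "+1". */
bool fixed_rejects(unsigned long blk_size_kb,
                   unsigned long sector, unsigned int count)
{
    unsigned long maxsector = maxsector_from_kb(blk_size_kb);
    return maxsector <= count || maxsector - count <= sector;
}
```

On a 1 KB (two-sector) device, the original expression accepts sector=0, count=3, which is the case Peter describes. The `>=` variant rejects that request and still accepts every request that fits.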
* blocks or KB? (was: .. current meaning of blk_size array)
From: Peter T. Breuer @ 2001-11-13 18:51 UTC
To: linux kernel

Let me put it more plainly. Martin Dalecki + rumour assure me that the
blk_size array nowadays measures in blocks, not KB, yet to me it seems
that it doesn't. Look at this code from ll_rw_blk.c in 2.4.13:

    unsigned long maxsector = (blk_size[major][MINOR(bh->b_rdev)] << 1) + 1;

and this comment:

    * blk_size contains the size of all block-devices in units of 1024
    * byte sectors:

so blk_size measures in KB. Where am I seeing it wrong? Is everybody
talking about 2.4.14 and 2.4.15? No .. it's just the same in 2.4.14:

    if (blk_size[major])
        minorsize = blk_size[major][MINOR(bh->b_rdev)];
    if (minorsize) {
        unsigned long maxsector = (minorsize << 1) + 1;

KB! Or is it the case that sectors don't mean 512B?

Peter
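Peter's earlier observation was that a driver which fills blk_size with 4 KB blocks sees its device shrink to a quarter of its real size. That follows directly from the fixed 1 KB unit in `(minorsize << 1)`. Below is a minimal sketch with hypothetical helper names, assuming a 4096-byte blksize for the buggy case.

```c
#define BLOCK_SIZE_BITS 10  /* include/linux/fs.h: blk_size is counted in 1 KB units */

/* Bytes the 2.4 block layer believes the device has: the entry is always
 * read as KB, just as (minorsize << 1) sectors equals minorsize KB. */
unsigned long long kernel_view_bytes(unsigned int blk_size_entry)
{
    return (unsigned long long)blk_size_entry << BLOCK_SIZE_BITS;
}

/* Hypothetical driver mistake: blk_size filled with a count of
 * blksize-byte blocks instead of KB. */
unsigned int entry_in_blocks(unsigned long long real_bytes, unsigned int blksize)
{
    return (unsigned int)(real_bytes / blksize);
}
```

With blksize 1024 the two units coincide, so the size comes out right. With blksize 4096 the kernel sees exactly one quarter of the real size, whatever blksize_size[][] says.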
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: Martin Dalecki @ 2001-11-14 9:44 UTC
To: ptb; +Cc: linux kernel

"Peter T. Breuer" wrote:
>
> Let me put it more plainly. Martin Dalecki + rumour assure me that the
> blk_size array nowadays measures in blocks, not KB, yet to me it seems

sectors = 512 per default
blocks = 1024 per default.

Never said anything else. Look at the initialization point for the
arrays. They all use constants which you can look up in the kernel
headers.

    ./linux/fs.h:#define BLOCK_SIZE_BITS 10
    ./linux/fs.h:#define BLOCK_SIZE (1<<BLOCK_SIZE_BITS)

Which means 1024 bytes for blk_size as default value.

> that it doesn't. Look at this code from ll_rw_blk.c in 2.4.13:

--
- phone: +49 214 8656 283
- job: eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort: ru_RU.KOI8-R
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: Peter T. Breuer @ 2001-11-14 20:41 UTC
To: dalecki; +Cc: ptb, linux kernel

"Martin Dalecki wrote:"
> "Peter T. Breuer" wrote:
> >
> > Let me put it more plainly. Martin Dalecki + rumour assure me that the
> > blk_size array nowadays measures in blocks, not KB, yet to me it seems
>
> sectors = 512 per default
> blocks = 1024 per default.

I know that! But it's irrelevant. What I need to know is whether blk_size
is still counting in KB, or whether it has switched to blocks.

> Never said anything else.

Err .. you said that blk_size is now measured in blocks, not KB. You
said that the rumour is true.

"A month of sundays ago Martin Dalecki wrote:"
> "Peter T. Breuer" wrote:
> > Is blk_size[][] supposed to contain the size in KB or blocks?
> There is no rumor it's in blocks.

Maybe I misinterpret what you write. I interpret it as meaning "the
rumour is not a rumour but a fact. It is in blocks".

> Look at the initialization point for the arrays. They all use constants
> which you can look up in the kernel headers.

I _know_ that. It's irrelevant.

The point is that if blk_size counts in KB, then the size of a device
cannot exceed 2^32 * 2^10 = 2^42 bytes = 4TB. I'd personally say 2TB,
because the int blk_size number is signed.

That's rumoured not to be the case, and the max size of a device is
supposed to be about 8 to 16TB. Let's suppose the rumour is true ..

So we deduce that one has to assign a different meaning to the blk_size
array. "Count in blocks" is how the rumour goes. That way you can get
4 times higher sizes .. all the way to 8 or 16TB per device. And this
is what is rumoured to be the case.

Is it or is it not so? A straight answer from the list would be nice!
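The 2/4 TB versus 8/16 TB figures come from counting 2^31 (signed `int`) or 2^32 (unsigned) units of either 1 KB or a 4 KB block. Here is a sketch of that arithmetic. The function name is hypothetical, and "TB" is taken to mean 2^40 bytes.

```c
#include <stdint.h>

/* Largest device (in bytes) describable by a 32-bit blk_size entry whose
 * unit is 2^unit_bits bytes: 10 for 1 KB, 12 for 4 KB blocks.
 * A signed int leaves 31 usable bits, an unsigned one 32. */
uint64_t max_device_bytes(unsigned int unit_bits, int entry_is_signed)
{
    unsigned int count_bits = entry_is_signed ? 31 : 32;
    return (uint64_t)1 << (count_bits + unit_bits);
}
```

Going from 1 KB units (`unit_bits` = 10) to 4 KB blocks (`unit_bits` = 12) adds two bits. That is the factor of 4 in the "count in blocks" rumour.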
> ./linux/fs.h:#define BLOCK_SIZE_BITS 10
> ./linux/fs.h:#define BLOCK_SIZE (1<<BLOCK_SIZE_BITS)

These are _defaults_ for _blksize_. Sure, you can change it as you like,
but according to the "blk_size is in KB" hypothesis, this matters not one
iota to the size limit on devices: change blksize and the size does not
change. But according to the "blk_size is in blocks" hypothesis, changing
blksize will change the size of the device. Testing shows that the
"blk_size is in KB" scenario is the true one.

Am I making plain the difference between blk_size and blksize?

blk_size is the number of blocks or KB (which?) in a device. blksize is
the size of the blocks. Is blk_size in KB or blocks?

It should be in blocks if the size of a device is to reach 8 or 16TB.
If it is in KB, we are limited to 2 or 4TB.

> Which means 1024 bytes for blk_size as default value.

But so what? That doesn't answer the question of whether blk_size
is in blocks or not.

> > it doesn't. Look at this code from ll_rw_blk.c in 2.4.[14]:

Look at it:

    if (blk_size[major])
        minorsize = blk_size[major][MINOR(bh->b_rdev)];
    if (minorsize) {
        unsigned long maxsector = (minorsize << 1) + 1;

This clearly hardcodes blk_size as measuring in units of 2 sectors, no
matter what we set for blksize. It should be, in my view,

    unsigned long maxsector =
        minorsize * (blksize_size[major][MINOR(bh->b_rdev)] >> 9) + 1;

or no device can be larger than 4TB. And neither can a filesystem, and
neither can a file ...

Now, I know I can write my own generic_make_request() code, but I have
no intention of maintaining it through different kernel versions just
to get the right size measurement. Besides, it's everyone's problem.

Persuade me that this is not a bug, and an important one at that :-)
Hellloooooo everybody! Linux cannot manage partitions greater than
4TB, ha ha ha hhhhhaaaa! ;-)

I at least am getting up to device sizes in the 8TB range.

Best wishes!

Peter
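Under the "blk_size is in blocks" hypothesis, the entry has to be scaled by blksize/512, because maxsector is compared against 512-byte sector numbers. Multiplying by blksize itself would give bytes. The sketch below uses a hypothetical name and returns a 64-bit result so that an 8 TB device does not overflow.

```c
#include <stdint.h>

/* Hypothetical "blk_size is in blocks" reading: convert an entry counting
 * blksize-byte blocks into 512-byte sectors. blksize must be a multiple
 * of 512. A 64-bit result avoids overflowing a 32-bit unsigned long. */
uint64_t sectors_from_blocks(uint32_t nblocks, uint32_t blksize)
{
    return (uint64_t)nblocks * (blksize >> 9);
}
```

With 1024-byte blocks this reduces to the kernel's `minorsize << 1`. For an 8 TB device in 4 KB blocks, the result is 2^34 sectors, which no longer fits in 32 bits. The 32-bit sector count is the limit that comes up later in the thread.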
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: Martin Dalecki @ 2001-11-14 20:51 UTC
To: ptb; +Cc: dalecki, linux kernel

"Peter T. Breuer" wrote:
>
> "Martin Dalecki wrote:"
> > "Peter T. Breuer" wrote:
> > >
> > > Let me put it more plainly. Martin Dalecki + rumour assure me that the
> > > blk_size array nowadays measures in blocks, not KB, yet to me it seems
> >
> > sectors = 512 per default
> > blocks = 1024 per default.
>
> I know that! But it's irrelevant. What I need to know is whether blk_size
> is still counting in KB, or whether it has switched to blocks.
>
> > Never said anything else.
>
> Err .. you said that blk_size is now measured in blocks, not KB. You
> said that the rumour is true.
>
> "A month of sundays ago Martin Dalecki wrote:"
> > "Peter T. Breuer" wrote:
> > > Is blk_size[][] supposed to contain the size in KB or blocks?
> > There is no rumor it's in blocks.
>
> Maybe I misinterpret what you write. I interpret it as meaning "the
> rumour is not a rumour but a fact. It is in blocks".
>
> > Look at the initialization point for the arrays. They all use constants
> > which you can look up in the kernel headers.
>
> I _know_ that. It's irrelevant.
>
> The point is that if blk_size counts in KB, then the size of a device
> cannot exceed 2^32 * 2^10 = 2^42 bytes = 4TB. I'd personally say 2TB,
> because the int blk_size number is signed.
>
> That's rumoured not to be the case, and the max size of a device is
> supposed to be about 8 to 16TB. Let's suppose the rumour is true ..
>
> So we deduce that one has to assign a different meaning to the blk_size
> array. "Count in blocks" is how the rumour goes. That way you can get
> 4 times higher sizes .. all the way to 8 or 16TB per device. And this
> is what is rumoured to be the case.
>
> Is it or is it not so? A straight answer from the list would be nice!
>
> > ./linux/fs.h:#define BLOCK_SIZE_BITS 10
> > ./linux/fs.h:#define BLOCK_SIZE (1<<BLOCK_SIZE_BITS)
>
> These are _defaults_ for _blksize_. Sure, you can change it as you like,
> but according to the "blk_size is in KB" hypothesis, this matters not one
> iota to the size limit on devices: change blksize and the size does not
> change. But according to the "blk_size is in blocks" hypothesis, changing
> blksize will change the size of the device. Testing shows that the
> "blk_size is in KB" scenario is the true one.
>
> Am I making plain the difference between blk_size and blksize?
>
> blk_size is the number of blocks or KB (which?) in a device. blksize is
> the size of the blocks. Is blk_size in KB or blocks?
>
> It should be in blocks if the size of a device is to reach 8 or 16TB.
> If it is in KB, we are limited to 2 or 4TB.
>
> > Which means 1024 bytes for blk_size as default value.
>
> But so what? That doesn't answer the question of whether blk_size
> is in blocks or not.
>
> > > it doesn't. Look at this code from ll_rw_blk.c in 2.4.[14]:
>
> Look at it:
>
>     if (blk_size[major])
>         minorsize = blk_size[major][MINOR(bh->b_rdev)];
>     if (minorsize) {
>         unsigned long maxsector = (minorsize << 1) + 1;
>
> This clearly hardcodes blk_size as measuring in units of 2 sectors, no
> matter what we set for blksize. It should be, in my view,
>
>     unsigned long maxsector =
>         minorsize * (blksize_size[major][MINOR(bh->b_rdev)] >> 9) + 1;
>
> or no device can be larger than 4TB. And neither can a filesystem, and
> neither can a file ...
>
> Now, I know I can write my own generic_make_request() code, but I have
> no intention of maintaining it through different kernel versions just
> to get the right size measurement. Besides, it's everyone's problem.
>
> Persuade me that this is not a bug, and an important one at that :-)
> Hellloooooo everybody! Linux cannot manage partitions greater than
> 4TB, ha ha ha hhhhhaaaa! ;-)
>
> I at least am getting up to device sizes in the 8TB range.
>

The usage of it in block_dev.c shows that matters are in fact more
complicated than all your hypotheses together...

    if (blk_size[MAJOR(dev)])
        size = ((loff_t) blk_size[MAJOR(dev)][MINOR(dev)] << BLOCK_SIZE_BITS) >> blocksize_bits;
    else
        size = INT_MAX;

In 90 out of 100 cases blk_size is in units of 1024 bytes, which is the
default *logical* block size used by Linux. When this overflows, the
block device layer simply does not care a damn bit about it and relies
on the driver to notice the overflow. Therefore the answer is: yes, it
is in units of KB, but Linux will still happily work with devices bigger
than this.

Please correct me if I'm wrong... Slowly I'm starting to get puzzled
myself. If this is the case, you can regard blk_size as the same kind of
silly blunder as the read_ahead array.

> Best wishes!
>
> Peter
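The block_dev.c computation Martin quotes converts the KB entry into a count of device blocks of 2^blocksize_bits bytes. Below is a user-space sketch of it. `int64_t` stands in for `loff_t`, and the simplification of passing a pointer to a single entry (NULL meaning "no size array for this major") is an assumption made for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE_BITS 10  /* blk_size unit: 1 KB */

/* Sketch of the fs/block_dev.c size computation. Casting to a 64-bit type
 * before shifting keeps large KB entries from overflowing. */
int64_t device_blocks(const int *blk_size_entry, unsigned int blocksize_bits)
{
    if (blk_size_entry == NULL)
        return INT_MAX;  /* no size known: effectively unbounded */
    return ((int64_t)*blk_size_entry << BLOCK_SIZE_BITS) >> blocksize_bits;
}
```

The result is in units of the device's current block size. Changing blksize therefore changes the block count but not the byte size, which fits Martin's reading that blk_size stays in KB.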
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: Andreas Dilger @ 2001-11-14 21:16 UTC
To: Peter T. Breuer; +Cc: dalecki, linux kernel

On Nov 14, 2001 21:41 +0100, Peter T. Breuer wrote:
> "A month of sundays ago Martin Dalecki wrote:"
> > "Peter T. Breuer" wrote:
> > > Is blk_size[][] supposed to contain the size in KB or blocks?
> > There is no rumor it's in blocks.
>
> Maybe I misinterpret what you write. I interpret it as meaning "the
> rumour is not a rumour but a fact. It is in blocks".

Check what /proc/partitions shows us: #blocks, in units of 1kB. This has
been standard in the kernel for a loooong time.

> The point is that if blk_size counts in KB, then the size of a device
> cannot exceed 2^32 * 2^10 = 2^42 bytes = 4TB. I'd personally say 2TB,
> because the int blk_size number is signed.
>
> That's rumoured not to be the case, and the max size of a device is
> supposed to be about 8 to 16TB. Let's suppose the rumour is true ..

Well, the rumor is wrong. There has always been a single-device 1TB/2TB
limit in the kernel (2^31 or 2^32 * 512 byte sector size), and until
recently it has not been a problem. To remove the problem Jens Axboe
(I think, or Ben LaHaise, can't remember) has a patch to support 64-bit
block counts and has been tested with > 2TB devices.

> So we deduce that one has to assign a different meaning to the blk_size
> array. "Count in blocks" is how the rumour goes. That way you can get
> 4 times higher sizes .. all the way to 8 or 16TB per device. And this
> is what is rumoured to be the case.

Where do you get these rumors?

> It should be in blocks if the size of a device is to reach 8 or 16TB.
> If it is in KB, we are limited to 2 or 4TB.

In theory this is possible (it was discussed on the LVM list a bit), but
it would take a bunch of work to make it real. For LVM (and MD RAID),
since we are dealing with multiple real devices < 2TB in size, we could
use a blocksize of 4kB to get a larger virtual device. In the end this
only wins for a short time, and you need 64-bit block numbers anyway.

> Persuade me that this is not a bug, and an important one at that :-)
> Hellloooooo everybody! Linux cannot manage partitions greater than
> 4TB, ha ha ha hhhhhaaaa! ;-)

And it can't handle more than 64GB of RAM on ia32 (previously 1GB). So
what? When a limit is reached for any reasonable number of people, it is
fixed.

> I at least am getting up to device sizes in the 8TB range.

If you are in that ballpark, then get the 64-bit block number patch and
start testing/fixing, instead of complaining.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
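Andreas's pointer to /proc/partitions gives a quick check from user space: its #blocks column counts 1 KB units regardless of the device's block size. The sketch below converts one line of that file to bytes; the function name is hypothetical. Only the first four fields ("major minor #blocks name") are read, so any extra statistics columns some kernels append are ignored.

```c
#include <stdio.h>

/* Parse one data line of /proc/partitions ("major minor #blocks name").
 * Returns 1 on success and stores the size in bytes, treating #blocks as
 * 1 KB units. The header line and blank lines return 0. */
int partition_bytes(const char *line, char name[32], unsigned long long *bytes)
{
    unsigned int major, minor;
    unsigned long long kblocks;

    if (sscanf(line, " %u %u %llu %31s", &major, &minor, &kblocks, name) != 4)
        return 0;
    *bytes = kblocks << 10;  /* 1 KB per "block" */
    return 1;
}
```

In practice you would read /proc/partitions line by line with `fgets()` and pass each line to this function.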
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: Benjamin LaHaise @ 2001-11-14 21:49 UTC
To: Peter T. Breuer, dalecki, linux kernel

On Wed, Nov 14, 2001 at 02:16:39PM -0700, Andreas Dilger wrote:
> Well, the rumor is wrong. There has always been a single-device 1TB/2TB
> limit in the kernel (2^31 or 2^32 * 512 byte sector size), and until
> recently it has not been a problem. To remove the problem Jens Axboe
> (I think, or Ben LaHaise, can't remember) has a patch to support 64-bit
> block counts and has been tested with > 2TB devices.

It was tested with a 10TB loopback RAID, not a real device. Strangely,
nobody made any effort to test on real physical hardware (or offer any
hardware for me to test on ;-). The patch was against ~2.4.6 and will
need to get dusted off again soon.

-ben
--
Fish.
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: Scott Laird @ 2001-11-14 22:33 UTC
To: Benjamin LaHaise; +Cc: Peter T. Breuer, dalecki, linux kernel

On Wed, 14 Nov 2001, Benjamin LaHaise wrote:
>
> On Wed, Nov 14, 2001 at 02:16:39PM -0700, Andreas Dilger wrote:
> > Well, the rumor is wrong. There has always been a single-device 1TB/2TB
> > limit in the kernel (2^31 or 2^32 * 512 byte sector size), and until
> > recently it has not been a problem. To remove the problem Jens Axboe
> > (I think, or Ben LaHaise, can't remember) has a patch to support 64-bit
> > block counts and has been tested with > 2TB devices.
>
> It was tested with a 10TB loopback RAID, not a real device. Strangely,
> nobody made any effort to test on real physical hardware (or offer any
> hardware for me to test on ;-). The patch was against ~2.4.6 and will
> need to get dusted off again soon.
>

Interesting. I have a couple of 14x 100GB IDE boxes scheduled to show up
next week. If I can get a patch for a reasonably recent kernel, I could
do a few tests on a ~1.2 TB FS, and maybe on one a bit bigger.

Once 160GB drives start shipping, it should be possible to make a 2TB
software RAID5 box in a 4U case for around $7k.

Interesting question: does Linux have problems with large NFS imports?

Scott
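Scott's capacity figures are consistent with the usual RAID5 rule that one drive's worth of space goes to parity. That layout is an assumption for the 100 GB boxes, since he only names RAID5 for the 160 GB case. The sketch below uses a hypothetical function name and decimal-GB drive sizes.

```c
/* RAID5 usable capacity: one drive's worth of space holds parity. */
unsigned long long raid5_usable_gb(unsigned int ndisks, unsigned long long disk_gb)
{
    return ndisks > 1 ? (unsigned long long)(ndisks - 1) * disk_gb : 0;
}
```

Fourteen 100 GB drives give 1300 GB, roughly 1.18 TiB, in line with the ~1.2 TB estimate. Fourteen 160 GB drives give 2080 GB, about 1.89 TiB. Both sizes are above 2^31 512-byte sectors (1 TiB), the signed limit Andreas mentions, and below the unsigned 2 TiB limit.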
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: William Park @ 2001-11-15 1:48 UTC
To: linux kernel; +Cc: Peter T. Breuer, dalecki

On Wed, Nov 14, 2001 at 02:16:39PM -0700, Andreas Dilger wrote:
> > I at least am getting up to device sizes in the 8TB range.
>
> If you are in that ballpark, then get the 64-bit block number patch and
> start testing/fixing, instead of complaining.
>
> Cheers, Andreas

Hi Andreas, can you give us the URL for this 64-bit patch? I also want to
go past the 1TB (512 * 2^31) filesystem size.

--
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>.
8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: Andreas Dilger @ 2001-11-15 4:58 UTC
To: linux kernel, Peter T. Breuer, dalecki

On Nov 14, 2001 20:48 -0500, William Park wrote:
> On Wed, Nov 14, 2001 at 02:16:39PM -0700, Andreas Dilger wrote:
> > > I at least am getting up to device sizes in the 8TB range.
> >
> > If you are in that ballpark, then get the 64-bit block number patch and
> > start testing/fixing, instead of complaining.
>
> Hi Andreas, can you give us the URL for this 64-bit patch? I also want to
> go past the 1TB (512 * 2^31) filesystem size.

I don't have it; try a search of the l-k archives, around June of this
year.

Cheers, Andreas
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: William Park @ 2001-11-15 5:34 UTC
To: Peter T. Breuer; +Cc: linux kernel

On Wed, Nov 14, 2001 at 09:41:11PM +0100, Peter T. Breuer wrote:
> Am I making plain the difference between blk_size and blksize?
>
> blk_size is the number of blocks or KB (which?) in a device. blksize is
> the size of the blocks. Is blk_size in KB or blocks?
>
> It should be in blocks if the size of a device is to reach 8 or 16TB.
> If it is in KB, we are limited to 2 or 4TB.

I've been following this thread intensely. I need to use the Network
Block Device to build a very large network RAID, so resolving this issue
is of great interest to me.

Judging by 'drivers/block/nbd.c', it counts in units of BLOCK_SIZE=1024
(BLOCK_SIZE_BITS=10), even though you can set the block size to
[512,1024,...,PAGE_SIZE=4096]. Since NBD counts these 1KB blocks using a
'u64' integer, the ultimate size of the filesystem is determined by the
kernel's block device support.

Looking at 'fs/block_dev.c', you can set the block size to
[512,1024,...,PAGE_SIZE=4096] there too. But 'max_block()' returns the
block count in whatever the block size of the device is, not in
BLOCK_SIZE:

    static unsigned long max_block(kdev_t dev)
    {
        unsigned int retval = ~0U;
        int major = MAJOR(dev);

        if (blk_size[major]) {
            int minor = MINOR(dev);
            unsigned int blocks = blk_size[major][minor];
            if (blocks) {
                unsigned int size = block_size(dev);
                unsigned int sizebits = blksize_bits(size);
                blocks += (size-1) >> BLOCK_SIZE_BITS;
                retval = blocks << (BLOCK_SIZE_BITS - sizebits);
                if (sizebits > BLOCK_SIZE_BITS)
                    retval = blocks >> (sizebits - BLOCK_SIZE_BITS);
            }
        }
        return retval;
    }

In particular, if the block size is 512, then the block count is
multiplied by 2; and if the block size is 4096, then the block count is
divided by 4. It treats 'blk_size[][]' as a block count in KB. So I can
only deduce that the block count is in KB.

Also, from 'include/linux/blkdev.h',

    extern int * blk_size[MAX_BLKDEV];

'blk_size[][]' is 'int', which means the maximum size of a block device
is 2^10 x 2^31 = 2^41 = 2TB. However, because it is always converted to
'unsigned int' for the block count calculation, I think you can take it
as 4TB.

Am I right?
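William's reading of max_block() can be checked with a user-space rendition. This is a sketch: `blksize_bits_of()` is a hypothetical stand-in for the kernel's `blksize_bits()` (equivalent for power-of-two sizes from 512 to 4096), and the blk_size entry is passed in directly rather than looked up by major/minor. The sketch also shifts in only one direction. The quoted code first evaluates `blocks << (BLOCK_SIZE_BITS - sizebits)` with a wrapped-around unsigned shift count when sizebits > 10, which is technically undefined in C, before overwriting the result.

```c
#define BLOCK_SIZE_BITS 10

/* log2 of a power-of-two block size, starting from 512-byte sectors. */
static unsigned int blksize_bits_of(unsigned int size)
{
    unsigned int bits = 9;
    while ((1u << bits) < size)
        bits++;
    return bits;
}

/* blk_size_kb: the blk_size[][] entry (KB); size: device block size in
 * bytes. Returns the device size in blocks of `size` bytes, or ~0U when
 * no size is known, as in the kernel function. */
unsigned int max_block_sketch(unsigned int blk_size_kb, unsigned int size)
{
    unsigned int retval = ~0U;

    if (blk_size_kb) {
        unsigned int blocks = blk_size_kb;
        unsigned int sizebits = blksize_bits_of(size);

        /* Round the KB count up so a partial last block is included. */
        blocks += (size - 1) >> BLOCK_SIZE_BITS;
        if (sizebits > BLOCK_SIZE_BITS)
            retval = blocks >> (sizebits - BLOCK_SIZE_BITS);
        else
            retval = blocks << (BLOCK_SIZE_BITS - sizebits);
    }
    return retval;
}
```

With 4096-byte blocks, a 101 KB entry yields 26 blocks: the `(size - 1) >> BLOCK_SIZE_BITS` term rounds the KB count up to a whole block before dividing by 4. With 512-byte blocks, the count simply doubles.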
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: Andreas Dilger @ 2001-11-15 5:55 UTC
To: Peter T. Breuer, linux kernel

On Nov 15, 2001 00:34 -0500, William Park wrote:
> Judging by 'drivers/block/nbd.c', it counts in units of BLOCK_SIZE=1024
> (BLOCK_SIZE_BITS=10), even though you can set the block size to
> [512,1024,...,PAGE_SIZE=4096]. Since NBD counts these 1KB blocks using a
> 'u64' integer, the ultimate size of the filesystem is determined by the
> kernel's block device support.
>
> Looking at 'fs/block_dev.c', you can set the block size to
> [512,1024,...,PAGE_SIZE=4096] there too. But 'max_block()' returns the
> block count in whatever the block size of the device is, not in
> BLOCK_SIZE:

Sadly, while you _might_ be able to change BLOCK_SIZE to something other
than 1kB, there are probably so many places that assume a 1kB size that
you will need a lot of fixing. I'm not saying that fixing these things is
bad (it would actually be good for many reasons), but this is a heads-up
that changing the BLOCK_SIZE define _probably_ won't get you 8TB devices
(more likely a broken system or a corrupt fs). Use caution.

Cheers, Andreas
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: Anton Altaparmakov @ 2001-11-15 10:42 UTC
To: Andreas Dilger; +Cc: Peter T. Breuer, linux kernel

At 05:55 15/11/01, Andreas Dilger wrote:
>On Nov 15, 2001 00:34 -0500, William Park wrote:
> > Judging by 'drivers/block/nbd.c', it counts in units of BLOCK_SIZE=1024
> > (BLOCK_SIZE_BITS=10), even though you can set the block size to
> > [512,1024,...,PAGE_SIZE=4096]. Since NBD counts these 1KB blocks using a
> > 'u64' integer, the ultimate size of the filesystem is determined by the
> > kernel's block device support.
> >
> > Looking at 'fs/block_dev.c', you can set the block size to
> > [512,1024,...,PAGE_SIZE=4096] there too. But 'max_block()' returns the
> > block count in whatever the block size of the device is, not in
> > BLOCK_SIZE:
>
>Sadly, while you _might_ be able to change BLOCK_SIZE to something other
>than 1kB, there are probably so many places that assume a 1kB size that
>you will need a lot of fixing. I'm not saying that fixing these things is
>bad (it would actually be good for many reasons), but this is a heads-up
>that changing the BLOCK_SIZE define _probably_ won't get you 8TB devices
>(more likely a broken system or a corrupt fs). Use caution.

Back in 2.4.0-test8 I changed BLOCK_SIZE to 512 and had to make some
modifications to drivers/ide, drivers/scsi, fs/partitions and fs/ext2 to
get it to work. The patch is 10KiB, so not too bad, but it doesn't deal
with the MD driver or with any of the devices/filesystems I don't
actually use. It then worked nicely for me. (The only minor problem was
with the floppy disk, which produced a block size error from
ll_rw_block, but it always went ahead and worked after printing the
error.) And yes, the fixes needed are mostly because of assumptions about
BLOCK_SIZE being 1024 bytes...

If anyone is interested in having a look, the now outdated patch is
available on the web:

    http://www-stu.christs.cam.ac.uk/~aia21/linux/blksize512.patch

Anton

--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: Peter T. Breuer @ 2001-11-15 12:35 UTC
To: William Park; +Cc: linux kernel

"A month of sundays ago William Park wrote:"
> On Wed, Nov 14, 2001 at 09:41:11PM +0100, Peter T. Breuer wrote:
> > blk_size is the number of blocks or KB (which?) in a device. blksize is
> > the size of the blocks. Is blk_size in KB or blocks?
> >
> > It should be in blocks if the size of a device is to reach 8 or 16TB.
> > If it is in KB, we are limited to 2 or 4TB.
>
> I've been following this thread intensely. I need to use the Network
> Block Device to build a very large network RAID, so resolving this issue
> is of great interest to me.

Yes, well, you may be the person on whose behalf I started checking it
out. To put your mind at rest: all devices (partitions, etc.) are limited
by the 32-bit int that holds the number of sectors on a device. This is
31+9 bits of space (I don't know whether negative sectors are counted as
positive :), which is 1 TB (or 2 TB if you use the unsigned
interpretation). So the blk_size/blksize business is irrelevant.

> Judging by 'drivers/block/nbd.c', it counts in units of BLOCK_SIZE=1024
> (BLOCK_SIZE_BITS=10), even though you can set the block size to
> [512,1024,...,PAGE_SIZE=4096]. Since NBD counts these 1KB blocks using a
> 'u64' integer, the ultimate size of the filesystem is determined by the
> kernel's block device support.

This is correct, but it's quite a deep dependence in the kernel. Though
nbd (and my enbd) use 64-bit sizes in their network protocols, you can't
get rid of the limitation just like that. The kernel is infested with the
limit associated with a 32-bit sector count.

The 32-bit KB count in blk_size is also a limit, but never an active one,
as the sector count bites first, before the other is reached. The
kernel's VM thinks those sector counts are 32-bit and that sectors are
512B, which means the person in charge of the VM must handle any changes.

To tell the truth, counting in 512B sectors is the only sane way to go.
Counting in units of blocks drives you mad, because blocks can be of
variable size.

> Looking at 'fs/block_dev.c', you can set the block size to
> [512,1024,...,PAGE_SIZE=4096] there too. But 'max_block()' returns the
> block count in whatever the block size of the device is, not in
> BLOCK_SIZE:

It looks to me as though block_dev.c has been "prepared" to be more
flexible, and that it will be a short job either to use 64-bit sector
counts there or to move to counting in blocks. The same work has not gone
into ll_rw_blk.c yet.

> In particular, if the block size is 512, then the block count is
> multiplied by 2; and if the block size is 4096, then the block count is
> divided by 4. It treats 'blk_size[][]' as a block count in KB. So I can
> only deduce that the block count is in KB.

It still is, yes.

> Also, from 'include/linux/blkdev.h',
>     extern int * blk_size[MAX_BLKDEV];
> 'blk_size[][]' is 'int', which means the maximum size of a block device
> is 2^10 x 2^31 = 2^41 = 2TB. However, because it is always converted to
> 'unsigned int' for the block count calculation, I think you can take it
> as 4TB.
>
> Am I right?

As far as I can tell. I was trying to ask here if it had changed, but
evidently it has not.

What is the forward strategy? I see no alternative but moving to 64-bit
sector counts.

Peter
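Peter's point that the sector count "bites first" is easy to check. For the same number of counter bits, 512-byte units always run out before 1 KB units. A sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* Size limit (bytes) imposed by each 32-bit counter, and the one that
 * bites first. signed_counts selects 31 usable bits instead of 32. */
uint64_t effective_limit_bytes(int signed_counts)
{
    unsigned int bits = signed_counts ? 31 : 32;
    uint64_t by_sectors  = (uint64_t)1 << (bits + 9);   /* 512-byte sectors */
    uint64_t by_blk_size = (uint64_t)1 << (bits + 10);  /* blk_size in KB */

    return by_sectors < by_blk_size ? by_sectors : by_blk_size;
}
```

The result is 2^40 bytes (1 TB) for signed counts and 2^41 bytes (2 TB) for unsigned ones. These are the figures Peter gives, and in both cases they are half of what blk_size alone would allow.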
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: William Park @ 2001-11-15 18:31 UTC
To: linux kernel; +Cc: Peter T. Breuer, Andreas Dilger

On Thu, Nov 15, 2001 at 01:35:26PM +0100, Peter T. Breuer wrote:
> What is the forward strategy? I see no alternative but moving to 64bit
> sector counts.

Me too.

I looked around, and 1KB block size is hard-coded in too many places. For example, function 'generic_make_request()' in 'drivers/block/ll_rw_blk.c' assumes 512 sector and 1024 block size:

    if (blk_size[major])
        minorsize = blk_size[major][MINOR(bh->b_rdev)];
    if (minorsize) {
        unsigned long maxsector = (minorsize << 1) + 1;   <--
        unsigned long sector = bh->b_rsector;
        unsigned int count = bh->b_size >> 9;

So, using 'u64 *blk_size[][]' seems to be the most straightforward solution, leaving BLOCK_SIZE alone.

I thought 'drivers/block/nbd.c' was already using 64-bit count, according to its comment at the top. But, curiously, it reverts back to 'int' count of BLOCK_SIZE.

I tried searching list archives for 64-bit patch, but no luck. Any URL would be helpful.

Is changing 'int' to 'u64' (and all the dependent code) enough to get 64-bit block devices? I'm willing to do the work.

I don't care about filesystem; that's the job for maintainer of particular filesystem. I understand XFS is 64-bit, so I can use that.

--
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>.
8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin
* Re: blocks or KB? (was: .. current meaning of blk_size array)
From: Andreas Dilger @ 2001-11-15 20:19 UTC
To: linux kernel, Peter T. Breuer

On Nov 15, 2001 13:31 -0500, William Park wrote:
> I looked around, and 1KB block size is hard-coded in too many places.
> For example, function 'generic_make_request()' in
> 'drivers/block/ll_rw_blk.c' assumes 512 sector and 1024 block size:

Yes, it _would_ be nice to clean this up, but it is a lot of work. You could check out Anton's patch (posted today) for this as a starting point.

> Is changing 'int' to 'u64' (and all the dependent code) enough to get
> 64-bit block devices? I'm willing to do the work.

It is already done, please don't duplicate. Search for 64 bit block devices around June of this year for a URL to Jens'/Ben's patch. Please repost the URL, as several people have asked.

> I don't care about filesystem; that's the job for maintainer of particular
> filesystem. I understand XFS is 64-bit, so I can use that.

FYI, ext2/ext3 _should_ be OK up to 8TB (possibly 16TB depending on sign issues) filesystem, with individual files at 2TB, when using a 4kB block size. However, there appear to be other issues like VFS and page cache which may have problems at this point as well.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
* Re: blocks or KB?
From: William Park @ 2001-11-15 22:04 UTC
To: linux kernel

On Thu, Nov 15, 2001 at 01:19:38PM -0700, Andreas Dilger wrote:
> > Is changing 'int' to 'u64' (and all the dependent code) enough to
> > get 64-bit block devices? I'm willing to do the work.
>
> It is already done, please don't duplicate. Search for 64 bit block
> devices around June of this year for a URL to Jens'/Ben's patch.
> Please repost the URL, as several people have asked.

Found it -- http://people.redhat.com/bcrl/lb/. Strangely, it wasn't in the linux-kernel list.

--
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>.
8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin
Thread overview: 17+ messages
2001-11-13 15:08 what is teh current meaning of blk_size? Peter T. Breuer
2001-11-13 18:51 ` blocks or KB? (was: .. current meaning of blk_size array) Peter T. Breuer
2001-11-14 9:44 ` Martin Dalecki
2001-11-14 20:41 ` Peter T. Breuer
2001-11-14 20:51 ` Martin Dalecki
2001-11-14 21:16 ` Andreas Dilger
2001-11-14 21:49 ` Benjamin LaHaise
2001-11-14 22:33 ` Scott Laird
2001-11-15 1:48 ` William Park
2001-11-15 4:58 ` Andreas Dilger
2001-11-15 5:34 ` William Park
2001-11-15 5:55 ` Andreas Dilger
2001-11-15 10:42 ` Anton Altaparmakov
2001-11-15 12:35 ` Peter T. Breuer
2001-11-15 18:31 ` William Park
2001-11-15 20:19 ` Andreas Dilger
2001-11-15 22:04 ` blocks or KB? William Park