Subject: understanding speculative preallocation
From: jbr @ 2013-07-26 7:23 UTC
To: xfs

Hello,

I'm looking for general documentation/help with the speculative preallocation feature in XFS. So far, I haven't really been able to find any definitive, up-to-date documentation on it. I'm wondering how I can find out definitively which version of XFS I am using, and what preallocation scheme is in use.

We are running Apache Kafka on our servers, and Kafka uses sequential IO to write data log files. Kafka uses, by default, a maximum log file size of 1GB. However, most of the log files end up being 2GB, and thus the disk fills up twice as fast as it should.

We are using XFS on CentOS 2.6.32-358. Is there a way I can know which version of XFS is built into this version of the kernel? What preallocation schedule does it use? If I do an xfs_info -V, it reports 3.1.1.

We are using XFS mounted with no allocsize specified. I've seen varying info suggesting this means it either defaults to an allocsize of 64K (which doesn't seem to match my observations), or that it will use dynamic preallocation. I've also seen hints (but no actual canonical documentation) suggesting that the dynamic preallocation works by progressively doubling the current file size (which does match my observations).

What I'm not clear on is the scheduling for the preallocation. At what point does it decide to preallocate the next doubling of space? Is it when the current preallocated space is used up, or does it happen when the current space is used up to within some threshold?

What I'd like to do is keep the doubling behavior intact, but have it capped so it never increases the file beyond 1GB. Is there a way to do that? Can I trick the preallocation into not doing a final doubling, if I cap my Kafka log files at, say, 900MB (or some percentage under 1GB)?

There are numerous references to an allocation schedule like this:

    freespace    max prealloc size
    >5%          full extent (8GB)
    4-5%         2GB (8GB >> 2)
    3-4%         1GB (8GB >> 3)
    2-3%         512MB (8GB >> 4)
    1-2%         256MB (8GB >> 5)
    <1%          128MB (8GB >> 6)

I'm just not sure I understand what this is telling me. It seems to tell me what the max prealloc size is, with it being reduced as the disk becomes nearly full. But it doesn't tell me about the progressive doubling in preallocation (I assume up to a max of 8GB). Is any of this configurable? Can we specify a max prealloc size somewhere?

The other issue seems to be that after the files are closed (from within the Java JVM), they still don't seem to have their preallocated space reclaimed. Are there known issues with closing the files in Java not properly causing a flush of the preallocated space?

Any help pointing me to any documentation/user guides which accurately describe this would be appreciated!

Thanks,

Jason

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
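The table above can be read as a throttle on the preallocation cap as free space runs out. A purely illustrative C sketch of that reading (the function name is made up here, and the thresholds are simply a restatement of the quoted table, not the kernel's actual code):

```c
#include <stdint.h>

#define GB (1ULL << 30)

/* Hypothetical restatement of the throttle table quoted above.
 * Given the percentage of free space remaining, return the cap
 * applied to the speculative preallocation size (base 8GB). */
static uint64_t max_prealloc_bytes(double freesp_pct)
{
    int shift = 0;                     /* >5% free: full 8GB extent */

    if (freesp_pct < 5.0) shift = 2;   /* 8GB >> 2 = 2GB   */
    if (freesp_pct < 4.0) shift++;     /* 8GB >> 3 = 1GB   */
    if (freesp_pct < 3.0) shift++;     /* 8GB >> 4 = 512MB */
    if (freesp_pct < 2.0) shift++;     /* 8GB >> 5 = 256MB */
    if (freesp_pct < 1.0) shift++;     /* 8GB >> 6 = 128MB */

    return (8 * GB) >> shift;
}
```

Note that under this reading the cap only controls the maximum a single preallocation can reach; it says nothing about the doubling schedule itself, which is the poster's open question.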
Subject: Re: understanding speculative preallocation
From: Dave Chinner @ 2013-07-26 11:50 UTC
To: jbr; Cc: xfs

On Fri, Jul 26, 2013 at 12:23:40AM -0700, jbr wrote:
> I'm looking for general documentation/help with the speculative
> preallocation feature in xfs. So far, I haven't really been able to
> find any definitive, up to date documentation on it.

Read the code - it's documented in the comments. ;)

Or ask questions here, because the code changes and the only up to date reference is the code and/or the developers that work on it...

> I'm wondering how I can find out definitively which version of xfs I
> am using, and what the preallocation scheme in use is.

Look at the kernel version, then look at the corresponding source code.

> We are using xfs on CentOS 2.6.32-358. Is there a way I can know which
> version of xfs is built into this version of the kernel?

The XFS code is part of the kernel, so look at the kernel source code that CentOS ships.

> I've also seen hints (but no actual canonical documentation)
> suggesting that the dynamic preallocation works by progressively
> doubling the current file size (which does match my observations).

Well, it started off that way, but it has been refined since to handle many different cases where this behaviour is sub-optimal.

> What I'm not clear on, is the scheduling for the preallocation. At
> what point does it decide to preallocate the next doubling of space.

Depends on the type of IO being done.

> Is it when the current preallocated space is used up,

Usually.

> or does it happen when the current space is used up within some
> threshold.

No.

> What I'd like to do, is keep the doubling behavior intact, but have it
> capped so it never increases the file beyond 1Gb. Is there a way to do
> that?

No.

> Can I trick the preallocation to not do a final doubling, if I cap my
> kafka log files at say, 900Mb (or some percentage under 1Gb)?

No.

> There are numerous references to an allocation schedule like this:
>
>     freespace    max prealloc size
>     >5%          full extent (8GB)
>     4-5%         2GB (8GB >> 2)
>     3-4%         1GB (8GB >> 3)
>     2-3%         512MB (8GB >> 4)
>     1-2%         256MB (8GB >> 5)
>     <1%          128MB (8GB >> 6)
>
> I'm just not sure I understand what this is telling me. It seems to
> tell me what the max prealloc size is, with it being reduced if the
> disk is nearly full.

Yes, that's correct. Mainline also does this for quota exhaustion, too.

> But it doesn't tell me about the progressive doubling in preallocation
> (I assume up to a max of 8Gb). Is any of this configurable?

No.

> Can we specify a max prealloc size somewhere?

Use the allocsize mount option. It turns off the dynamic behaviour and fixes the preallocation size.

> The other issue seems to be that after the files are closed (from
> within the java jvm), they still don't seem to have their
> pre-allocated space reclaimed. Are there known issues with closing the
> files in java not properly causing a flush of the preallocated space?

Possibly. There's a heuristic that turns off truncation at close - if your application keeps doing "open-write-close" it will not truncate preallocation. Log files typically see this IO pattern from applications, and hence triggering that "no truncate" heuristic is exactly what you want to have happen to avoid severe fragmentation of the log files.

> Any help pointing me to any documentation/user guides which accurately
> describes this would be appreciated!

The mechanism is not documented outside the code, as it changes from kernel release to kernel release and is supposed to be transparent to userspace. It's being refined and optimised as issues are reported. Indeed, I suspect that all your problems would disappear on mainline due to the background removal of preallocation that is no longer needed, and CentOS doesn't have that...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
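The "usually when the current preallocated space is used up" answer, combined with the doubling behaviour, explains the original "1GB segments occupy 2GB" observation. A hedged model in C (purely illustrative - this is a back-of-the-envelope restatement of the thread's description, not the kernel implementation, and the 8GB cap is the assumed base from the table above):

```c
#include <stdint.h>

/* Illustrative model: when an appending write runs past the existing
 * preallocation, roughly the current file size is preallocated again
 * beyond EOF, capped at an assumed 8GB maximum. */
static uint64_t prealloc_past_eof(uint64_t isize)
{
    const uint64_t cap = 8ULL << 30;  /* assumed maximum prealloc */
    return isize < cap ? isize : cap;
}

/* Space a file of size isize may occupy on disk under this model:
 * its data plus the speculative preallocation beyond EOF. */
static uint64_t allocated_bytes(uint64_t isize)
{
    return isize + prealloc_past_eof(isize);
}
```

Under this model, a 1GB Kafka segment carries roughly another 1GB of preallocation past EOF - about 2GB on disk - until that preallocation is reclaimed, matching the poster's observation.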
Subject: Re: understanding speculative preallocation
From: Jason Rosenberg @ 2013-07-26 17:40 UTC
To: Dave Chinner; Cc: xfs

Hi Dave,

Thanks for your responses. I'm a bit confused, as I didn't see your responses on the actual forum (only in my email inbox).

Anyway, I'm surprised that you don't have some list or other way to correlate the version history of XFS with OS release versions. I'm guessing the version I have is not the latest/greatest. We actually have another system that uses an older version of the kernel (2.6.32-279), and it behaves differently (it still preallocates space beyond what will ever be used, but not by quite as much). When we rolled our newer machines out to 2.6.32-358, we started seeing a marked increase in disk-full problems.

If, say, you tell me the mainline XFS code has improved behavior, it would be nice to have a way to know which version of CentOS might include that. Telling me to read source code across multiple kernel versions sounds like an interesting endeavor, but not the most efficient use of my time, unless there truly is no one who can easily tell me anything about XFS version history.

Do you have any plans for some sort of improved documentation story around this? This speculative preallocation behavior is truly unexpected and not transparent to the user. I can see that it's probably a great performance boost (especially for something like Kafka), but Kafka does have predictable log file rotation capped at fixed sizes, so it would be great if that could be factored in.

I suppose using the allocsize setting might work in the short term. But I probably don't want to set allocsize to 1GB, since that would mean every single file created would start with that size, is that right? Does the allocsize setting basically work by always keeping the file size ahead of the consumed space by the allocsize amount?

Thanks,

Jason

On Fri, Jul 26, 2013 at 7:50 AM, Dave Chinner <david@fromorbit.com> wrote:
> [...]
Subject: Re: understanding speculative preallocation
From: Stan Hoeppner @ 2013-07-26 19:27 UTC
To: Jason Rosenberg; Cc: xfs

On 7/26/2013 12:40 PM, Jason Rosenberg wrote:
> Anyway, I'm surprised that you don't have some list or other way to
> correlate version history of XFS, with os release versions. I'm
> guessing the version I have is not using the latest/greatest. We
> actually have another system that uses an older version of the kernel
> (2.6.32-279), and

2.6.32-279 - this is not a mainline kernel version. This is a Red Hat specific string describing their internal kernel release. It has zero correlation to any version number of anything else in the world of mainline Linux.

> If, say you tell me, the mainline xfs code has improved behavior, it
> would be nice to have a way to know which version of CentOS might
> include that?

IMNSHO, CentOS is a free proprietary chrome-plated dog turd. It's flashy on the outside and claims it's "ENTERPRISE", "just like RHEL!". Then you crack it open and find nothing but crap inside. So you take it back to the store that gave it away for free, and the doors are barred, the place out of business. The chrome has peeled off and you're stuck with crap that's difficult to use. Every time you touch it you get dirty in some fashion.

RHEL is a proprietary solid chrome turd you pay for. You can't get to the inside, but if you find a scratch and 'return' it, Red Hat will say "we can help you fix that".

If you avoid the flashy turds altogether while still plunking down no cash, and use a distro based entirely on mainline Linux and GNU user-space source, you can get help directly from the folks who wrote the code you're running, because they know what is where. Whether it be Linux proper, the XFS subsystem, NFS, Samba, Postfix, etc. Such distributions are too numerous to mention. None of them are chrome plated; none claim to be "just like ENTERPRISE distro X". I tell all users of RHEL knock-offs every time I see a situation like this:

Either pay for and receive the support that's required for the proprietary distribution you're running, or use a completely open source distro based on the mainline kernel source and GNU user space. By using a RHEL knock-off, you're simply locking yourself into an outdated proprietary code base for which there is no viable support option, because so few people in the community understand the packaging of the constituent parts of the RHEL kernels. This is entirely intentional on the part of Red Hat, specifically to make the life of CentOS users painful, and rightfully so.

FYI, many of the folks on the XFS list are Red Hat employees, including Dave. They'll gladly assist RHEL customers here if needed. However, to support CentOS users, especially in your situation, they'd have to use Red Hat Inc resources to hunt down the information relating the CentOS kernel you have to the RHEL kernel it is copied from. So they've just consumed Red Hat Inc resources to directly assist a free competitor who copied their distro.

Thus there's not much incentive to assist CentOS users, as they'd in essence be taking money out of their employer's pocket. Taken to the extreme, this results in pay cuts, possibly furloughs or pink slips, etc.

Surely this can't be the first time you've run into a free community support issue with the CentOS kernel. Surely what I've written isn't news to you. Pay Red Hat for RHEL, or switch to Debian, Ubuntu, openSUSE, etc. Either way you'll be able to get much better support.

-- 
Stan
Subject: Re: understanding speculative preallocation
From: Jason Rosenberg @ 2013-07-26 20:38 UTC
To: stan; Cc: xfs

Hi Stan,

Thanks for the info (most of it was, in fact, news to me). I'm an application developer trying to debug a disk space problem, that's all. So far, I've tracked it down to being an XFS issue.

So you are saying there's no public information that can correlate XFS versioning to CentOS (or RHEL) versioning? Sad state of affairs. If anyone can volunteer this info (if available to you), I'd be very appreciative.

Regardless, is there a version history for XFS vis-à-vis mainline Linux?

Thanks,

Jason

On Fri, Jul 26, 2013 at 3:27 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> [...]
Subject: Re: understanding speculative preallocation
From: Ben Myers @ 2013-07-26 20:50 UTC
To: Jason Rosenberg; Cc: stan, xfs

Hi Jason,

On Fri, Jul 26, 2013 at 04:38:21PM -0400, Jason Rosenberg wrote:
> Thanks for the info (most of it was, in fact, news to me). I'm an
> application developer trying to debug a disk space problem, that's
> all. So far, I've tracked it down to being an XFS issue.

The speculative block reservations have been an issue for a while. You are not the first person to take issue with them.

> Regardless, is there a version history for XFS vis-à-vis mainline
> Linux?

You can find a full version history for XFS back to 2.6.12 or so here:
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs.git;a=summary

If you're interested in going older than that, look here:
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=summary

The function you'll most likely want to track is xfs_iomap_write_delay, which calls xfs_iomap_eof_want_preallocate; both are in fs/xfs/xfs_iomap.c.

Recently Brian Foster added a scanner to remove the speculative block reservations on a timer, which may give you some relief. See xfs_queue_eofblocks in fs/xfs/xfs_icache.c.

Regards,

Ben
Subject: Re: understanding speculative preallocation
From: Jason Rosenberg @ 2013-07-26 21:04 UTC
To: Ben Myers; Cc: stan, xfs

Thanks Ben,

This is helpful.

On Fri, Jul 26, 2013 at 4:50 PM, Ben Myers <bpm@sgi.com> wrote:
> [...]
* Re: understanding speculative preallocation 2013-07-26 21:04 ` Jason Rosenberg @ 2013-07-26 21:11 ` Jason Rosenberg 2013-07-26 21:42 ` Ben Myers 2013-07-27 1:30 ` Dave Chinner 0 siblings, 2 replies; 16+ messages in thread From: Jason Rosenberg @ 2013-07-26 21:11 UTC (permalink / raw) To: Ben Myers; +Cc: stan, xfs [-- Attachment #1.1: Type: text/plain, Size: 1972 bytes --] Is it safe to say that speculative preallocation will not be used if a file is opened read-only? It turns out that the kafka server does indeed write lots of log files, and rotate them after they reach a max size, but never closes the files until the app exits, or until it deletes the files. This is because it needs to make them available for reading, etc. So, an obvious change for kafka might be to close each log file after rotating, and then re-open it read-only for consumers of the data. Does that sound like a solution that would pro-actively release pre-allocated storage? Thanks, Jason On Fri, Jul 26, 2013 at 5:04 PM, Jason Rosenberg <jbr@squareup.com> wrote: > Thanks Ben, > > This is helpful. > > > On Fri, Jul 26, 2013 at 4:50 PM, Ben Myers <bpm@sgi.com> wrote: > >> Hi Jason, >> >> On Fri, Jul 26, 2013 at 04:38:21PM -0400, Jason Rosenberg wrote: >> > Thanks for the info (most of it was, in fact, news to me). I'm an >> > application developer trying to debug a disk space problem, that's all. >> So >> > far, I've tracked it down to being an XFS issue. >> >> The speculative block reservations have been an issue for awhile. You >> are not >> the first person to take issue with it. >> >> > Regardless, is there a version history for XFS vis-a-via mainline Linux? 
>> >> You can find a full version history for XFS back to 2.6.12 or so here: >> http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs.git;a=summary >> >> If you're interested in going older than that look here: >> http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=summary >> >> The function you'll most likely want to track is xfs_iomap_write_delay, >> which >> calls xfs_iomap_eof_want_preallocate, both of which are in >> fs/xfs/xfs_iomap.c. >> >> Recently Brian Foster added a scanner to remove the speculative block >> reservations on a timer which may give you some relief. See >> xfs_queue_eofblocks in fs/xfs/xfs_icache.c >> >> Regards, >> Ben >> >>
* Re: understanding speculative preallocation 2013-07-26 21:11 ` Jason Rosenberg @ 2013-07-26 21:42 ` Ben Myers 2013-07-27 1:30 ` Dave Chinner 1 sibling, 0 replies; 16+ messages in thread From: Ben Myers @ 2013-07-26 21:42 UTC (permalink / raw) To: Jason Rosenberg; +Cc: stan, xfs Hi Jason, On Fri, Jul 26, 2013 at 05:11:55PM -0400, Jason Rosenberg wrote: > Is it safe to say that speculative preallocation will not be used if a file > is opened read-only? The blocks will only be reserved on an appending write. > It turns out that the kafka server does indeed write lots of log files, and > rotate them after they reach a max size, but never closes the files until > the app exits, or until it deletes the files. This is because it needs to > make them available for reading, etc. So, an obvious change for kafka > might be to close each log file after rotating, and then re-open it > read-only for consumers of the data. Does that sound like a solution that > would proactively release pre-allocated storage? An interesting idea, and I'm not quite sure. The blocks past EOF are freed in xfs_release on close in some circumstances, and it looks like you have a chance to call xfs_free_eofblocks (at least in the most up-to-date codebase) if you did not use explicit preallocation (e.g. fallocate or an xfs ioctl) and did not open it append-only. You could reopen with read-write flags and it wouldn't make a difference vs read-only, so long as you don't do an appending write. Seems like it's worth a try. Another possibility is to look into what would happen if you do a truncate up to i_size when you're ready to stop appending to the file. I haven't checked that out though. Regards, Ben
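A cheap way to test ideas like this is to watch the gap between a file's logical size and its allocated size (st_blocks is counted in 512-byte units regardless of filesystem block size). The sketch below is filesystem-agnostic and assumes nothing about XFS internals; on XFS, a large positive overhead after close would mean blocks past EOF survived xfs_release.

```python
import os
import tempfile

def on_disk_bytes(path):
    """Bytes actually allocated to the file (st_blocks is in 512-byte units)."""
    return os.stat(path).st_blocks * 512

def logical_bytes(path):
    """Logical file size (EOF position)."""
    return os.stat(path).st_size

# Do an appending write, then close the file -- the close is where
# XFS gets a chance to trim speculative preallocation past EOF.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "ab") as f:
    f.write(b"x" * (4 * 1024 * 1024))  # 4MB appending write

overhead = on_disk_bytes(path) - logical_bytes(path)
print("allocated - logical =", overhead, "bytes")
os.remove(path)
```

Pointed at the Kafka log directory (stat the existing segment files instead of creating one), the same two helpers make it easy to see whether a close/reopen cycle actually releases the space.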
* Re: understanding speculative preallocation 2013-07-26 21:11 ` Jason Rosenberg 2013-07-26 21:42 ` Ben Myers @ 2013-07-27 1:30 ` Dave Chinner 2013-07-28 2:19 ` Jason Rosenberg 1 sibling, 1 reply; 16+ messages in thread From: Dave Chinner @ 2013-07-27 1:30 UTC (permalink / raw) To: Jason Rosenberg; +Cc: Ben Myers, stan, xfs On Fri, Jul 26, 2013 at 05:11:55PM -0400, Jason Rosenberg wrote: > Is it safe to say that speculative preallocation will not be used if a file > is opened read-only? > > It turns out that the kafka server does indeed write lots of log files, and > rotate them after they reach a max size, but never closes the files until > the app exits, or until it deletes the files. This is because it needs to > make them available for reading, etc. So, an obvious change for kafka > might be to close each log file after rotating, and then re-open it > read-only for consumers of the data. Does that sound like a solution that > would proactively release pre-allocated storage? No need - the mainline code has a periodic background scan that stops the buildup of unused preallocation. i.e. if the file is clean for 5 minutes, then the prealloc will be removed. Hence it doesn't matter what the application does with it - if it holds it open and doesn't write to the file, then the prealloc will get removed. More will be added the next time the file is written, but until then it won't use excessive space. Cheers, Dave. -- Dave Chinner david@fromorbit.com
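The policy Dave describes — trim blocks past EOF once a file has been clean for five minutes — is simple enough to state as a model. The sketch below is a toy illustration of that policy only; the real implementation is the EOF-blocks scanner in fs/xfs/xfs_icache.c, which operates on in-core inodes, not Python objects.

```python
import time

CLEAN_INTERVAL = 5 * 60  # seconds a file must stay clean before trimming

class TrackedFile:
    """Toy stand-in for an in-core inode with blocks reserved past EOF."""
    def __init__(self, size, allocated):
        self.size = size            # logical EOF in bytes
        self.allocated = allocated  # bytes reserved on disk (>= size)
        self.last_write = time.monotonic()

    def append(self, nbytes):
        self.size += nbytes
        self.allocated = max(self.allocated, self.size)
        self.last_write = time.monotonic()  # file is dirty again

def eofblocks_scan(files, now=None):
    """Periodic scan: reclaim prealloc on files clean for CLEAN_INTERVAL."""
    now = time.monotonic() if now is None else now
    for f in files:
        if f.allocated > f.size and now - f.last_write >= CLEAN_INTERVAL:
            f.allocated = f.size  # drop the speculative reservation
```

As Dave notes, writing to the file again simply adds fresh preallocation and restarts the clock.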
* Re: understanding speculative preallocation 2013-07-27 1:30 ` Dave Chinner @ 2013-07-28 2:19 ` Jason Rosenberg 2013-07-29 0:04 ` Dave Chinner 0 siblings, 1 reply; 16+ messages in thread From: Jason Rosenberg @ 2013-07-28 2:19 UTC (permalink / raw) To: Dave Chinner; +Cc: Ben Myers, stan, xfs Thanks Dave, The automatic prealloc removal, if no new writes after 5 minutes, sounds perfect for my use case. But realistically, I'm not likely to get our org to push/find an OS update just for this purpose too easily. So, in the meantime, the question remains, assuming I have the version I have currently (dynamic preallocation, persists indefinitely until the file is closed/app quits, etc.), will this idea work (e.g. close the file after writing, then re-open read-only?). Currently, the app does keep the files open indefinitely long after writing has stopped, and this is of course resulting in the preallocation persisting indefinitely. Jason On Fri, Jul 26, 2013 at 9:30 PM, Dave Chinner <david@fromorbit.com> wrote: > On Fri, Jul 26, 2013 at 05:11:55PM -0400, Jason Rosenberg wrote: > > Is it safe to say that speculative preallocation will not be used if a > file > > is opened read-only? > > > > It turns out that the kafka server does indeed write lots of log files, > and > > rotate them after they reach a max size, but never closes the files until > > the app exits, or until it deletes the files. This is because it needs > to > > make them available for reading, etc. So, an obvious change for kafka > > might be to close each log file after rotating, and then re-open it > > read-only for consumers of the data. Does that sound like a solution > that > > would proactively release pre-allocated storage? > > No need - the mainline code has a periodic background scan that > stops the buildup of unused preallocation. i.e. if the file is clean for > 5 minutes, then the prealloc will be removed.
Hence it doesn't > matter what the application does with it - if it holds it open and > doesn't write to the file, then the prealloc will get removed. More > will be added the next time the file is written, but until then it > won't use excessive space. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com >
* Re: understanding speculative preallocation 2013-07-28 2:19 ` Jason Rosenberg @ 2013-07-29 0:04 ` Dave Chinner 0 siblings, 0 replies; 16+ messages in thread From: Dave Chinner @ 2013-07-29 0:04 UTC (permalink / raw) To: Jason Rosenberg; +Cc: Ben Myers, stan, xfs On Sat, Jul 27, 2013 at 10:19:17PM -0400, Jason Rosenberg wrote: > Thanks Dave, > > The automatic prealloc removal, if no new writes after 5 minutes, sounds > perfect for my use case. But realistically, I'm not likely to get our org > to push/find an OS update just for this purpose too easily. > > So, in the meantime, the question remains, assuming I have the version I > have currently (dynamic preallocation, persists indefinitely until the file > is closed/app quits, etc.), will this idea work (e.g. close the file after > writing, then re-open read-only?). Maybe, maybe not. It depends on whether the "don't remove prealloc" heuristic was triggered at any time.... Cheers, Dave. -- Dave Chinner david@fromorbit.com
* Re: understanding speculative preallocation 2013-07-26 20:38 ` Jason Rosenberg 2013-07-26 20:50 ` Ben Myers @ 2013-07-26 21:45 ` Eric Sandeen 1 sibling, 0 replies; 16+ messages in thread From: Eric Sandeen @ 2013-07-26 21:45 UTC (permalink / raw) To: Jason Rosenberg; +Cc: stan, xfs On 7/26/13 3:38 PM, Jason Rosenberg wrote: > Hi Stan, > > Thanks for the info (most of it was, in fact, news to me). I'm an > application developer trying to debug a disk space problem, that's > all. So far, I've tracked it down to being an XFS issue. > > So you are saying there's no public information that can correlate > XFS versioning to CentOS (or RHEL) versioning? > > Sad state of affairs. > > If anyone can volunteer this info (if available to you) I'd be much > appreciative. > > Regardless, is there a version history for XFS vis-à-vis mainline > Linux? There is no exact version history per se, i.e. no "XFS version 2.51". Instead, the best point of reference upstream is the kernel release number, i.e. kernel 2.6.32, kernel 3.2, etc. Ben pointed you at changelogs for that, which you can peruse... Once you get into RHEL, you're into a land of backports - originally 2.6.32, but various & sundry updates & backports, to the point where it is a bit of a special snowflake, based on the requirements of RHEL customers and the RHEL maintainers (who, incidentally, are also major upstream XFS developers). Your best bet for a distro kernel is to look at e.g. the kernel RPM changelog to see what's been going on. But you won't be able to correlate that exactly with any one upstream version, unless maybe you see a "rebase $SUBSYSTEM to $KERNEL_VERSION"-type changelog entry. Back to your original problem, you may find that just setting a fixed allocsize as a mount option has more pros than cons for your use case.
HTH, -Eric > Thanks, > > Jason > > > On Fri, Jul 26, 2013 at 3:27 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote: > > On 7/26/2013 12:40 PM, Jason Rosenberg wrote: > > > Anyway, I'm surprised that you don't have some list or other way to > > correlate version history of XFS, with os release versions. I'm guessing > > the version I have is not using the latest/greatest. We actually have > > another system that uses an older version of the kernel (2.6.32-279), and > > 2.6.32-279 - this is not a mainline kernel version. This is a Red Hat > specific string describing their internal kernel release. It has zero > correlation to any version number of anything else in the world of > mainline Linux. > > > If, say you tell me, the mainline xfs code has improved behavior, it would > > be nice to have a way to know which version of CentOS might include that? > > IMNSHO, CentOS is a free proprietary chrome plated dog turd. It's > flashy on the outside and claims it's "ENTERPRISE", "just like RHEL!". > Then you crack it open and find nothing but crap inside. So you take it > back to the store that gave it away for free and the doors are barred, > the place out of business. The chrome has peeled off and you're stuck > with crap that's difficult to use. Every time you touch it you get dirty > in some fashion. > > RHEL is a proprietary solid chrome turd you pay for. You can't get to > the inside, but if you find a scratch and 'return' it Red Hat will say > "we can help you fix that". > > If you avoid the flashy turds altogether while still plunking down no > cash, and use a distro based entirely on mainline Linux and GNU user > space source, you can get help directly from the folks who wrote the > code you're running because they know what is where. Whether it be > Linux proper, the XFS subsystem, NFS, Samba, Postfix, etc. Such > distributions are too numerous to mention.
None of them are chrome > plated, none claim to be "just like ENTERPRISE distro X". I tell all > users of RHEL knock-offs every time I see a situation like this: > > Either pay for and receive the support that's required for the > proprietary distribution you're running, or use a completely open source > distro based on mainline kernel source and GNU user space. By using a > RHEL knock-off, you're simply locking yourself into an outdated > proprietary code base for which there is no viable support option, > because so few people in the community understand the packaging of the > constituent parts of the RHEL kernels. This is entirely intentional on > the part of Red Hat, specifically to make the life of CentOS users > painful, and rightfully so. > > FYI, many of the folks on the XFS list are Red Hat employees, including > Dave. They'll gladly assist RHEL customers here if needed. However, to > support CentOS users, especially in your situation, they'd have to use > Red Hat Inc resources to hunt down the information related to the CentOS > kernel you have that correlates to the RHEL kernel it is copied from. > So they've just consumed Red Hat Inc resources to directly assist a free > competitor who copied their distro. > > Thus there's not much incentive to assist CentOS users as they'd in > essence be taking money out of their employer's pocket. Taken to the > extreme this results in pay cuts, possibly furloughs or pink slips, etc. > > Surely this can't be the first time you've run into a free community > support issue with the CentOS kernel. Surely what I've written isn't > news to you. Pay Red Hat for RHEL, or switch to Debian, Ubuntu, > openSUSE, etc. Either way you'll be able to get much better support.
> > -- > Stan
* Re: understanding speculative preallocation 2013-07-26 19:27 ` Stan Hoeppner 2013-07-26 20:38 ` Jason Rosenberg @ 2013-07-27 4:26 ` Keith Keller 1 sibling, 0 replies; 16+ messages in thread From: Keith Keller @ 2013-07-27 4:26 UTC (permalink / raw) To: linux-xfs On 2013-07-26, Stan Hoeppner <stan@hardwarefreak.com> wrote: > Either pay for and receive the support that's required for the > proprietary distribution you're running, or use a completely open source > distro based on mainline kernel source and GNU user space. Couldn't you simply run a mainline kernel on CentOS? ELRepo supports both the mainline kernel and the longterm kernel. http://elrepo.org/tiki/kernel-ml http://elrepo.org/tiki/kernel-lt It's not perfect, of course, but at least it's closer to what the XFS developers (RH employees and others) are working on. --keith -- kkeller@wombat.san-francisco.ca.us
* Re: understanding speculative preallocation 2013-07-26 17:40 ` Jason Rosenberg 2013-07-26 19:27 ` Stan Hoeppner @ 2013-07-27 1:26 ` Dave Chinner 1 sibling, 0 replies; 16+ messages in thread From: Dave Chinner @ 2013-07-27 1:26 UTC (permalink / raw) To: Jason Rosenberg; +Cc: xfs On Fri, Jul 26, 2013 at 01:40:16PM -0400, Jason Rosenberg wrote: > Hi Dave, > > Thanks for your responses. I'm a bit confused, as I didn't see your > responses on the actual forum (only in my email inbox). I'm replying from the list ;) [ If you have a dup filter like I do, then I only get one copy of anything that is sent to me when there are multiple cc's. My procmail rules determine where it gets stored.... ] > Anyway, I'm surprised that you don't have some list or other way to > correlate version history of XFS, with os release versions. Of course we do. > I'm guessing > the version I have is not using the latest/greatest. We actually have > another system that uses an older version of the kernel (2.6.32-279), and > it behaves differently (it still preallocates space beyond what will ever > be used, but not by quite as much). When we rolled out our newer machines > to 2.6.32-358, we started seeing a marked increase in disk full problems. Disclaimer: I'm the primary RHEL XFS developer, employed by Red Hat. CentOS is a rebadged RHEL product that is released for free. If you want bugs fixed in CentOS, then generally you are on your own. If you want paid support where people will fix problems you have, you need to pay for RHEL. You get what you pay for. > If, say you tell me, the mainline xfs code has improved behavior, it would > be nice to have a way to know which version of CentOS might include that? Well, that depends on what Red Hat do with RHEL, because CentOS simply rebuild what Red Hat releases. 
> Telling me to read source code across multiple kernel versions sounds like > an interesting endeavor, but not something that is the most efficient use > of my time, unless there truly is no one who can easily tell me anything > about xfs version history. So, you want me to read it for you instead, then document it for you? i.e. spend a day not fixing bugs and developing new code to document the history of something that you can find out by looking in git and rpms yourself? Unless you are offering to pay someone to do it, I don't see it happening... Open source != free lunch. > Do you have any plans to have some sort of improved documentation story > around this? No. > This speculative preallocation behavior is truly unexpected > and not transparent to the user. Which is why we've been fixing it. The problems you are reporting are already fixed in the mainline kernel. > I can see that it's probably a great > performance boost (especially for something like kafka), but kafka does > have predictable log file rotation capped at fixed sizes, so it would be > great if that could be factored in. Mainline already does handle this, in a much more generic manner. If any file with speculative prealloc beyond EOF remains clean for 5 minutes, then the speculative prealloc is removed. Hence 5 minutes after your log file is rotated, it will have the excess space removed. > I suppose using the allocsize setting might work in the short term. But I > probably don't want to set allocsize to 1Gb, since that would mean every > single file created would start with that size, is that right? Does the > allocsize setting basically work by always keeping the file size ahead of > consumed space by the allocsize amount? Effectively. Cheers, Dave. -- Dave Chinner david@fromorbit.com
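Dave's "Effectively" above can be made concrete. With a fixed allocsize mount option, delayed allocation extends the file in fixed-size chunks, so the space held past EOF is roughly the write position rounded up to the next allocsize boundary. This is a hedged approximation — the 64m value is only an example, and the real allocator applies additional alignment rules:

```python
ALLOCSIZE = 64 * 1024**2  # e.g. mount -o allocsize=64m (example value only)

def allocated_after_append(eof):
    """Approximate on-disk allocation with a fixed allocsize:
    the EOF position rounded up to the next allocsize boundary.
    A model of the behaviour described in the thread, not kernel code."""
    return ((eof + ALLOCSIZE - 1) // ALLOCSIZE) * ALLOCSIZE
```

For the Kafka case this bounds the transient overhead per segment at under one allocsize chunk (here 64MB), instead of the roughly 1GB the doubling heuristic can leave past EOF on a 1GB segment — at the cost of every growing file, large or small, being extended in chunks of that size.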
end of thread, other threads:[~2013-07-29 0:04 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page --)

2013-07-26 7:35 understanding speculative preallocation jbr
-- strict thread matches above, loose matches on Subject: below --
2013-07-26 7:23 jbr
2013-07-26 11:50 ` Dave Chinner
2013-07-26 17:40 ` Jason Rosenberg
2013-07-26 19:27 ` Stan Hoeppner
2013-07-26 20:38 ` Jason Rosenberg
2013-07-26 20:50 ` Ben Myers
2013-07-26 21:04 ` Jason Rosenberg
2013-07-26 21:11 ` Jason Rosenberg
2013-07-26 21:42 ` Ben Myers
2013-07-27 1:30 ` Dave Chinner
2013-07-28 2:19 ` Jason Rosenberg
2013-07-29 0:04 ` Dave Chinner
2013-07-26 21:45 ` Eric Sandeen
2013-07-27 4:26 ` Keith Keller
2013-07-27 1:26 ` Dave Chinner