* [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
@ 2013-07-22 17:28 Jiri Popelka
2013-07-22 17:46 ` Till Kamppeter
[not found] ` <51ED6CC4.1040501@redhat.com>
0 siblings, 2 replies; 18+ messages in thread
From: Jiri Popelka @ 2013-07-22 17:28 UTC (permalink / raw)
To: Michael Sweet; +Cc: printing-architecture, sandeen, sbergman27
Hello,
When cupsd lives on a filesystem with delayed allocation, like ext4, and
it experiences an unclean shutdown under heavy load, its printers.conf
very often ends up being truncated to zero.
Even though the original report (https://bugzilla.redhat.com/show_bug.cgi?id=984883)
was against cups-1.4.2, I've seen no reason to think this has been
fixed in recent versions.
I see cupsd since 1.5 (due to STR #3715) has been more carefully
creating and removing conf files,
but that doesn't seem to be sufficient.
In particular, updating printers.conf probably needs some sort of
synchronization of data to disk.
I have a patch (in comment #11) which makes cupsd read the backup
filename.O file if filename is truncated to zero, but that's only a
work-around,
and I'm afraid it won't work if the file gets updated a couple of times
between the last sync and the unexpected shutdown.
I've promised interested parties to ask you publicly
and since bug tracker @ cups.org is still down ... here I am ;-)
With regards,
Jiri
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
2013-07-22 17:28 [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns Jiri Popelka
@ 2013-07-22 17:46 ` Till Kamppeter
[not found] ` <51EDADB4.5000807@redhat.com>
[not found] ` <51ED6CC4.1040501@redhat.com>
1 sibling, 1 reply; 18+ messages in thread
From: Till Kamppeter @ 2013-07-22 17:46 UTC (permalink / raw)
To: Jiri Popelka; +Cc: printing-architecture, sandeen, sbergman27

On 07/22/2013 07:28 PM, Jiri Popelka wrote:
> Hello,
>
> When cupsd lives on a filesystem with delayed allocation, like ext4, and
> it experiences an unclean shutdown under heavy load, its printers.conf
> very often ends up being truncated to zero.
>

I have seen some Ubuntu bug reports showing this, too, but never had an idea how this can happen. Jiri, thank you for finding this out.

Principally, this should be a bug of ext4, as the idea of a journaling file system is to avoid data loss on an unexpected shutdown. Fixes in CUPS are more workarounds for the shortcomings/bugs of ext4.

> Even though the original report (https://bugzilla.redhat.com/show_bug.cgi?id=984883)
> was against cups-1.4.2, I've seen no reason to think this has been
> fixed in recent versions.
> I see cupsd since 1.5 (due to STR #3715) has been more carefully
> creating and removing conf files,
> but that doesn't seem to be sufficient.
> In particular, updating printers.conf probably needs some sort of
> synchronization of data to disk.
>
> I have a patch (in comment #11) which makes cupsd read the backup
> filename.O file if filename is truncated to zero, but that's only a
> work-around,
> and I'm afraid it won't work if the file gets updated a couple of times
> between the last sync and the unexpected shutdown.
>

As CUPS is modifying printers.conf rather frequently, it is perhaps worth always having two copies of the most recent printers.conf and letting CUPS modify them one after the other, so that we never lose the full file, only at most the last modification.

Till

^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <51EDADB4.5000807@redhat.com>]
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
[not found] ` <51EDADB4.5000807@redhat.com>
@ 2013-07-23 3:03 ` Michael Sweet
0 siblings, 0 replies; 18+ messages in thread
From: Michael Sweet @ 2013-07-23 3:03 UTC (permalink / raw)
To: Eric Sandeen; +Cc: printing-architecture, sbergman27, Till Kamppeter

Eric,

On 2013-07-22, at 6:09 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> ...
> It might be nice to have queue state (which is pretty disposable?) in a
> file separate from queue definitions (which are not disposable, and
> rarely modified?). But then I don't know much about cups internals.

One of the things I'm looking at for future versions of cupsd is separation of configuration from state information.

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair

^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <51ED6CC4.1040501@redhat.com>]
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
[not found] ` <51ED6CC4.1040501@redhat.com>
@ 2013-07-23 3:00 ` Michael Sweet
2013-07-23 9:08 ` Tim Waugh
[not found] ` <51EE1183.5080704@redhat.com>
0 siblings, 2 replies; 18+ messages in thread
From: Michael Sweet @ 2013-07-23 3:00 UTC (permalink / raw)
To: Eric Sandeen; +Cc: printing-architecture, sbergman27

Eric,

On 2013-07-22, at 1:32 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> ..
> Any time an application does buffered IO, and requires it to be there even if we crash, it needs to take steps to persist that data on disk. See also http://lwn.net/Articles/457667/

The problem with fsync() is that it is a blocking API. Blocking cupsd (single-threaded daemon process) is *not* a good idea.

Future versions of cupsd will be multi-threaded where we can look at calling fsync or fcntl(fd, F_FULLFSYNC), but adding fsync calls now will just cause more problems than it will solve.

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
2013-07-23 3:00 ` Michael Sweet
@ 2013-07-23 9:08 ` Tim Waugh
2013-07-23 12:08 ` Michael Sweet
[not found] ` <51EE1183.5080704@redhat.com>
1 sibling, 1 reply; 18+ messages in thread
From: Tim Waugh @ 2013-07-23 9:08 UTC (permalink / raw)
To: Michael Sweet; +Cc: printing-architecture, Eric Sandeen, sbergman27

[-- Attachment #1: Type: text/plain, Size: 445 bytes --]

On Mon, 2013-07-22 at 23:00 -0400, Michael Sweet wrote:
> The problem with fsync() is that it is a blocking API. Blocking cupsd
> (single-threaded daemon process) is *not* a good idea.

Aren't changes to printers.conf deferred anyway, to allow changes to be
batched up? So it wouldn't be an fsync call every time the values
change, but every time those values are written back -- which is much
less often on a busy system.

Tim.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 482 bytes --]

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
2013-07-23 9:08 ` Tim Waugh
@ 2013-07-23 12:08 ` Michael Sweet
[not found] ` <51EE90D7.9070607@gmail.com>
0 siblings, 1 reply; 18+ messages in thread
From: Michael Sweet @ 2013-07-23 12:08 UTC (permalink / raw)
To: Tim Waugh; +Cc: printing-architecture, Eric Sandeen, sbergman27

Tim,

On 2013-07-23, at 5:08 AM, Tim Waugh <twaugh@redhat.com> wrote:
> On Mon, 2013-07-22 at 23:00 -0400, Michael Sweet wrote:
>> The problem with fsync() is that it is a blocking API. Blocking cupsd
>> (single-threaded daemon process) is *not* a good idea.
>
> Aren't changes to printers.conf deferred anyway, to allow changes to be
> batched up? So it wouldn't be an fsync call every time the values
> change, but every time those values are written back -- which is much
> less often on a busy system.

They are batched up, but blocking every 30 seconds is still going to cause support calls about cupsd being unresponsive, and (I hope) more calls than "I lost my printer configuration because my system crashed" (which I really hope is not common; who wants to depend on flakey hardware?)

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair

^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <51EE90D7.9070607@gmail.com>]
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
[not found] ` <51EE90D7.9070607@gmail.com>
@ 2013-07-23 16:40 ` Michael Sweet
[not found] ` <51EEBF83.2090006@gmail.com>
0 siblings, 1 reply; 18+ messages in thread
From: Michael Sweet @ 2013-07-23 16:40 UTC (permalink / raw)
To: Steve Bergman; +Cc: Eric Sandeen, printing-architecture

Steve,

On 2013-07-23, at 10:19 AM, Steve Bergman <sbergman27@gmail.com> wrote:
> ...
> There is a whole universe of use-cases out there which lies outside of "pristine datacenter" that I'm ever-surprised to see maintainers of various packages acting as though they were oblivious to. (No offense intended; CUPS is far from alone in this particular folly.)
> [END SOAPBOX]

With due respect, having worked on everything from bare bones 8-bit controllers through high-end multi-CPU/core servers/workstations, I am well aware of such "real world" situations and the (often unrealistic) expectations involved.

> Obviously, we need to do some real life bench-marking, looking at the effect of proper fsyncs of printers.conf on performance.

Such benchmarks have already (but unintentionally) been done with similar-sized files on live OS X systems in the field. The result of calling fsync caused delays of several seconds in all applications for several million affected users.

> I have a suitable real life retail server that handles receipt printing and display poles for 9 to 15 retail stores, depending upon the time of year. This time of year, I see about 2000 print jobs per hour during business hours. I'll patch the CUPS in RHEL6.4 based upon Eric's original suggestion in the bug report and see how it performs. (If anyone would prefer that I use another patch, I'd be happy to do that, too.)

Good luck. From our own experience, calling fsync on a file on a network file system resulted in sporadic, long-duration delays (up to 60 seconds in our testing) while local spinning disks tended to be in the 1-2 second range. SSDs actually had slightly longer delays depending on how much data was buffered up...

...

From the perspective of what is coming for future versions of CUPS, we *are* moving to a multi-threaded implementation to eliminate any blocking issues that prevent users from spooling jobs or getting capabilities or status. We will also be separating state from configuration so that the configuration files are updated less often than the status files, and the loss of the status files can be easily recovered from.

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair

^ permalink raw reply [flat|nested] 18+ messages in thread
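The fsync latency figures traded in the message above depend heavily on the storage underneath, so they are easiest to judge when measured on the machine in question. A minimal sketch of such a measurement, assuming a POSIX system: `time_fsync` is a hypothetical helper written for this illustration, not part of CUPS, and it times only the fsync() call itself, not the preceding write.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a CUPS function): write "len" bytes to "path",
 * then return the number of seconds spent inside fsync(), or -1.0 on error.
 */
static double
time_fsync(const char *path, const char *data, size_t len)
{
  struct timespec start, end;
  int             fd, status;

  if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
    return (-1.0);

  if (write(fd, data, len) != (ssize_t)len)
  {
    close(fd);
    return (-1.0);
  }

  /* Time only the blocking fsync() call. */
  clock_gettime(CLOCK_MONOTONIC, &start);
  status = fsync(fd);
  clock_gettime(CLOCK_MONOTONIC, &end);

  close(fd);

  if (status)
    return (-1.0);

  return ((double)(end.tv_sec - start.tv_sec) +
          (double)(end.tv_nsec - start.tv_nsec) / 1e9);
}
```

Calling this in a loop against a file on the same filesystem as printers.conf, while the machine carries its normal load, gives a rough idea of how long a single-threaded daemon would stall on each sync.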
[parent not found: <51EEBF83.2090006@gmail.com>]
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
[not found] ` <51EEBF83.2090006@gmail.com>
@ 2013-07-23 18:16 ` Michael Sweet
[not found] ` <51EECB0D.5050208@redhat.com>
[not found] ` <51EECDF8.9070209@gmail.com>
0 siblings, 2 replies; 18+ messages in thread
From: Michael Sweet @ 2013-07-23 18:16 UTC (permalink / raw)
To: Steve Bergman; +Cc: Eric Sandeen, printing-architecture

Steve,

On 2013-07-23, at 1:38 PM, Steve Bergman <sbergman27@gmail.com> wrote:
> On 07/23/2013 11:40 AM, Michael Sweet wrote:
>> With due respect, having worked on everything from bare bones 8-bit controllers through high-end multi-CPU/core servers/workstations, I am well aware of such "real world" situations and the (often unrealistic) expectations involved.
>
> There is nothing "unrealistic" about expecting not to lose configuration info for 112 printers the majority of times that a server shuts down uncleanly. That's something I was able to take for granted for over a decade. Punting the issue by saying it's an "unrealistic expectation" is not very convincing.

In this case the unrealistic expectation is that if you pull the plug on a running system that you won't lose any data. Systems that offer that level of assurance are designed accordingly and generally include some form of backup power that will allow pending writes to complete.

Even if we added fsync calls in cupsd, that still doesn't guarantee that the data will actually get on disk/flash. For that we need to sync() and do some lower-level stuff to ensure that the hardware is told to write the data immediately, and then wait to hear back that the data has been written.

So, set your expectations accordingly - fsync might not fix this issue for you.

>> Good luck. From our own experience, calling fsync on a file on a network file system resulted in sporadic, long-duration delays (up to 60 seconds in our testing) while local spinning disks tended to be in the 1-2 second range. SSDs actually had slightly longer delays depending on how much data was buffered up...
>
> I'm taking that to mean that you're not really interested in what my results show, and that you consider this to be a RH problem and not a general CUPS problem. So I'll be posting the results to the RH Bugzilla ticket.

Take this to mean that we have operational experience that makes us both leery of enabling fsync and recognize that fsync isn't a magic bullet for the problem you are experiencing.

> FWIW, PostgreSQL makes the fsync a configurable option. Not fsync'ing by default is, in my opinion irresponsible of cupsd.

Your opinion might be different if you were one of the millions affected by the fsync issues we previously had.

> An option should at least be provided to allow responsible administrators to tell it to handle its configuration file updates properly. (Yes, I understand that you probably try to keep the number of options down. But this is a very basic and important issue.)

I don't agree with your wording, but as I mentioned in a separate message I have filed a bug to track adding a SyncOnClose directive in 1.7.0.

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair

^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <51EECB0D.5050208@redhat.com>]
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
[not found] ` <51EECB0D.5050208@redhat.com>
@ 2013-07-23 18:59 ` Michael Sweet
0 siblings, 0 replies; 18+ messages in thread
From: Michael Sweet @ 2013-07-23 18:59 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Steve Bergman, printing-architecture

Eric,

On 2013-07-23, at 2:27 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> ...
> Data persistence does have performance implications, so I can't speak
> directly to your other experiences where it was problematic; there are
> techniques which can be used to minimize the overhead and/or minimize
> the need for data persistence in the first place. But avoiding it
> altogether, with the end result of lost configurations, is IMHO
> undesirable - so although I think application-critical data should
> be persisted in a way that allows graceful recovery from unforeseen
> events, I'm glad that you're at least considering a mechanism to
> enable it.

The thing is, we *used* to fsync and now we don't because of the performance issues. We added the .O/.N dance to help reduce the likelihood of lost config files.

Like I said, I am not convinced that fsync will solve this issue, but will get the new directive in there so at least people can try it out on a case-by-case basis. Longer term we should be able to enable it by default again once cupsd itself won't block on it...

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair

^ permalink raw reply [flat|nested] 18+ messages in thread
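The ".O/.N dance" mentioned above can be pictured roughly as follows: write the new contents to filename.N, optionally sync them, keep the previous version as filename.O, then rename filename.N into place. This is a sketch under assumptions, not the cupsd code: `replace_conf_file` and its `do_sync` flag are hypothetical names, and the real scheduler uses its own file API rather than raw file descriptors.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a CUPS function): replace "path" with new
 * contents, keeping the previous version as "path.O".
 * Returns 0 on success, -1 on error with errno set by the failing call.
 */
static int
replace_conf_file(const char *path, const char *data, size_t len, int do_sync)
{
  char    newfile[1024], oldfile[1024];
  int     fd, saved;
  size_t  off = 0;
  ssize_t n;

  if (snprintf(newfile, sizeof(newfile), "%s.N", path) >= (int)sizeof(newfile) ||
      snprintf(oldfile, sizeof(oldfile), "%s.O", path) >= (int)sizeof(oldfile))
  {
    errno = ENAMETOOLONG;
    return (-1);
  }

  /* 1. Write the complete new contents to path.N. */
  if ((fd = open(newfile, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
    return (-1);

  while (off < len)
  {
    if ((n = write(fd, data + off, len - off)) < 0)
    {
      if (errno == EINTR)
        continue;
      goto fail;
    }
    off += (size_t)n;
  }

  /* 2. Optionally ask the kernel to flush the data before renaming. */
  if (do_sync && fsync(fd))
    goto fail;

  if (close(fd))
  {
    fd = -1;
    goto fail;
  }
  fd = -1;

  /* 3. Keep the previous version as path.O (it may not exist yet). */
  if (rename(path, oldfile) && errno != ENOENT)
    goto fail;

  /* 4. Move the new file into place; on failure path.N is left for inspection. */
  return (rename(newfile, path) ? -1 : 0);

fail:
  saved = errno;
  if (fd >= 0)
    close(fd);
  unlink(newfile);
  errno = saved;
  return (-1);
}
```

Between steps 3 and 4 the plain filename briefly does not exist, which is why a reader needs a fallback to filename.O. With `do_sync` set to 0, nothing in this sketch forces the new data to disk before the rename, which is the situation this thread is about.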
[parent not found: <51EECDF8.9070209@gmail.com>]
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
[not found] ` <51EECDF8.9070209@gmail.com>
@ 2013-07-23 19:01 ` Michael Sweet
2013-07-23 20:51 ` Till Kamppeter
2013-07-26 21:36 ` Michael Sweet
0 siblings, 2 replies; 18+ messages in thread
From: Michael Sweet @ 2013-07-23 19:01 UTC (permalink / raw)
To: Steve Bergman; +Cc: Eric Sandeen, printing-architecture

I can make sure that the changes apply cleanly to 1.6.x and are included in the final 1.6.x release (1.6.4). Beyond that, yes, you'll need to coordinate with your distro of choice or edit and compile yourself.

On 2013-07-23, at 2:39 PM, Steve Bergman <sbergman27@gmail.com> wrote:

> On 07/23/2013 01:16 PM, Michael Sweet wrote:
>> In this case the unrealistic expectation is that if you pull the plug on a running system that you won't lose any data. Systems that offer that level of assurance are designed accordingly and generally include some form of backup power that will allow pending writes to complete.
>
> That flies in the face of the 12 years of experience I have with the same workloads running under ext3. And running under ext4/delalloc or xfs, cups should be able to do even better for both performance and reliability if it does things right.
>
> But that's all just spilled milk over the dam now. Most all current Unix/Linux filesystems now require fsync in order to provide sane integrity guarantees. That's just a fact of life that we've all got to live with. If offering a fix in the next major CUPS release is the best you're willing to do, then it is, at least, better than nothing at all. Thank you, at least, for that much.
>
> This moves the focus to convincing the CUPS package maintainers of existing distros to patch their pre-1.7 versions with the necessary functionality. Which is kind of where I started. So back to the RH Bugzilla thread...
>
> -Steve

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
2013-07-23 19:01 ` Michael Sweet
@ 2013-07-23 20:51 ` Till Kamppeter
2013-07-23 23:04 ` Michael R Sweet
2013-07-24 10:11 ` Jiri Popelka
2013-07-26 21:36 ` Michael Sweet
1 sibling, 2 replies; 18+ messages in thread
From: Till Kamppeter @ 2013-07-23 20:51 UTC (permalink / raw)
To: Michael Sweet; +Cc: printing-architecture, Steve Bergman, Eric Sandeen

As a measure of automatic recovery one could perhaps let the startup
script of CUPS check whether printers.conf is zero length and if so,
copy printers.conf.O to printers.conf. And if printers.conf.N exists and
is of non-zero length one could copy printers.conf.N to printers.conf
and only after that start the scheduler.

Till

On 07/23/2013 09:01 PM, Michael Sweet wrote:
> I can make sure that the changes apply cleanly to 1.6.x and are included in the final 1.6.x release (1.6.4). Beyond that, yes, you'll need to coordinate with your distro of choice or edit and compile yourself.
>
>
> On 2013-07-23, at 2:39 PM, Steve Bergman <sbergman27@gmail.com> wrote:
>
>> On 07/23/2013 01:16 PM, Michael Sweet wrote:
>>> In this case the unrealistic expectation is that if you pull the plug on a running system that you won't lose any data. Systems that offer that level of assurance are designed accordingly and generally include some form of backup power that will allow pending writes to complete.
>>
>> That flies in the face of the 12 years of experience I have with the same workloads running under ext3. And running under ext4/delalloc or xfs, cups should be able to do even better for both performance and reliability if it does things right.
>>
>> But that's all just spilled milk over the dam now. Most all current Unix/Linux filesystems now require fsync in order to provide sane integrity guarantees. That's just a fact of life that we've all got to live with. If offering a fix in the next major CUPS release is the best you're willing to do, then it is, at least, better than nothing at all. Thank you, at least, for that much.
>>
>> This moves the focus to convincing the CUPS package maintainers of existing distros to patch their pre-1.7 versions with the necessary functionality. Which is kind of where I started. So back to the RH Bugzilla thread...
>>
>> -Steve
>
> _________________________________________________________
> Michael Sweet, Senior Printing System Engineer, PWG Chair
>
> _______________________________________________
> Printing-architecture mailing list
> Printing-architecture@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/printing-architecture
>

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
2013-07-23 20:51 ` Till Kamppeter
@ 2013-07-23 23:04 ` Michael R Sweet
2013-07-24 10:11 ` Jiri Popelka
1 sibling, 0 replies; 18+ messages in thread
From: Michael R Sweet @ 2013-07-23 23:04 UTC (permalink / raw)
To: Till Kamppeter
Cc: printing-architecture@lists.linux-foundation.org, Steve Bergman, Eric Sandeen

We might use the .O file but never the .N file - it is likely incomplete.

Sent from my iPhone

On 2013-07-23, at 4:51 PM, Till Kamppeter <till.kamppeter@gmail.com> wrote:

> As a measure of automatic recovery one could perhaps let the startup
> script of CUPS check whether printers.conf is zero length and if so,
> copy printers.conf.O to printers.conf. And if printers.conf.N exists and
> is of non-zero length one could copy printers.conf.N to printers.conf
> and only after that start the scheduler.
>
> Till
>
> On 07/23/2013 09:01 PM, Michael Sweet wrote:
>> I can make sure that the changes apply cleanly to 1.6.x and are included in the final 1.6.x release (1.6.4). Beyond that, yes, you'll need to coordinate with your distro of choice or edit and compile yourself.
>>
>>
>> On 2013-07-23, at 2:39 PM, Steve Bergman <sbergman27@gmail.com> wrote:
>>
>>> On 07/23/2013 01:16 PM, Michael Sweet wrote:
>>>> In this case the unrealistic expectation is that if you pull the plug on a running system that you won't lose any data. Systems that offer that level of assurance are designed accordingly and generally include some form of backup power that will allow pending writes to complete.
>>>
>>> That flies in the face of the 12 years of experience I have with the same workloads running under ext3. And running under ext4/delalloc or xfs, cups should be able to do even better for both performance and reliability if it does things right.
>>>
>>> But that's all just spilled milk over the dam now. Most all current Unix/Linux filesystems now require fsync in order to provide sane integrity guarantees. That's just a fact of life that we've all got to live with. If offering a fix in the next major CUPS release is the best you're willing to do, then it is, at least, better than nothing at all. Thank you, at least, for that much.
>>>
>>> This moves the focus to convincing the CUPS package maintainers of existing distros to patch their pre-1.7 versions with the necessary functionality. Which is kind of where I started. So back to the RH Bugzilla thread...
>>>
>>> -Steve
>>
>> _________________________________________________________
>> Michael Sweet, Senior Printing System Engineer, PWG Chair
>>
>> _______________________________________________
>> Printing-architecture mailing list
>> Printing-architecture@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/printing-architecture
>

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
2013-07-23 20:51 ` Till Kamppeter
2013-07-23 23:04 ` Michael R Sweet
@ 2013-07-24 10:11 ` Jiri Popelka
2013-07-24 11:39 ` Michael Sweet
1 sibling, 1 reply; 18+ messages in thread
From: Jiri Popelka @ 2013-07-24 10:11 UTC (permalink / raw)
To: Till Kamppeter; +Cc: printing-architecture, Steve Bergman, Eric Sandeen

As I mentioned in the initial mail, I've had a patch
https://bugzilla.redhat.com/attachment.cgi?id=776945
which makes cupsdOpenConfFile() read filename.O in case filename is truncated to zero.
Previously the filename.O was read only if filename had not existed.
Wouldn't that be an alternative?

--
Jiri

On 07/23/2013 10:51 PM, Till Kamppeter wrote:
> As a measure of automatic recovery one could perhaps let the startup
> script of CUPS check whether printers.conf is zero length and if so,
> copy printers.conf.O to printers.conf.

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
2013-07-24 10:11 ` Jiri Popelka
@ 2013-07-24 11:39 ` Michael Sweet
0 siblings, 0 replies; 18+ messages in thread
From: Michael Sweet @ 2013-07-24 11:39 UTC (permalink / raw)
To: Jiri Popelka
Cc: printing-architecture, Steve Bergman, Eric Sandeen, Till Kamppeter

Jiri,

Yes, that would be the alternative, although we need to be careful - delete the last printer and then do a clean shutdown. If we don't do anything about it, starting up cupsd might load a previous printers.conf file with old printer definitions but without the associated PPDs so you'd have a bunch of "zombie" raw queues...

Assuming that calling fsync will fix the truncated file issue on Linux (at least), let's focus on *that* and see where it takes us.

And just to make sure this problem goes away for good in the future, I've filed the following bug to track changes in cupsd to how we persist printer/class configuration:

<rdar://problem/14532427> cups.org: CUPS 2.0 - separate configuration from state and persist configuration across crashes

My goal (and I've added corresponding comments to the bug) is to move away from the monolithic printers.conf and classes.conf files to separate configuration and state files. Not only will this minimize the likelihood of a crash causing complete loss of all print queues, but it will also allow us the luxury of using fsync (and whatever other platform-specific incantation) to ensure that the infrequently-changing configuration bits are carefully persisted.

I haven't made any specific implementation decisions yet (do we use a database engine like SQLite, separate files per queue, file versioning, etc.) but I *do* want to speed up cupsd startup with large numbers of queues and address transient queues for IPP Everywhere/AirPrint.

On 2013-07-24, at 6:11 AM, Jiri Popelka <jpopelka@redhat.com> wrote:

> As I mentioned in the initial mail, I've had a patch
> https://bugzilla.redhat.com/attachment.cgi?id=776945
> which makes cupsdOpenConfFile() read filename.O in case filename is truncated to zero.
> Previously the filename.O was read only if filename had not existed.
> Wouldn't that be an alternative ?
>
> --
> Jiri
>
> On 07/23/2013 10:51 PM, Till Kamppeter wrote:
>> As a measure of automatic recovery one could perhaps let the startup
>> script of CUPS check whether printers.conf is zero length and if so,
>> copy printers.conf.O to printers.conf.
>

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair

^ permalink raw reply [flat|nested] 18+ messages in thread
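The fallback discussed in the two messages above (read filename.O when filename is missing or zero-length) might look roughly like the sketch below. This is not the code from the Bugzilla attachment and not cupsdOpenConfFile(): `open_conf_or_backup` is a hypothetical helper that uses plain stdio and stat() for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not the Bugzilla patch): open "path" for reading,
 * falling back to "path.O" when "path" is missing or has zero length.
 * Returns NULL if neither file is usable. If "used_backup" is non-NULL it
 * is set to 1 when the backup file was opened, 0 otherwise.
 */
static FILE *
open_conf_or_backup(const char *path, int *used_backup)
{
  struct stat info;
  char        backup[1024];
  FILE        *fp;

  if (used_backup)
    *used_backup = 0;

  /* Prefer the primary file when it exists and is not empty. */
  if (!stat(path, &info) && info.st_size > 0 &&
      (fp = fopen(path, "r")) != NULL)
    return (fp);

  if (snprintf(backup, sizeof(backup), "%s.O", path) >= (int)sizeof(backup))
    return (NULL);

  /* Otherwise try the previous version saved as path.O. */
  if (!stat(backup, &info) && info.st_size > 0 &&
      (fp = fopen(backup, "r")) != NULL)
  {
    if (used_backup)
      *used_backup = 1;
    return (fp);
  }

  return (NULL);
}
```

As written, the sketch cannot tell a file truncated by a crash from one that is legitimately empty, which is the "zombie" queue concern raised above; a real implementation would need some extra signal to distinguish the two cases.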
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns 2013-07-23 19:01 ` Michael Sweet 2013-07-23 20:51 ` Till Kamppeter @ 2013-07-26 21:36 ` Michael Sweet [not found] ` <51F2F05E.3000207@redhat.com> 1 sibling, 1 reply; 18+ messages in thread From: Michael Sweet @ 2013-07-26 21:36 UTC (permalink / raw) To: Steve Bergman Cc: Eric Sandeen, printing-architecture@lists.linux-foundation.org [-- Attachment #1: Type: text/plain, Size: 100 bytes --] All, Here is the proposed change that adds a "SyncOnClose" directive to the cups-files.conf file: [-- Attachment #2: rdar14523043.patch --] [-- Type: application/octet-stream, Size: 4639 bytes --] Index: conf/cups-files.conf.in =================================================================== --- conf/cups-files.conf.in (revision 11192) +++ conf/cups-files.conf.in (working copy) @@ -8,6 +8,9 @@ # List of events that are considered fatal errors for the scheduler... #FatalErrors @CUPS_FATAL_ERRORS@ +# Do we call fsync() after writing configuration or status files? +#SyncOnClose No + # Default user and group for filters/backends/helper programs; this cannot be # any user or group that resolves to ID 0 for security reasons... #User @CUPS_USER@ Index: doc/help/ref-cups-files-conf.html.in =================================================================== --- doc/help/ref-cups-files-conf.html.in (revision 11192) +++ doc/help/ref-cups-files-conf.html.in (working copy) @@ -429,6 +429,31 @@ default server directory is <VAR>/etc/cups</VAR>.</P> +<H2 CLASS="title"><SPAN CLASS="info">CUPS 1.6.4</SPAN><A NAME="SyncOnClose">SyncOnClose</A></H2> + +<H3>Examples</H3> + +<PRE CLASS="command"> +SyncOnClose No +SyncOnClose Yes +</PRE> + +<H3>Description</H3> + +<P>The <CODE>SyncOnClose</CODE> directive determines whether the scheduler +flushes changes to configuration and state files to disk. 
The default is +<CODE>No</CODE> which relies on the operating system to schedule a suitable +time to write changes to disk.</P> + +<BLOCKQUOTE><B>Note:</B> + +<P>Setting <CODE>SyncOnClose</CODE> to <CODE>Yes</CODE> makes the scheduler use the <CODE>fsync(2)</CODE> system call to write all changes to disk, however the drive or network file system server may still delay writing data to disk. Do not depend on this functionality to prevent data loss in the event of unexpected hardware failure.</P> + +<P>Enabling <CODE>SyncOnClose</CODE> may also cause the scheduler to periodically become unresponsive while it waits for changes to be written.</P> + +</BLOCKQUOTE> + + <H2 CLASS="title"><A NAME="SystemGroup">SystemGroup</A></H2> <H3>Examples</H3> Index: man/cups-files.conf.man.in =================================================================== --- man/cups-files.conf.man.in (revision 11192) +++ man/cups-files.conf.man.in (working copy) @@ -122,6 +122,12 @@ .br Specifies the directory where the server configuration files can be found. .TP 5 +SyncOnClose Yes +.TP 5 +SyncOnClose No +Specifies whether the scheduler calls \fIfsync(2)\fR after writing configuration +or state files. The default is No. +.TP 5 SystemGroup group-name [group-name ...] .br Specifies the group(s) to use for System class authentication. 
Index: scheduler/conf.c =================================================================== --- scheduler/conf.c (revision 11192) +++ scheduler/conf.c (working copy) @@ -174,6 +174,7 @@ { "ServerRoot", &ServerRoot, CUPSD_VARTYPE_PATHNAME }, { "SMBConfigFile", &SMBConfigFile, CUPSD_VARTYPE_STRING }, { "StateDir", &StateDir, CUPSD_VARTYPE_STRING }, + { "SyncOnClose", &SyncOnClose, CUPSD_VARTYPE_BOOLEAN }, #ifdef HAVE_AUTHORIZATION_H { "SystemGroupAuthKey", &SystemGroupAuthKey, CUPSD_VARTYPE_STRING }, #endif /* HAVE_AUTHORIZATION_H */ @@ -734,6 +735,7 @@ ReloadTimeout = DEFAULT_KEEPALIVE; RootCertDuration = 300; StrictConformance = FALSE; + SyncOnClose = FALSE; Timeout = DEFAULT_TIMEOUT; WebInterface = CUPS_DEFAULT_WEBIF; Index: scheduler/conf.h =================================================================== --- scheduler/conf.h (revision 11192) +++ scheduler/conf.h (working copy) @@ -172,6 +172,8 @@ /* Which errors are fatal? */ StrictConformance VALUE(FALSE), /* Require strict IPP conformance? */ + SyncOnClose VALUE(FALSE), + /* Call fsync() when closing files? */ LogFilePerm VALUE(0644); /* Permissions for log files */ VAR cupsd_loglevel_t LogLevel VALUE(CUPSD_LOG_WARN); Index: scheduler/file.c =================================================================== --- scheduler/file.c (revision 11192) +++ scheduler/file.c (working copy) @@ -109,6 +109,29 @@ /* + * Synchronize changes to disk if SyncOnClose is enabled. + */ + + if (SyncOnClose) + { + if (cupsFileFlush(fp)) + { + cupsdLogMessage(CUPSD_LOG_ERROR, "Unable to write changes to \"%s\": %s", + filename, strerror(errno)); + cupsFileClose(fp); + return (-1); + } + + if (fsync(cupsFileNumber(fp))) + { + cupsdLogMessage(CUPSD_LOG_ERROR, "Unable to sync changes to \"%s\": %s", + filename, strerror(errno)); + cupsFileClose(fp); + return (-1); + } + } + + /* * First close the file... 
   */

[-- Attachment #3: Type: text/plain, Size: 2047 bytes --]

As I mentioned before the default is off (SyncOnClose No), and I included a big disclaimer on its use in the HTML documentation. Let me know if you have any feedback - this will be included in CUPS 1.6.4 and 1.7.0.

On Jul 23, 2013, at 3:01 PM, Michael Sweet <msweet@apple.com> wrote:

> I can make sure that the changes apply cleanly to 1.6.x and are included in the final 1.6.x release (1.6.4). Beyond that, yes, you'll need to coordinate with your distro of choice or edit and compile yourself.
>
>
> On 2013-07-23, at 2:39 PM, Steve Bergman <sbergman27@gmail.com> wrote:
>
>> On 07/23/2013 01:16 PM, Michael Sweet wrote:
>>> In this case the unrealistic expectation is that if you pull the plug on a running system that you won't lose any data. Systems that offer that level of assurance are designed accordingly and generally include some form of backup power that will allow pending writes to complete.
>>
>> That flies in the face of the 12 years of experience I have with the same workloads running under ext3. And running under ext4/delalloc or xfs, cups should be able to do even better for both performance and reliability if it does things right.
>>
>> But that's all just spilled milk over the dam now. Most all current Unix/Linux filesystems now require fsync in order to provide sane integrity guarantees. That's just a fact of life that we've all got to live with. If offering a fix in the next major CUPS release is the best you're willing to do, then it is, at least, better than nothing at all. Thank you, at least, for that much.
>>
>> This moves the focus to convincing the CUPS package maintainers of existing distros to patch their pre-1.7 versions with the necessary functionality. Which is kind of where I started. So back to the RH Bugzilla thread...
>>
>> -Steve
>
> _________________________________________________________
> Michael Sweet, Senior Printing System Engineer, PWG Chair

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair
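
The flush-then-fsync step that the patch above adds ahead of the close can be sketched with plain POSIX calls. This is a minimal sketch, not the CUPS code: `write_file_synced` and its `sync_on_close` parameter are hypothetical names standing in for the scheduler's file helpers and the SyncOnClose setting.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `data` to `path` and, when `sync_on_close` is nonzero, call fsync()
 * on the descriptor before closing it. Returns 0 on success, -1 on error.
 * Hypothetical helper for illustration; not part of CUPS.
 */
int write_file_synced(const char *path, const char *data, int sync_on_close)
{
  size_t len = strlen(data);
  int    fd  = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

  if (fd < 0)
    return (-1);

  /* write(2) may be partial, so loop until the kernel has every byte. */
  while (len > 0)
  {
    ssize_t n = write(fd, data, len);

    if (n < 0)
    {
      if (errno == EINTR)
        continue;

      close(fd);
      return (-1);
    }

    data += n;
    len  -= (size_t)n;
  }

  /* Ask the kernel to push the data to the device before we close. */
  if (sync_on_close && fsync(fd) != 0)
  {
    int saved_errno = errno;

    close(fd);
    errno = saved_errno;
    return (-1);
  }

  return (close(fd));
}
```

As the documentation note in the patch says, a successful fsync(2) still leaves the drive or file server free to delay the physical write.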
[parent not found: <51F2F05E.3000207@redhat.com>]
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
[not found] ` <51F2F05E.3000207@redhat.com>
@ 2013-07-26 22:10 ` Michael Sweet
0 siblings, 0 replies; 18+ messages in thread
From: Michael Sweet @ 2013-07-26 22:10 UTC (permalink / raw)
To: Eric Sandeen
Cc: Steve Bergman, printing-architecture@lists.linux-foundation.org

Eric,

On Jul 26, 2013, at 5:55 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> ...
> Otherwise, this seems good. The icing on the cake would be to fsync the parent dir after the renames are done...

Yeah, but then we start getting into the really-OS-and-filesystem-specific code. I'm happy to incorporate Linux-specific code if you provide it, but I know for sure that a simple open+fsync+close of directories doesn't work in general - the only portable way I know of is sync(2), which is a sledgehammer...

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair
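
The parent-directory fsync suggested above could look roughly like the following on Linux. This is a sketch under that assumption only: `rename_and_sync_dir` is a hypothetical helper, and, as noted in the reply, opening and fsyncing a directory is not portable, so the same calls may fail on other systems or filesystems. It also assumes the temporary file has already been written and fsynced.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Rename `tmp_path` to `final_path`, then fsync the containing directory
 * `dir_path` so that the updated directory entry is also flushed.
 * Returns 0 on success, -1 on error.
 * Hypothetical helper for illustration; Linux behavior is assumed.
 */
int rename_and_sync_dir(const char *tmp_path, const char *final_path,
                        const char *dir_path)
{
  int dirfd;
  int status;

  if (rename(tmp_path, final_path) != 0)
    return (-1);

  /* A directory can be opened read-only and passed to fsync() on Linux. */
  dirfd = open(dir_path, O_RDONLY);
  if (dirfd < 0)
    return (-1);

  status = fsync(dirfd);

  if (close(dirfd) != 0)
    status = -1;

  return (status);
}
```

Where the directory fsync is unavailable, the fallback mentioned in the reply is sync(2), which flushes everything rather than one directory.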
[parent not found: <51EE1183.5080704@redhat.com>]
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
[not found] ` <51EE1183.5080704@redhat.com>
@ 2013-07-23 12:05 ` Michael Sweet
[not found] ` <51EEA872.4070009@redhat.com>
0 siblings, 1 reply; 18+ messages in thread
From: Michael Sweet @ 2013-07-23 12:05 UTC (permalink / raw)
To: Eric Sandeen; +Cc: printing-architecture, sbergman27

Eric,

On 2013-07-23, at 1:15 AM, Eric Sandeen <sandeen@redhat.com> wrote:
>> ...
>> The problem with fsync() is that it is a blocking API. Blocking
>> cupsd (single-threaded daemon process) is *not* a good idea.
>>
>> Future versions of cupsd will be multi-threaded where we can look at
>> calling fsync or fcntl(fd, F_FULLFSYNC), but adding fsync calls now
>> will just cause more problems than it will solve.
>
> How often is the file rewritten? TBH I didn't expect that the cups
> daemon would be terribly latency-sensitive, at least not to the point
> where losing it was better than taking the perf hit from an fsync...

printers.conf gets rewritten any time the state or configuration changes. We *do* try to minimize the number of writes by coalescing them into 30 second snapshots, but that will still be often enough that I wouldn't want to chance adding a call that could block for a long time (if printers.conf is on a network filesystem, for example).

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair
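
The coalescing described above (batching changes into 30 second snapshots) can be illustrated with a dirty flag and a deadline. This is a sketch of the general idea only, not cupsd's implementation; `snapshot_t`, `snapshot_mark_dirty`, `snapshot_due`, and `SNAPSHOT_DELAY` are hypothetical names.

```c
#include <time.h>

#define SNAPSHOT_DELAY 30 /* Seconds, matching the interval mentioned above */

typedef struct
{
  int    dirty;    /* Nonzero when unsaved changes exist */
  time_t deadline; /* Time at which the pending snapshot is due */
} snapshot_t;

/*
 * Record a change. Only the first change after a save starts the window;
 * later changes inside the window ride along with the same snapshot.
 */
void snapshot_mark_dirty(snapshot_t *s, time_t now)
{
  if (!s->dirty)
  {
    s->dirty    = 1;
    s->deadline = now + SNAPSHOT_DELAY;
  }
}

/*
 * Return 1 and clear the dirty flag when a snapshot should be written,
 * 0 when there is nothing to write yet.
 */
int snapshot_due(snapshot_t *s, time_t now)
{
  if (s->dirty && now >= s->deadline)
  {
    s->dirty = 0;
    return (1);
  }

  return (0);
}
```

With this shape, any number of changes within one window costs a single write, and therefore at most a single fsync if one is added.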
[parent not found: <51EEA872.4070009@redhat.com>]
* Re: [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns
[not found] ` <51EEA872.4070009@redhat.com>
@ 2013-07-23 17:51 ` Michael Sweet
0 siblings, 0 replies; 18+ messages in thread
From: Michael Sweet @ 2013-07-23 17:51 UTC (permalink / raw)
To: Eric Sandeen; +Cc: printing-architecture, sbergman27

Eric,

On 2013-07-23, at 11:59 AM, Eric Sandeen <sandeen@redhat.com> wrote:
> ...
> Well, given the choice between safe and fast, IMHO safe is usually
> the best choice. I can't imagine that an fsync of a few-K file
> every 30s is going to hurt that much.
>
> Is this based on a suspicion or have you actually seen problems?

In OS X 10.7/10.8 we had millions of users affected by a previously undetected fsync issue. In the final analysis we found that the reason for the sudden reports was that fsync takes longer on systems with SSDs due to the larger buffering that typically occurs to minimize the number of write cycles used overall.

> ...
> p.s. this patch should almost DTRT, although it should sync
> the dir containing the file as well, after the .N / .O filename juggling.

You probably also want to call cupsFileFlush(fp) to ensure that the remaining data has been written.

I will consider adding a cupsd.conf directive to force an fsync call (default off) in the 1.7.0 release. Filed as the following Apple bug:

<rdar://problem/14523043> cups.org: Add SyncOnClose directive to fsync calls prior to closing .conf file updates.

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair
Thread overview: 18+ messages
2013-07-22 17:28 [Printing-architecture] printers.conf frequently gets truncated to zero length after unclean shutdowns Jiri Popelka
2013-07-22 17:46 ` Till Kamppeter
[not found] ` <51EDADB4.5000807@redhat.com>
2013-07-23 3:03 ` Michael Sweet
[not found] ` <51ED6CC4.1040501@redhat.com>
2013-07-23 3:00 ` Michael Sweet
2013-07-23 9:08 ` Tim Waugh
2013-07-23 12:08 ` Michael Sweet
[not found] ` <51EE90D7.9070607@gmail.com>
2013-07-23 16:40 ` Michael Sweet
[not found] ` <51EEBF83.2090006@gmail.com>
2013-07-23 18:16 ` Michael Sweet
[not found] ` <51EECB0D.5050208@redhat.com>
2013-07-23 18:59 ` Michael Sweet
[not found] ` <51EECDF8.9070209@gmail.com>
2013-07-23 19:01 ` Michael Sweet
2013-07-23 20:51 ` Till Kamppeter
2013-07-23 23:04 ` Michael R Sweet
2013-07-24 10:11 ` Jiri Popelka
2013-07-24 11:39 ` Michael Sweet
2013-07-26 21:36 ` Michael Sweet
[not found] ` <51F2F05E.3000207@redhat.com>
2013-07-26 22:10 ` Michael Sweet
[not found] ` <51EE1183.5080704@redhat.com>
2013-07-23 12:05 ` Michael Sweet
[not found] ` <51EEA872.4070009@redhat.com>
2013-07-23 17:51 ` Michael Sweet