public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: sandeen@redhat.com, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 27/29] xfs_scrub: integrate services with systemd
Date: Thu, 25 Jan 2018 15:10:48 -0800	[thread overview]
Message-ID: <20180125231048.GR9068@magnolia> (raw)
In-Reply-To: <20180125214411.GP9068@magnolia>

On Thu, Jan 25, 2018 at 01:44:11PM -0800, Darrick J. Wong wrote:
> On Thu, Jan 25, 2018 at 03:21:25PM -0600, Eric Sandeen wrote:
> > On 1/17/18 4:04 PM, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Create a systemd service unit so that we can run the online scrubber
> > > under systemd with (somewhat) appropriate containment.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > <continue random review method>
> > 
> > ...
> > 
> > > +/*
> > > + * If we are running as a service, we need to be careful about what
> > > + * error codes we return to the calling process.
> > > + */
> > > +static bool			is_service;
> > > +
> > >  static void __attribute__((noreturn))
> > >  usage(void)
> > >  {
> > > @@ -617,6 +623,9 @@ _("Only one of the options -n or -y may be specified.\n"));
> > >  	if (stdout_isatty && !progress_fp)
> > >  		progress_fp = fdopen(1, "w+");
> > >  
> > > +	if (getenv("SERVICE_MODE"))
> > > +		is_service = true;
> > > +
> > >  	/* Find the mount record for the passed-in argument. */
> > >  	if (stat(argv[optind], &ctx.mnt_sb) < 0) {
> > >  		fprintf(stderr,
> > > @@ -717,5 +726,21 @@ _("%s: %llu warnings found.\n"),
> > >  	free(ctx.blkdev);
> > >  	free(ctx.mntpoint);
> > >  
> > > +	/*
> > > +	 * If we're running as a service, bump return code up by 150 to
> > > +	 * avoid conflicting with (sysvinit) service return codes.
> > > +	 */
> > > +	if (is_service) {
> > > +		/*
> > > +		 * journald queries /proc as part of taking in log
> > > +		 * messages; it uses this information to associate the
> > > +		 * message with systemd units, etc.  This races with
> > > +		 * process exit, so delay that a couple of seconds so
> > > +		 * that we capture the summary outputs in the job log.
> > > +		 */
> > > +		sleep(2);
> > > +		if (ret)
> > > +			ret += 150;
> > > +	}
> > 
> > Ok this seems batshit crazy, no offense.  I don't blame /you/ ;)  You pointed me
> > at  http://refspecs.linuxbase.org/LSB_2.0.1/LSB-PDA/LSB-PDA/iniscrptact.html
> > but it says that the /initscript/ should exit with specific codes, not that
> > the application it /calls/ should do so.
> 
> Yes.  At the moment, the systemd service calls xfs_scrub directly, hence
> it interprets the return code as an initscript error code.  So in theory
> we could create a wrapper script that does all the is_service junk, but
> now that's another weird little script to break.  On the plus side we'd
> contain the systemd workaround crap to some random wrapper in /usr/lib.
> 
> Ok, I'll think about it.

We can't do the wrapper thing because of that stupid systemd journal bug
wherein there are delays between the point where the service logs a
message and journald processes.  The process must still be in memory in
order to index (the log message (which encodes the pid)) -> (process) ->
(service) so that /all/ the log messages are associated with the systemd
unit.

Because if we don't do that, xfs_scrub_fail won't see the end of the
xfs_scrub messages and therefore won't email the entire error transcript
to the admin, which is stupid.

Stupid stupid stupid stupid.  If it weren't for the easy sandboxing and
io prioritization stuff none of this would be worth it.

--D

> > > If the status command is given, /the init script/ will return the following exit status codes. 
> > 
> > So maybe this is getting philosophical, if xfs_scrub detects corruption,
> > does that mean it "failed?"  Shouldn't we use your _fail script to interpret
> > the scrub errors, and then give an appropriate return value back to the thing that
> > called us?
> 
> We want the xfs_scrub@ service to return a failure code so that systemd
> will notice the failure and start the xfs_scrub_fail@ fail service.
> This "fail service" doesn't receive the error code of the service that
> failed passed to it, so all it can do is rifle through the journal
> output for whatever xfs_scrub spat out and email it somewhere.
> 
> Hm, yeah, all that crap doesn't need to be there for the non-systemd
> case.
> 
> > Anyway, can't we do return code mangling as appropriate in the script(s)
> > that call xfs_scrub, vs. in the binary itself?  The above just seems really
> > odd and non-obvious to me.
> > 
> > I'll try to plow through more of the scripty stuff in a bit.
> 
> Yeah.
> 
> --D
> 
> > 
> > Thanks,
> > -Eric
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2018-01-25 23:10 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-17 22:01 [PATCH v11.1 00/29] xfsprogs: online scrub/repair support Darrick J. Wong
2018-01-17 22:01 ` [PATCH 01/29] xfs_scrub: create online filesystem scrub program Darrick J. Wong
2018-01-17 22:02 ` [PATCH 02/29] xfs_scrub: common error handling Darrick J. Wong
2018-01-17 22:02 ` [PATCH 03/29] xfs_scrub: set up command line argument parsing Darrick J. Wong
2018-01-17 22:02 ` [PATCH 04/29] xfs_scrub: dispatch the various phases of the scrub program Darrick J. Wong
2018-01-17 22:02 ` [PATCH 05/29] xfs_scrub: figure out how many threads we're going to need Darrick J. Wong
2018-01-17 22:02 ` [PATCH 06/29] xfs_scrub: create an abstraction for a block device Darrick J. Wong
2018-01-17 22:02 ` [PATCH 07/29] xfs_scrub: find XFS filesystem geometry Darrick J. Wong
2018-01-17 22:02 ` [PATCH 08/29] xfs_scrub: add inode iteration functions Darrick J. Wong
2018-01-17 22:02 ` [PATCH 09/29] xfs_scrub: add space map " Darrick J. Wong
2018-01-17 22:02 ` [PATCH 10/29] xfs_scrub: add file " Darrick J. Wong
2018-01-17 22:03 ` [PATCH 11/29] xfs_scrub: filesystem counter collection functions Darrick J. Wong
2018-01-17 22:03 ` [PATCH 12/29] xfs_scrub: wrap the scrub ioctl Darrick J. Wong
2018-01-17 22:03 ` [PATCH 13/29] xfs_scrub: scan filesystem and AG metadata Darrick J. Wong
2018-01-17 22:03 ` [PATCH 14/29] xfs_scrub: thread-safe stats counter Darrick J. Wong
2018-01-17 22:03 ` [PATCH 15/29] xfs_scrub: scan inodes Darrick J. Wong
2018-01-17 22:03 ` [PATCH 16/29] xfs_scrub: check directory connectivity Darrick J. Wong
2018-01-17 22:03 ` [PATCH 17/29] xfs_scrub: warn about suspicious characters in directory/xattr names Darrick J. Wong
2018-01-17 22:03 ` [PATCH 18/29] xfs_scrub: warn about normalized Unicode name collisions Darrick J. Wong
2018-01-17 22:03 ` [PATCH 19/29] xfs_scrub: create a bitmap data structure Darrick J. Wong
2018-01-17 22:03 ` [PATCH 20/29] xfs_scrub: create infrastructure to read verify data blocks Darrick J. Wong
2018-01-17 22:04 ` [PATCH 21/29] xfs_scrub: scrub file " Darrick J. Wong
2018-01-17 22:04 ` [PATCH 22/29] xfs_scrub: optionally use SCSI READ VERIFY commands to scrub data blocks on disk Darrick J. Wong
2018-01-17 22:04 ` [PATCH 23/29] xfs_scrub: check summary counters Darrick J. Wong
2018-01-17 22:04 ` [PATCH 24/29] xfs_scrub: fstrim the free areas if there are no errors on the filesystem Darrick J. Wong
2018-01-17 22:04 ` [PATCH 25/29] xfs_scrub: progress indicator Darrick J. Wong
2018-01-17 22:04 ` [PATCH 26/29] xfs_scrub: create a script to scrub all xfs filesystems Darrick J. Wong
2018-01-17 22:04 ` [PATCH 27/29] xfs_scrub: integrate services with systemd Darrick J. Wong
2018-01-25 21:21   ` Eric Sandeen
2018-01-25 21:44     ` Darrick J. Wong
2018-01-25 22:16       ` Eric Sandeen
2018-01-25 22:36         ` Darrick J. Wong
2018-01-25 23:10       ` Darrick J. Wong [this message]
2018-01-26 18:15   ` [PATCH v2 " Darrick J. Wong
2018-01-17 22:04 ` [PATCH 28/29] xfs_scrub: wire up repair ioctl Darrick J. Wong
2018-01-17 22:04 ` [PATCH 29/29] xfs_scrub: schedule and manage optimizations/repairs to the filesystem Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2018-01-31  3:06 [PATCH v11.2 00/29] xfsprogs: online scrub/repair support Darrick J. Wong
2018-01-31  3:09 ` [PATCH 27/29] xfs_scrub: integrate services with systemd Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180125231048.GR9068@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sandeen@redhat.com \
    --cc=sandeen@sandeen.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox