All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Paul Turner <pjt@google.com>
Cc: linux-kernel@vger.kernel.org, adobriyan@gmail.com
Subject: Re: Migration of kernel interfaces to seq_files breaks pread() consumers
Date: Sat, 24 Jan 2009 18:19:24 -0800	[thread overview]
Message-ID: <20090124181924.d633523c.akpm@linux-foundation.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0901162229420.12858@kitami.corp.google.com>

On Fri, 16 Jan 2009 23:51:35 -0800 (PST) Paul Turner <pjt@google.com> wrote:

> 
> (Specifically) Several interfaces under /proc have been migrated to use 
> seq_files.  This was previously observed to be a problem with VMware's 
> reading of /proc/uptime.  We're now running into the same problem on  
> /proc/<pid>/stat; we have many consumers performing preads on this 
> interface which break under new kernels.
> 
> Reverting these migrations presents other problems and doesn't scale with 
> everyones' pet dependencies over an abi that's been
> broken :(

We changed userspace-visible behaviour and broke real applications. 
This is a serious matter.  So serious in fact that your report has
languished without reply for a week.

Reverting those changes until we have a suitable reimplementation which
doesn't bust userspace is 100% justifiable.

In which kernel versions is this regression present?

What would a revert look like?  Big and ugly or small and simple?  Do
the original commits (which were they?) still revert OK?

> Part of the problem in implementing pread in seq_files is that we don't  
> know know whether the read was issued by pread(2) or read(2).  It's not 
> nice to shoehorn this information down the stack.  I've attached a 
> skeleton patch which shows one way we could push it up (although something 
> like a second f_pos would be necessary to make it maintain pread 
> semantics against reads).
> 
> One advantage of this style of approach is that it doesn't break on 
> partial record reads.  But it's a little gross at the same time.
> 

Yes, that is a bit gross.

Does this patch actually 100% solve the problem, or is it a precursor
to some other fix or what?  It's hard to comment sensibly if it's a
partial thing with no sign how it will be used.

> diff --git a/fs/read_write.c b/fs/read_write.c
> index 2fc2980..744094a 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -407,6 +407,16 @@ asmlinkage ssize_t sys_pread64(unsigned int fd, char __user *buf,
>  		ret = -ESPIPE;
>  		if (file->f_mode & FMODE_PREAD)
>  			ret = vfs_read(file, buf, count, &pos);
> +		else if (file->f_mode & FMODE_SEQ_FILE) {
> +			/*
> +			 * We break the pread semantic and actually make it
> +			 * seek, this prevents inconsistent record reads across
> +			 * boundaries.
> +			 */
> +			vfs_llseek(file, pos, SEEK_SET);
> +			ret = vfs_read(file, buf, count, &pos);
> +			file_pos_write(file, pos);
> +		}

Well yes, that's a userspace-visible wrong change too.

>  		fput_light(file, fput_needed);
>  	}
>  
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index 3f54dbd..f8c5379 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -50,6 +50,8 @@ int seq_open(struct file *file, const struct seq_operations *op)
>  
>  	/* SEQ files support lseek, but not pread/pwrite */
>  	file->f_mode &= ~(FMODE_PREAD | FMODE_PWRITE);
> +	file->f_mode |= FMODE_SEQ_FILE;
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL(seq_open);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 5f7b912..c3b5916 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -76,6 +76,8 @@ extern int dir_notify_enable;
>     behavior for cross-node execution/opening_for_writing of files */
>  #define FMODE_EXEC	16
>  
> +#define FMODE_SEQ_FILE_PREAD	32

-EWONTCOMPILE, btw.

  reply	other threads:[~2009-01-25  2:19 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-17  7:51 Migration of kernel interfaces to seq_files breaks pread() consumers Paul Turner
2009-01-25  2:19 ` Andrew Morton [this message]
2009-01-25  3:40   ` Paul Turner
2009-01-25 12:08   ` Alexey Dobriyan
2009-01-27 21:47     ` Eric W. Biederman
2009-01-27 21:48       ` [PATCH 1/2] seq_file: Move traverse so it can be used from seq_read Eric W. Biederman
2009-01-27 21:49         ` [PATCH 2/2] seq_file: Properly cope with pread Eric W. Biederman
2009-01-30  1:26       ` Migration of kernel interfaces to seq_files breaks pread() consumers Paul Turner
2009-01-30  3:01         ` Eric W. Biederman
2009-01-30  6:09           ` Paul Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090124181924.d633523c.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=adobriyan@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pjt@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.