Age | Commit message (Collapse) | Author |
|
We'll use this elsewhere.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The stateowner has some fields that only make sense for openowners, and
some that only make sense for lockowners, and I find it a lot clearer if
those are separated out.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Move the CLOSE_STATE case into the unique caller that cares about it
rather than putting it in preprocess_seqid_op.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
I don't see the point of having this check in nfs4_preprocess_seqid_op()
when it's only needed by the one caller.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Sometimes the single-exit style is good, sometimes it's unnecessarily
convoluted....
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
This is used only as a local variable.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Maybe we'll bring it back some day, but we don't have much real use for
it now.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
If open fails with any error other than nfserr_replay_me, then the main
nfsd4_proc_compound() loop continues unconditionally to
nfsd4_encode_operation(), which will always call encode_seqid_op_tail.
Thus the condition we check for here does not occur.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
There are currently a couple races in the seqid replay code: a
retransmission could come while we're still encoding the original reply,
or a new seqid-mutating call could come as we're encoding a replay.
So, extend the state lock over the encoding (both encoding of a replayed
reply and caching of the original encoded reply).
I really hate doing this, and previously added the stateowner
reference-counting code to avoid it (which was insufficient)--but I
don't see a less complicated alternative at the moment.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Now that the replay owner is in the cstate we can remove it from a lot
of other individual operations and further simplify
nfs4_preprocess_seqid_op().
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Set the stateowner associated with a replay in one spot in
nfs4_preprocess_seqid_op() and keep it in cstate. This allows removing
a few lines of boilerplate from all the nfs4_preprocess_seqid_op()
callers.
Also turn ENCODE_SEQID_OP_TAIL into a function while we're here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Thanks to Casey for reminding me that 5661 gives a special meaning to a
value of 0 in the stateid's seqid field, so all stateid's should start
out with si_generation 1. We were doing that in the open and lock
cases for minorversion 1, but not for the delegation stateid, and not
for openstateid's with v4.0.
It doesn't *really* matter much for v4.0 or for delegation stateid's
(which never get the seqid field incremented), but we may as well do the
same for all of them.
Reported-by: Casey Bodley <cbodley@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Follow the recommendation from rfc3530bis for stateid generation number
wraparound, simplify some code, and fix or remove incorrect comments.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
There's no reason to have two separate hash tables for open and lock
stateid's.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The trick free_stateid is using is a little cheesy, and we'll have more
uses for this field later.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Wow, I wonder how long that typo's been there.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The values here represent highest slotid numbers. Since slotid's are
numbered starting from zero, the highest should be one less than the
number of slots.
Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We don't need this any more.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
When called with OPEN_STATE, preprocess_seqid_op only returns an open
stateid, hence only an open owner.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We've got some lock-specific code here in nfs4_preprocess_seqid_op which
is only used by nfsd4_lock(). Move it to the caller.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Note that the special handling for the lock stateid case is already done
by nfs4_check_openmode() (as of 02921914170e3b7fea1cd82dac9713685d2de5e2
"nfsd4: fix openmode checking on IO using lock stateid") so we no longer
need these two cases in the caller.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
This flag doesn't really buy us anything.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Share some common code, stop doing silly things like initializing a list
head immediately before adding it to a list, etc.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
These appear to be generic (for both open and lock owners), but they're
actually just for open owners. This has confused me more than once.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
For all the usual reasons. (Type safety, readability.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The server is returning nfserr_resource for both permanent errors and
for errors (like allocation failures) that might be resolved by retrying
later. Save nfserr_resource for the former and use delay/jukebox for
the latter.
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Even if we fail to write a recovery record, we should still mark the
client as having acquired its first state. Otherwise we leave 4.1
clients with indefinite ERR_GRACE returns.
However, an inability to write stable storage records may cause failures
of reboot recovery, and the problem should still be brought to the
server administrator's attention.
So, make sure the error is logged.
These errors shouldn't normally be triggered on a corectly functioning
server--this isn't a case where a misconfigured client could spam the
logs.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Move around some of this code, simplify a bit.
Reviewed-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Acked-by: Jim Rees <rees@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
A client that wants to execute a file must be able to read it. Read
opens over nfs are therefore implicitly allowed for executable files
even when those files are not readable.
NFSv2/v3 get this right by using a passed-in NFSD_MAY_OWNER_OVERRIDE on
read requests, but NFSv4 has gotten this wrong ever since
dc730e173785e29b297aa605786c94adaffe2544 "nfsd4: fix owner-override on
open", when we realized that the file owner shouldn't override
permissions on non-reclaim NFSv4 opens.
So we can't use NFSD_MAY_OWNER_OVERRIDE to tell nfsd_permission to allow
reads of executable files.
So, do the same thing we do whenever we encounter another weird NFS
permission nit: define yet another NFSD_MAY_* flag.
The industry's future standardization on 128-bit processors will be
motivated primarily by the need for integers with enough bits for all
the NFSD_MAY_* flags.
Reported-by: Leonardo Borda <leonardoborda@gmail.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Userspace shouldn't have a use for these constants. Nothing here is
used outside fs/nfsd.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The nfsd4 code has a bunch of special exceptions for error returns which
map nfserr_symlink to other errors.
In fact, the spec makes it clear that nfserr_symlink is to be preferred
over less specific errors where possible.
The patch that introduced it back in 2.6.4 is "kNFSd: correct symlink
related error returns.", which claims that these special exceptions are
represent an NFSv4 break from v2/v3 tradition--when in fact the symlink
error was introduced with v4.
I suspect what happened was pynfs tests were written that were overly
faithful to the (known-incomplete) rfc3530 error return lists, and then
code was fixed up mindlessly to make the tests pass.
Delete these unnecessary exceptions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Zero means "I don't care what kind of file this is". And that's
probably what we want--acls are also settable at least on directories,
and if the filesystem doesn't want them on other objects, leave it to it
to complain.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Add some more comments, simplify logic, do & S_IFMT just once, name
"type" more helpfully.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We allow the fh_verify caller to specify that any object *except* those
of a given type is allowed, by passing a negative type. But only one
caller actually uses it. Open-code that check in the one caller.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
A slightly unconventional approach to make the code more compact I could
live with, but let's give the poor reader *some* chance.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Use NUMA aware allocations to reduce latencies and increase throughput.
sunrpc kthreads can use kthread_create_on_node() if pool_mode is
"percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can
also take into account NUMA node affinity for memory allocations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: Neil Brown <neilb@suse.de>
CC: David Miller <davem@davemloft.net>
Reviewed-by: Greg Banks <gnb@fastmail.fm>
[bfields@redhat.com: fix up caller nfs41_callback_up]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
There's an incorrect comment here. Also clean up the logic: the
"rdlease" and "wrlease" locals are confusingly named, and don't really
add anything since we can make a decision as soon as we hit one of these
cases.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We currently use a bit in fl_flags to record whether a lease is being
broken, and set fl_type to the type (RDLCK or UNLCK) that it will
eventually have. This means that once the lease break starts, we forget
what the lease's type *used* to be. Breaking a read lease will then
result in blocking read opens, even though there's no conflict--because
the lease type is now F_UNLCK and we can no longer tell whether it was
previously a read or write lease.
So, instead keep fl_type as the original type (the type which we
enforce), and keep track of whether we're unlocking or merely
downgrading by replacing the single FL_INPROGRESS flag by
FL_UNLOCK_PENDING and FL_DOWNGRADE_PENDING flags.
To get this right we also need to track separate downgrade and break
times, to handle the case where a write-leased file gets conflicting
opens first for read, then later for write.
(I first considered just eliminating the downgrade behavior
completely--nfsv4 doesn't need it, and nobody as far as I can tell
actually uses it currently--but Jeremy Allison tells me that Windows
oplocks do behave this way, so Samba will probably use this some day.)
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
F_INPROGRESS isn't exposed to userspace. To me it makes more sense in
fl_flags....
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Use a helper function, to simplify upcoming changes.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Without this, an attempt to open a device special file without first
stat'ing it will fail.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The set of errors here does *not* agree with the set of errors specified
in the rfc!
While we're there, turn this macros into a function, for the usual
reasons, and move it to the one place where it's actually used.
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Fan Yong <yong.fan@whamcloud.com> noticed setting
FMODE_32bithash wouldn't work with nfsd v4, as
nfsd4_readdir() checks for 32 bit cookies. However, according to RFC 3530
cookies have a 64 bit type and cookies are also defined as u64 in
'struct nfsd4_readdir'. So remove the test for >32-bit values.
Cc: stable@kernel.org
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Al points out that the do_follow_link() helper function really is
misnamed - it's about whether we should try to follow a symlink or not,
not about actually doing the following.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After commit 3567866bf261: "RCUify freeing acls, let check_acl() go ahead in
RCU mode if acl is cached" posix_acl_permission is being called with an
unsupported flag and the permission check fails. This patch fixes the issue.
Signed-off-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
ore: Make ore its own module
exofs: Rename raid engine from exofs/ios.c => ore
exofs: ios: Move to a per inode components & device-table
exofs: Move exofs specific osd operations out of ios.c
exofs: Add offset/length to exofs_get_io_state
exofs: Fix truncate for the raid-groups case
exofs: Small cleanup of exofs_fill_super
exofs: BUG: Avoid sbi realloc
exofs: Remove pnfs-osd private definitions
nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4
|
|
The inode structure layout is largely random, and some of the vfs paths
really do care. The path lookup in particular is already quite D$
intensive, and profiles show that accessing the 'inode->i_op->xyz'
fields is quite costly.
We already optimized the dcache to not unnecessarily load the d_op
structure for members that are often NULL using the DCACHE_OP_xyz bits
in dentry->d_flags, and this does something very similar for the inode
ops that are used during pathname lookup.
It also re-orders the fields so that the fields accessed by 'stat' are
together at the beginning of the inode structure, and roughly in the
order accessed.
The effect of this seems to be in the 1-2% range for an empty kernel
"make -j" run (which is fairly kernel-intensive, mostly in filename
lookup), so it's visible. The numbers are fairly noisy, though, and
likely depend a lot on exact microarchitecture. So there's more tuning
to be done.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|