Age | Commit message (Collapse) | Author |
|
Let it return current nfs_pgio_mirror in use depending on pg_mirror_count.
For read, we always use pg_mirrors[0], so this effectively gives us freedom
to use pg_mirror_idx to track the actual mirror to read from through out the
IO stack.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
|
|
so that we don't reset desc->pg_mirror_idx for read unnecessarily.
Remove WARN_ON_ONCE from __nfs_pageio_add_request to allow LD to
set pg_mirror_idx for read where pg_mirror_count is always 1.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
|
|
So that we can detect the case if some layout segments are still
pinned which is surely a bug that we need to fix.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
|
|
This skips the WARN_ON_ONCE, but doesnt change behavior (the memcmp would
fail).
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
The current mirroring code only notices short writes to the first
mirror. This patch keeps per-mirror byte counts and only considers
a byte to be written once all mirrors report so.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
|
|
This patch adds mirrored write support to the pgio layer. The default
is to use one mirror, but pgio callers may define callbacks to change
this to any value up to the (arbitrarily selected) limit of 16.
The basic idea is to break out members of nfs_pageio_descriptor that cannot
be shared between mirrored DSes and put them in a new structure.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
|
|
Pass ds_commit_idx through the nfs commit path. It's used to select
the commit bucket when using pnfs and is ignored when not using pnfs.
Several functions had to be changed: nfs_retry_commit,
nfs_mark_request_commit, pnfs_mark_request_commit and the pnfs layout
driver .mark_request_commit functions.
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
|
|
'ds_commit_idx' is a better name - it is used to select the right
commit bucket for pnfs.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
|
|
This is needed for mirrored DS support, where multuple requests
cover the same range.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
|
|
This is needed to support mirrored writes - the first write can't just
trash the lseg, we need to keep it around until all mirrors have
written.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
|
|
Add a new operation to nfs_pageio_ops that is called on nfs_pageio_complete.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
|
|
Instead of calling layoutreturn directly, call pnfs_error_mark_layout_for_return
to mark layouts for return and let generic code return layout when
layout segments are freed.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
Conflicts:
fs/nfs/filelayout/filelayout.c
|
|
So that pnfs path is not disabled for ever.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
If current lseg is the last lseg marked with NFS_LSEG_LAYOUTRETURN,
send layoutreturn.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
And if we are to return the same type of layouts, don't bother
sending more layoutgets.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
It marks all matching layout segments as NFS_LSEG_LAYOUTRETURN,
which is an indicator for pnfs_put_lseg() to send layoutreturn,
and also prevents pnfs_update_layout() from using the returning
segments. Once it is set, it never gets cleared.
It also sets proper io failure bit so that pnfs path can be retried
after PNFS_LAYOUTGET_RETRY_TIMEOUT second.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
It allows to specify different iomode to return.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
So that it is possible to return a specific iomode layouts.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
Flexfiles layout would want to use them to report DS IO status.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
Per RFC 5661 Errata 3208:
| A client MAY always forget its layout state and associated
| layout stateid at any time (See also section 12.5.5.1).
| In such case, the client MUST use a non-layout stateid for the next
| LAYOUTGET operation. This will signal the server that the client has
| no more layouts on the file and its respective layout state can be
| released before issuing a new layout in response to LAYOUTGET.
In order to make such a signal unique to server, client needs to serialize
all layoutgets using non-layout stateid. We implement this by serializing
layoutgets when client has no layout segments at hand.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
flexfiles needs to start layoutcommit when necessary
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
|
|
lockd assumes hostname exists otherwise kernel oops.
It can be reproduced by following steps:
1. mount flexfile MDS
2. write some files
3. mount DS via nfsv3
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8134f332>] strlen+0x2/0x20
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: nfsd(F) nfs_layout_flexfiles(F) rpcsec_gss_krb5(F) auth_rpcgss(F) nfsv4(F) dns_resolver(F) nfsv3(F) nfs_acl(F) nfs(F) lockd(F) sunrpc(F) fscache(F) ebtable_nat(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) bnep(F) snd_ens1371(F) snd_rawmidi(F) snd_ac97_codec(F) btusb(F) ac97_bus(F) snd_seq(F) snd_seq_device(F) snd_pcm(F) ppdev(F) bluetooth(F) 6lowpan_iphc(F) rfkill(F) vmw_balloon(F) snd_timer(F) snd(F) soundcore(F) gameport(F) i2c_piix4(F) e1000(F) vmw_vmci(F) parport_pc(F) parport(F) shpchp(F) uinput(F) xfs(F) libcrc32c(F) vmwgfx(F) ttm(F) drm(F) mptspi(F) scsi_transport_spi(F) mptscsih(F) mptbase(F) i2c_core(F)
CPU: 0 PID: 10397 Comm: mount.nfs Tainted: GF 3.14.7-100.pd_client.001.fc16.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
task: ffff880008942600 ti: ffff880007990000 task.ti: ffff880007990000
RIP: 0010:[<ffffffff8134f332>] [<ffffffff8134f332>] strlen+0x2/0x20
RSP: 0018:ffff880007991aa0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880038d39c20 RCX: 0000000000000004
RDX: 0000000000000006 RSI: 0000000000000010 RDI: 0000000000000000
RBP: ffff880007991b38 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000014600 R11: 0000000000000400 R12: ffffffff81cc8580
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000004
FS: 00007f90cd2ef880(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001710000 CR4: 00000000001407f0
Stack:
ffffffffa045f52c ffff880001782230 ffff880004141e28 0006880007991ac8
ffffffff816dc14b ffff880000000000 ffff880038d39c20 0000000000000010
0000000481cc0006 0000000000000000 ffffffffa0410be8 000000000000c014
Call Trace:
[<ffffffffa045f52c>] ? nlmclnt_lookup_host+0x4c/0x2c0 [lockd]
[<ffffffff816dc14b>] ? _raw_spin_unlock_bh+0x1b/0x20
[<ffffffffa0410be8>] ? svc_destroy+0xb8/0x140 [sunrpc]
[<ffffffffa045c323>] nlmclnt_init+0x53/0xc0 [lockd]
[<ffffffffa047d2dc>] ? nfs_get_client+0x1cc/0x340 [nfs]
[<ffffffffa047c2e7>] nfs_start_lockd+0xa7/0xd0 [nfs]
[<ffffffffa047df71>] nfs_create_server+0x181/0x5c0 [nfs]
[<ffffffffa04460f3>] nfs3_create_server+0x13/0x30 [nfsv3]
[<ffffffffa048a0bc>] nfs_try_mount+0x21c/0x300 [nfs]
[<ffffffff811ca32d>] ? __kmalloc_track_caller+0x1ad/0x240
[<ffffffffa048b677>] ? nfs_fs_mount+0xc37/0xd80 [nfs]
[<ffffffffa048ad05>] nfs_fs_mount+0x2c5/0xd80 [nfs]
[<ffffffffa048a830>] ? nfs_clone_super+0x140/0x140 [nfs]
[<ffffffffa048a240>] ? nfs_clone_sb_security+0x40/0x40 [nfs]
[<ffffffff811e7e43>] mount_fs+0x43/0x1b0
[<ffffffff81193100>] ? __alloc_percpu+0x10/0x20
[<ffffffff812026e6>] vfs_kern_mount+0x76/0x120
[<ffffffff81204917>] do_mount+0x237/0xa80
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Enable pNFS callbacks to allow flex files to work correctly with a
NFSv3-enabled data server.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
so that flexfile layout client can pass in DS credential instead of
using user cred, which will be done in the next patch.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
flexclient needs this as there is no nfs_server to DS connection.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
pnfs flexfile layout client may want to use NFSv3 ops rather
than the default MDS v4 ops.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
|
|
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
flexfile layout may need to set such when making DS connections.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
The flexfiles layout wants to create DS connection over NFSv3.
Add nfs3_set_ds_client to allow that to happen.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
They can be reused by flexfile layout as well.
Also add a code such that if read fails on one DS and
there are other DSes available to use, don't resend
through MDS but through pNFS so that client can read
from other DSes.
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
flexfile layout may use different auth flavor as specified by MDS.
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
It can be reused by flexfiles layout client.
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
It can be reused by flexfile layout.
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
Also pull nfs4_pnfs_ds_addr and nfs4_pnfs_ds to generic pnfs.
They can all be reused by flexfile layout as well.
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
|
|
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
|
|
The flexfilelayout driver will share some common code
with the filelayout driver. This set of changes refactors
that common code out to avoid any module depenencies.
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is one kernfs fix for a reported issue for 3.19-rc5.
It has been in linux-next for a while"
* tag 'driver-core-3.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kernfs: Fix kernfs_name_compare
|
|
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
- Stable fix for a NFSv3/lockd race
- Fixes for several NFSv4.1 client id trunking bugs
- Remove an incorrect test when checking for delegated opens"
* tag 'nfs-for-3.19-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Remove incorrect check in can_open_delegated()
NFS: Ignore transport protocol when detecting server trunking
NFSv4/v4.1: Verify the client owner id during trunking detection
NFSv4: Cache the NFSv4/v4.1 client owner_id in the struct nfs_client
NFSv4.1: Fix client id trunking on Linux
LOCKD: Fix a race when initialising nlmsvc_timeout
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"This fixes a regression in the latest fuse update plus a fix for a
rather theoretical memory ordering issue"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: add memory barrier to INIT
fuse: fix LOOKUP vs INIT compat handling
|
|
commit 0efaa7e82f02fe69c05ad28e905f31fc86e6f08e
locks: generic_delete_lease doesn't need a file_lock at all
moves the call to fl->fl_lmops->lm_change() to a place in the
code where fl might be a non-lease lock.
When that happens, fl_lmops is NULL and an Oops ensures.
So add an extra test to restore correct functioning.
Reported-by: Linda Walsh <suse@tlinx.org>
Link: https://bugzilla.suse.com/show_bug.cgi?id=912569
Cc: stable@vger.kernel.org (v3.18)
Fixes: 0efaa7e82f02fe69c05ad28e905f31fc86e6f08e
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc fixes: group scheduling corner case fix, two deadline scheduler
fixes, effective_load() overflow fix, nested sleep fix, 6144 CPUs
system fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix RCU stall upon -ENOMEM in sched_create_group()
sched/deadline: Avoid double-accounting in case of missed deadlines
sched/deadline: Fix migration of SCHED_DEADLINE tasks
sched: Fix odd values in effective_load() calculations
sched, fanotify: Deal with nested sleeps
sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation
|
|
Pull two nfsd bugfixes from Bruce Fields.
* 'for-3.19' of git://linux-nfs.org/~bfields/linux:
rpc: fix xdr_truncate_encode to handle buffer ending on page boundary
nfsd: fix fi_delegees leak when fi_had_conflict returns true
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull two Ceph fixes from Sage Weil:
"These are both pretty trivial: a sparse warning fix and size_t printk
thing"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: fix sparse endianness warnings
ceph: use %zu for len in ceph_fill_inline_data()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"None of these are huge, but my commit does fix a regression from 3.18
that could cause lost files during log replay.
This also adds Dave Sterba to the list of Btrfs maintainers. It
doesn't mean we're doing things differently, but Dave has really been
helping with the maintainer workload for years"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: don't delay inode ref updates during log replay
Btrfs: correctly get tree level in tree_backref_for_extent
Btrfs: call inode_dec_link_count() on mkdir error path
Btrfs: abort transaction if we don't find the block group
Btrfs, scrub: uninitialized variable in scrub_extent_for_parity()
Btrfs: add more maintainers
|
|
Returning a difference from a comparison functions is usually wrong
(see acbbe6fbb240 "kcmp: fix standard comparison bug" for the long
story). Here there is the additional twist that if the void pointers
ns and kn->ns happen to differ by a multiple of 2^32,
kernfs_name_compare returns 0, falsely reporting a match to the
caller.
Technically 'hash - kn->hash' is ok since the hashes are restricted to
31 bits, but it's better to avoid that subtlety.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
As per e23738a7300a ("sched, inotify: Deal with nested sleeps").
fanotify_read is a wait loop with sleeps in. Wait loops rely on
task_struct::state and sleeps do too, since that's the only means of
actually sleeping. Therefore the nested sleeps destroy the wait loop
state and the wait loop breaks the sleep functions that assume
TASK_RUNNING (mutex_lock).
Fix this by using the new woken_wake_function and wait_woken() stuff,
which registers wakeups in wait and thereby allows shrinking the
task_state::state changes to the actual sleep part.
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Paris <eparis@redhat.com>
Link: http://lkml.kernel.org/r/20141216152838.GZ3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Fix clashing values for O_PATH and FMODE_NONOTIFY on sparc. The
clashing O_PATH value was added in commit 5229645bdc35 ("vfs: add
nonconflicting values for O_PATH") but this can't be changed as it is
user-visible.
FMODE_NONOTIFY is only used internally in the kernel, but it is in the
same numbering space as the other O_* flags, as indicated by the comment
at the top of include/uapi/asm-generic/fcntl.h (and its use in
fs/notify/fanotify/fanotify_user.c). So renumber it to avoid the clash.
All of this has happened before (commit 12ed2e36c98a: "fanotify:
FMODE_NONOTIFY and __O_SYNC in sparc conflict"), and all of this will
happen again -- so update the uniqueness check in fcntl_init() to
include __FMODE_NONOTIFY.
Signed-off-by: David Drysdale <drysdale@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|