summaryrefslogtreecommitdiff
path: root/fs/nilfs2
AgeCommit message (Collapse)Author
2015-08-13block: remove bio_get_nr_vecs()Kent Overstreet
We can always fill up the bio now, no need to estimate the possible size based on queue parameters. Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [hch: rebased and wrote a changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-29block: add a bi_error field to struct bioChristoph Hellwig
Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-09ioctl_compat: handle FITRIMMikulas Patocka
The FITRIM ioctl has the same arguments on 32-bit and 64-bit architectures, so we can add it to the list of compatible ioctls and drop it from compat_ioctl method of various filesystems. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ted Ts'o <tytso@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...
2015-06-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge second patchbomb from Andrew Morton: - most of the rest of MM - lots of misc things - procfs updates - printk feature work - updates to get_maintainer, MAINTAINERS, checkpatch - lib/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits) exit,stats: /* obey this comment */ coredump: add __printf attribute to cn_*printf functions coredump: use from_kuid/kgid when formatting corename fs/reiserfs: remove unneeded cast NILFS2: support NFSv2 export fs/befs/btree.c: remove unneeded initializations fs/minix: remove unneeded cast init/do_mounts.c: add create_dev() failure log kasan: remove duplicate definition of the macro KASAN_FREE_PAGE fs/efs: femove unneeded cast checkpatch: emit "NOTE: <types>" message only once after multiple files checkpatch: emit an error when there's a diff in a changelog checkpatch: validate MODULE_LICENSE content checkpatch: add multi-line handling for PREFER_ETHER_ADDR_COPY checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr() checkpatch: fix processing of MEMSET issues checkpatch: suggest using ether_addr_equal*() checkpatch: avoid NOT_UNIFIED_DIFF errors on cover-letter.patch files checkpatch: remove local from codespell path checkpatch: add --showfile to allow input via pipe to show filenames ...
2015-06-25NILFS2: support NFSv2 exportNeilBrown
The "fh_len" passed to ->fh_to_* is not guaranteed to be that same as that returned by encode_fh - it may be larger. With NFSv2, the filehandle is fixed length, so it may appear longer than expected and be zero-padded. So we must test that fh_len is at least some value, not exactly equal to it. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25Merge branch 'for-4.2/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block IO update from Jens Axboe: "Nothing really major in here, mostly a collection of smaller optimizations and cleanups, mixed with various fixes. In more detail, this contains: - Addition of policy specific data to blkcg for block cgroups. From Arianna Avanzini. - Various cleanups around command types from Christoph. - Cleanup of the suspend block I/O path from Christoph. - Plugging updates from Shaohua and Jeff Moyer, for blk-mq. - Eliminating atomic inc/dec of both remaining IO count and reference count in a bio. From me. - Fixes for SG gap and chunk size support for data-less (discards) IO, so we can merge these better. From me. - Small restructuring of blk-mq shared tag support, freeing drivers from iterating hardware queues. From Keith Busch. - A few cfq-iosched tweaks, from Tahsin Erdogan and me. Makes the IOPS mode the default for non-rotational storage" * 'for-4.2/core' of git://git.kernel.dk/linux-block: (35 commits) cfq-iosched: fix other locations where blkcg_to_cfqgd() can return NULL cfq-iosched: fix sysfs oops when attempting to read unconfigured weights cfq-iosched: move group scheduling functions under ifdef cfq-iosched: fix the setting of IOPS mode on SSDs blktrace: Add blktrace.c to BLOCK LAYER in MAINTAINERS file block, cgroup: implement policy-specific per-blkcg data block: Make CFQ default to IOPS mode on SSDs block: add blk_set_queue_dying() to blkdev.h blk-mq: Shared tag enhancements block: don't honor chunk sizes for data-less IO block: only honor SG gap prevention for merges that contain data block: fix returnvar.cocci warnings block, dm: don't copy bios for request clones block: remove management of bi_remaining when restoring original bi_end_io block: replace trylock with mutex_lock in blkdev_reread_part() block: export blkdev_reread_part() and __blkdev_reread_part() suspend: simplify block I/O handling block: collapse bio bit space block: remove unused BIO_RW_BLOCK and BIO_EOF flags block: remove BIO_EOPNOTSUPP ...
2015-06-23pagemap.h: move dir_pages() over thereFabian Frederick
That function was declared in a lot of filesystems to calculate directory pages. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23nilfs2_direct_IO(): remove dead codeAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-19block: remove BIO_EOPNOTSUPPChristoph Hellwig
Since the big barrier rewrite/removal in 2007 we never fail FLUSH or FUA requests, which means we can remove the magic BIO_EOPNOTSUPP flag to help propagating those to the buffer_head layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05nilfs2: fix sanity check of btree level in nilfs_btree_root_broken()Ryusuke Konishi
The range check for b-tree level parameter in nilfs_btree_root_broken() is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even though the level is limited to values in the range of 0 to (NILFS_BTREE_LEVEL_MAX - 1). Since the level parameter is read from storage device and used to index nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it can cause memory overrun during btree operations if the boundary value is set to the level parameter on device. This fixes the broken sanity check and adds a comment to clarify that the upper bound NILFS_BTREE_LEVEL_MAX is exclusive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge third patchbomb from Andrew Morton: - various misc things - a couple of lib/ optimisations - provide DIV_ROUND_CLOSEST_ULL() - checkpatch updates - rtc tree - befs, nilfs2, hfs, hfsplus, fatfs, adfs, affs, bfs - ptrace fixes - fork() fixes - seccomp cleanups - more mmap_sem hold time reductions from Davidlohr * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (138 commits) proc: show locks in /proc/pid/fdinfo/X docs: add missing and new /proc/PID/status file entries, fix typos drivers/rtc/rtc-at91rm9200.c: make IO endian agnostic Documentation/spi/spidev_test.c: fix warning drivers/rtc/rtc-s5m.c: allow usage on device type different than main MFD type .gitignore: ignore *.tar MAINTAINERS: add Mediatek SoC mailing list tomoyo: reduce mmap_sem hold for mm->exe_file powerpc/oprofile: reduce mmap_sem hold for exe_file oprofile: reduce mmap_sem hold for mm->exe_file mips: ip32: add platform data hooks to use DS1685 driver lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help text x86: switch to using asm-generic for seccomp.h sparc: switch to using asm-generic for seccomp.h powerpc: switch to using asm-generic for seccomp.h parisc: switch to using asm-generic for seccomp.h mips: switch to using asm-generic for seccomp.h microblaze: use asm-generic for seccomp.h arm: use asm-generic for seccomp.h seccomp: allow COMPAT sigreturn overrides ...
2015-04-17nilfs2: use inode_set_flags() in nilfs_set_inode_flags()Ryusuke Konishi
Use inode_set_flags() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the FS_IMMUTABLE_FL, FS_APPEND_FL flags to avoid a race where an immutable file has the immutable flag cleared for a brief window of time. This is a similar fix to commit 5f16f3225b06 ("ext4: atomically set inode->i_flags in ext4_set_inode_flags()"). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17nilfs2: put out gfp mask manipulation from nilfs_set_inode_flags()Ryusuke Konishi
nilfs_set_inode_flags() function adjusts gfp-mask of inode->i_mapping as well as i_flags, however, this coupling of operations is not appropriate. For instance, nilfs_ioctl_setflags(), one of three callers of nilfs_set_inode_flags(), doesn't need to reinitialize the gfp-mask at all. In addition, nilfs_new_inode(), another caller of nilfs_set_inode_flags(), doesn't either because it has already initialized the gfp-mask. Only __nilfs_read_inode(), the remaining caller, needs it. So, this moves the gfp mask manipulation to __nilfs_read_inode() from nilfs_set_inode_flags(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17nilfs2: fix gcc warning at nilfs_checkpoint_is_mounted()Ryusuke Konishi
Fix the following build warning: fs/nilfs2/super.c: In function 'nilfs_checkpoint_is_mounted': fs/nilfs2/super.c:1023:10: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if (cno < 0 || cno > nilfs->ns_cno) ^ This warning indicates that the comparision "cno < 0" is useless because variable "cno" has an unsigned integer type "__u64". Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17nilfs2: improve execution time of NILFS_IOCTL_GET_CPINFO ioctlRyusuke Konishi
The older a filesystem gets, the slower lscp command becomes. This is because nilfs_cpfile_do_get_cpinfo() function meets more hole blocks as the start offset of valid checkpoint numbers gets bigger. This reduces the overhead by skipping hole blocks efficiently with nilfs_mdt_find_block() helper. A measurement result of this patch is as follows: Before: $ time lscp CNO DATE TIME MODE FLG BLKCNT ICNT 5769303 2015-02-22 19:31:33 cp - 108 1 5769304 2015-02-22 19:38:54 cp - 108 1 real 0m0.182s user 0m0.003s sys 0m0.180s After: $ time lscp CNO DATE TIME MODE FLG BLKCNT ICNT 5769303 2015-02-22 19:31:33 cp - 108 1 5769304 2015-02-22 19:38:54 cp - 108 1 real 0m0.003s user 0m0.001s sys 0m0.002s Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17nilfs2: add helper to find existent block on metadata fileRyusuke Konishi
Add a new metadata file function, nilfs_mdt_find_block(), which finds an existent block on a metadata file in a given range of blocks. This function skips continuous hole blocks efficiently by using nilfs_bmap_seek_key(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17nilfs2: add bmap function to seek a valid keyRyusuke Konishi
Add a new bmap function, nilfs_bmap_seek_key(), which seeks a valid entry and returns its key starting from a given key. This function can be used to skip hole blocks efficiently. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17nilfs2: unify type of key arguments in bmap interfaceRyusuke Konishi
The type of key arguments in block mapping interface varies depending on function. For instance, nilfs_bmap_lookup_at_level() takes "__u64" for its key argument whereas nilfs_bmap_lookup() takes "unsigned long". This fits them to "__u64" to eliminate the variation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17nilfs2: use bgl_lock_ptr()Ryusuke Konishi
Simplify nilfs_mdt_bgl_lock() by utilizing bgl_lock_ptr() helper in <linux/blockgroup_lock.h>. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17nilfs2: use set_mask_bits() for operations on buffer state bitmapRyusuke Konishi
nilfs_forget_buffer(), nilfs_clear_dirty_page(), and nilfs_segctor_complete_write() are using a bunch of atomic bit operations against buffer state bitmap. This reduces the number of them by utilizing set_mask_bits() macro. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17nilfs2: do not use async write flag for segment summary buffersRyusuke Konishi
The async write flag is introduced to nilfs2 in the commit 7f42ec394156 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks"), but the flag only makes sense for data buffers and btree node buffers. It is not needed for segment summary buffers. This gets rid of the latter uses as part of refactoring of atomic bit operations on buffer state bitmap. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15VFS: normal filesystems (and lustre): d_inode() annotationsDavid Howells
that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11direct_IO: remove rw from a_ops->direct_IO()Omar Sandoval
Now that no one is using rw, remove it completely. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11direct_IO: use iov_iter_rw() instead of rw everywhereOmar Sandoval
The rw parameter to direct_IO is redundant with iov_iter->type, and treated slightly differently just about everywhere it's used: some users do rw & WRITE, and others do rw == WRITE where they should be doing a bitwise check. Simplify this with the new iov_iter_rw() helper, which always returns either READ or WRITE. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11Remove rw from {,__,do_}blockdev_direct_IO()Omar Sandoval
Most filesystems call through to these at some point, so we'll start here. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11make new_sync_{read,write}() staticAl Viro
All places outside of core VFS that checked ->read and ->write for being NULL or called the methods directly are gone now, so NULL {read,write} with non-NULL {read,write}_iter will do the right thing in all cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11Merge branch 'iocb' into for-nextAl Viro
2015-03-25fs: move struct kiocb to fs.hChristoph Hellwig
struct kiocb now is a generic I/O container, so move it to fs.h. Also do a #include diet for aio.h while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-12nilfs2: fix deadlock of segment constructor during recoveryRyusuke Konishi
According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck during recovery at mount time. The code path that caused the deadlock was as follows: nilfs_fill_super() load_nilfs() nilfs_salvage_orphan_logs() * Do roll-forwarding, attach segment constructor for recovery, and kick it. nilfs_segctor_thread() nilfs_segctor_thread_construct() * A lock is held with nilfs_transaction_lock() nilfs_segctor_do_construct() nilfs_segctor_drop_written_files() iput() iput_final() write_inode_now() writeback_single_inode() __writeback_single_inode() do_writepages() nilfs_writepage() nilfs_construct_dsync_segment() nilfs_transaction_lock() --> deadlock This can happen if commit 7ef3ff2fea8b ("nilfs2: fix deadlock of segment constructor over I_SYNC flag") is applied and roll-forward recovery was performed at mount time. The roll-forward recovery can happen if datasync write is done and the file system crashes immediately after that. For instance, we can reproduce the issue with the following steps: < nilfs2 is mounted on /nilfs (device: /dev/sdb1) > # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k count=1 && reboot -nfh < the system will immediately reboot > # mount -t nilfs2 /dev/sdb1 /nilfs The deadlock occurs because iput() can run segment constructor through writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags. The above commit changed segment constructor so that it calls iput() asynchronously for inodes with i_nlink == 0, but that change was imperfect. This fixes the another deadlock by deferring iput() in segment constructor even for the case that mount is not finished, that is, for the case that MS_ACTIVE flag is not set. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reported-by: Yuxuan Shui <yshuiv7@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-28nilfs2: fix potential memory overrun on inodeRyusuke Konishi
Each inode of nilfs2 stores a root node of a b-tree, and it turned out to have a memory overrun issue: Each b-tree node of nilfs2 stores a set of key-value pairs and the number of them (in "bn_nchildren" member of nilfs_btree_node struct), as well as a few other "bn_*" members. Since the value of "bn_nchildren" is used for operations on the key-values within the b-tree node, it can cause memory access overrun if a large number is incorrectly set to "bn_nchildren". For instance, nilfs_btree_node_lookup() function determines the range of binary search with it, and too large "bn_nchildren" leads nilfs_btree_node_get_key() in that function to overrun. As for intermediate b-tree nodes, this is prevented by a sanity check performed when each node is read from a drive, however, no sanity check has been done for root nodes stored in inodes. This patch fixes the issue by adding missing sanity check against b-tree root nodes so that it's called when on-memory inodes are read from ifile, inode metadata file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull backing device changes from Jens Axboe: "This contains a cleanup of how the backing device is handled, in preparation for a rework of the life time rules. In this part, the most important change is to split the unrelated nommu mmap flags from it, but also removing a backing_dev_info pointer from the address_space (and inode), and a cleanup of other various minor bits. Christoph did all the work here, I just fixed an oops with pages that have a swap backing. Arnd fixed a missing export, and Oleg killed the lustre backing_dev_info from staging. Last patch was from Al, unexporting parts that are now no longer needed outside" * 'for-3.20/bdi' of git://git.kernel.dk/linux-block: Make super_blocks and sb_lock static mtd: export new mtd_mmap_capabilities fs: make inode_to_bdi() handle NULL inode staging/lustre/llite: get rid of backing_dev_info fs: remove default_backing_dev_info fs: don't reassign dirty inodes to default_backing_dev_info nfs: don't call bdi_unregister ceph: remove call to bdi_unregister fs: remove mapping->backing_dev_info fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info nilfs2: set up s_bdi like the generic mount_bdev code block_dev: get bdev inode bdi directly from the block device block_dev: only write bdev inode on close fs: introduce f_op->mmap_capabilities for nommu mmap support fs: kill BDI_CAP_SWAP_BACKED fs: deduplicate noop_backing_dev_info
2015-02-10mm: drop vm_ops->remap_pages and generic_file_remap_pages() stubKirill A. Shutemov
Nobody uses it anymore. [akpm@linux-foundation.org: fix filemap_xip.c] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-05nilfs2: fix deadlock of segment constructor over I_SYNC flagRyusuke Konishi
Nilfs2 eventually hangs in a stress test with fsstress program. This issue was caused by the following deadlock over I_SYNC flag between nilfs_segctor_thread() and writeback_sb_inodes(): nilfs_segctor_thread() nilfs_segctor_thread_construct() nilfs_segctor_unlock() nilfs_dispose_list() iput() iput_final() evict() inode_wait_for_writeback() * wait for I_SYNC flag writeback_sb_inodes() * set I_SYNC flag on inode->i_state __writeback_single_inode() do_writepages() nilfs_writepages() nilfs_construct_dsync_segment() nilfs_segctor_sync() * wait for completion of segment constructor inode_sync_complete() * clear I_SYNC flag after __writeback_single_inode() completed writeback_sb_inodes() calls do_writepages() for dirty inodes after setting I_SYNC flag on inode->i_state. do_writepages() in turn calls nilfs_writepages(), which can run segment constructor and wait for its completion. On the other hand, segment constructor calls iput(), which can call evict() and wait for the I_SYNC flag on inode_wait_for_writeback(). Since segment constructor doesn't know when I_SYNC will be set, it cannot know whether iput() will block or not unless inode->i_nlink has a non-zero count. We can prevent evict() from being called in iput() by implementing sop->drop_inode(), but it's not preferable to leave inodes with i_nlink == 0 for long periods because it even defers file truncation and inode deallocation. So, this instead resolves the deadlock by calling iput() asynchronously with a workqueue for inodes with i_nlink == 0. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-20fs: remove mapping->backing_dev_infoChristoph Hellwig
Now that we never use the backing_dev_info pointer in struct address_space we can simply remove it and save 4 to 8 bytes in every inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-20nilfs2: set up s_bdi like the generic mount_bdev codeChristoph Hellwig
mapping->backing_dev_info will go away, so don't rely on it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-12-10nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() racesRyusuke Konishi
Same story as in commit 41080b5a2401 ("nfsd race fixes: ext2") (similar ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead of insert_inode_locked() and a bug of a check for dead inodes needs to be fixed. If nilfs_iget() is called from nfsd after nilfs_new_inode() calls insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode. If nilfs_iget() is called before nilfs_new_inode() calls insert_inode_locked4(), it will create an in-core inode and read its data from the on-disk inode. But, nilfs_iget() will find i_nlink equals zero and fail at nilfs_read_inode_common(), which will lead it to call iget_failed() and cleanly fail. However, this sanity check doesn't work as expected for reused on-disk inodes because they leave a non-zero value in i_mode field and it hinders the test of i_nlink. This patch also fixes the issue by removing the test on i_mode that nilfs2 doesn't need. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10nilfs2: deletion of an unnecessary check before the function call "iput"Markus Elfring
The iput() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10nilfs2: avoid duplicate segment construction for fsync()Andreas Rohner
This patch removes filemap_write_and_wait_range() from nilfs_sync_file(), because it triggers a data segment construction by calling nilfs_writepages() with WB_SYNC_ALL. A data segment construction does not remove the inode from the i_dirty list and it does not clear the NILFS_I_DIRTY flag. Therefore nilfs_inode_dirty() still returns true, which leads to an unnecessary duplicate segment construction in nilfs_sync_file(). A call to filemap_write_and_wait_range() is not needed, because NILFS2 does not rely on the generic writeback mechanisms. Instead it implements its own mechanism to collect all dirty pages and write them into segments. It is more efficient to initiate the segment construction directly in nilfs_sync_file() without the detour over filemap_write_and_wait_range(). Additionally the lock of i_mutex is not needed, because all code blocks that are protected by i_mutex are also protected by a NILFS transaction: Function i_mutex nilfs_transaction ------------------------------------------------------ nilfs_ioctl_setflags: yes yes nilfs_fiemap: yes no nilfs_write_begin: yes yes nilfs_write_end: yes yes nilfs_lookup: yes no nilfs_create: yes yes nilfs_link: yes yes nilfs_mknod: yes yes nilfs_symlink: yes yes nilfs_mkdir: yes yes nilfs_unlink: yes yes nilfs_rmdir: yes yes nilfs_rename: yes yes nilfs_setattr: yes yes For nilfs_lookup() i_mutex is held for the parent directory, to protect it from modification. The segment construction does not modify directory inodes, so no lock is needed. nilfs_fiemap() reads the block layout on the disk, by using nilfs_bmap_lookup_contig(). This is already protected by bmap->b_sem. Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14nilfs2: improve the performance of fdatasync()Andreas Rohner
Support for fdatasync() has been implemented in NILFS2 for a long time, but whenever the corresponding inode is dirty the implementation falls back to a full-flegded sync(). Since every write operation has to update the modification time of the file, the inode will almost always be dirty and fdatasync() will fall back to sync() most of the time. But this fallback is only necessary for a change of the file size and not for a change of the various timestamps. This patch adds a new flag NILFS_I_INODE_SYNC to differentiate between those two situations. * If it is set the file size was changed and a full sync is necessary. * If it is not set then only the timestamps were updated and fdatasync() can go ahead. There is already a similar flag I_DIRTY_DATASYNC on the VFS layer with the exact same semantics. Unfortunately it cannot be used directly, because NILFS2 doesn't implement write_inode() and doesn't clear the VFS flags when inodes are written out. So the VFS writeback thread can clear I_DIRTY_DATASYNC at any time without notifying NILFS2. So I_DIRTY_DATASYNC has to be mapped onto NILFS_I_INODE_SYNC in nilfs_update_inode(). Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()Andreas Rohner
Under normal circumstances nilfs_sync_fs() writes out the super block, which causes a flush of the underlying block device. But this depends on the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the last segment crosses a segment boundary. So if only a small amount of data is written before the call to nilfs_sync_fs(), no flush of the block device occurs. In the above case an additional call to blkdev_issue_flush() is needed. To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device is introduced, which is cleared whenever new logs are written and set whenever the block device is flushed. For convenience the function nilfs_flush_device() is added, which contains the above logic. Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-26nilfs2: fix data loss with mmap()Andreas Rohner
This bug leads to reproducible silent data loss, despite the use of msync(), sync() and a clean unmount of the file system. It is easily reproducible with the following script: ----------------[BEGIN SCRIPT]-------------------- mkfs.nilfs2 -f /dev/sdb mount /dev/sdb /mnt dd if=/dev/zero bs=1M count=30 of=/mnt/testfile umount /mnt mount /dev/sdb /mnt CHECKSUM_BEFORE="$(md5sum /mnt/testfile)" /root/mmaptest/mmaptest /mnt/testfile 30 10 5 sync CHECKSUM_AFTER="$(md5sum /mnt/testfile)" umount /mnt mount /dev/sdb /mnt CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)" umount /mnt echo "BEFORE MMAP:\t$CHECKSUM_BEFORE" echo "AFTER MMAP:\t$CHECKSUM_AFTER" echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT" ----------------[END SCRIPT]-------------------- The mmaptest tool looks something like this (very simplified, with error checking removed): ----------------[BEGIN mmaptest]-------------------- data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE, MAP_SHARED, fd, file_offset); for (i = 0; i < write_count; ++i) { memcpy(data + i * 4096, buf, sizeof(buf)); msync(data, file_size - file_offset, MS_SYNC)) } ----------------[END mmaptest]-------------------- The output of the script looks something like this: BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile So it is clear, that the changes done using mmap() do not survive a remount. This can be reproduced a 100% of the time. The problem was introduced in commit 136e8770cd5d ("nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary"). If the page was read with mpage_readpage() or mpage_readpages() for example, then it has no buffers attached to it. In that case page_has_buffers(page) in nilfs_set_page_dirty() will be false. Therefore nilfs_set_file_dirty() is never called and the pages are never collected and never written to disk. This patch fixes the problem by also calling nilfs_set_file_dirty() if the page has no buffers attached to it. [akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/] Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Tested-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Stuff in here: - acct.c fixes and general rework of mnt_pin mechanism. That allows to go for delayed-mntput stuff, which will permit mntput() on deep stack without worrying about stack overflows - fs shutdown will happen on shallow stack. IOW, we can do Eric's umount-on-rmdir series without introducing tons of stack overflows on new mntput() call chains it introduces. - Bruce's d_splice_alias() patches - more Miklos' rename() stuff. - a couple of regression fixes (stable fodder, in the end of branch) and a fix for API idiocy in iov_iter.c. There definitely will be another pile, maybe even two. I'd like to get Eric's series in this time, but even if we miss it, it'll go right in the beginning of for-next in the next cycle - the tricky part of prereqs is in this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) fix copy_tree() regression __generic_file_write_iter(): fix handling of sync error after DIO switch iov_iter_get_pages() to passing maximal number of pages fs: mark __d_obtain_alias static dcache: d_splice_alias should detect loops exportfs: update Exporting documentation dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED dcache: remove unused d_find_alias parameter dcache: d_obtain_alias callers don't all want DISCONNECTED dcache: d_splice_alias should ignore DCACHE_DISCONNECTED dcache: d_splice_alias mustn't create directory aliases dcache: close d_move race in d_splice_alias dcache: move d_splice_alias namei: trivial fix to vfs_rename_dir comment VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode. cifs: support RENAME_NOREPLACE hostfs: support rename flags shmem: support RENAME_EXCHANGE shmem: support RENAME_NOREPLACE btrfs: add RENAME_NOREPLACE ...
2014-08-08nilfs2: integrate sysfs support into driverVyacheslav Dubeyko
This patch integrates creation of sysfs groups and attributes into NILFS file system driver. It was found the issue with nilfs_sysfs_{create/delete}_snapshot_group functions by Michael L Semon <mlsemon35@gmail.com> in the first version of the patch: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:579 in_atomic(): 1, irqs_disabled(): 0, pid: 32676, name: umount.nilfs2 2 locks held by umount.nilfs2/32676: #0: (&type->s_umount_key#21){++++..}, at: [<790c18e2>] deactivate_super+0x37/0x58 #1: (&(&nilfs->ns_cptree_lock)->rlock){+.+...}, at: [<791bf659>] nilfs_put_root+0x23/0x5a Preemption disabled at:[<791bf659>] nilfs_put_root+0x23/0x5a CPU: 0 PID: 32676 Comm: umount.nilfs2 Not tainted 3.14.0+ #2 Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002 Call Trace: dump_stack+0x4b/0x75 __might_sleep+0x111/0x16f mutex_lock_nested+0x1e/0x3ad kernfs_remove+0x12/0x26 sysfs_remove_dir+0x3d/0x62 kobject_del+0x13/0x38 nilfs_sysfs_delete_snapshot_group+0xb/0xd nilfs_put_root+0x2a/0x5a nilfs_detach_log_writer+0x1ab/0x2c1 nilfs_put_super+0x13/0x68 generic_shutdown_super+0x60/0xd1 kill_block_super+0x1d/0x60 deactivate_locked_super+0x22/0x3f deactivate_super+0x3e/0x58 mntput_no_expire+0xe2/0x141 SyS_oldumount+0x70/0xa5 syscall_call+0x7/0xb The reason of the issue was placement of nilfs_sysfs_{create/delete}_snapshot_group() call under nilfs->ns_cptree_lock protection. But this protection is unnecessary and wrong solution. The second version of the patch fixes this issue. [fengguang.wu@intel.com: nilfs_sysfs_create_mounted_snapshots_group can be static] Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08nilfs2: add /sys/fs/nilfs2/<device>/mounted_snapshots/<snapshot> groupVyacheslav Dubeyko
This patch adds creation of <snapshot> group for every mounted snapshot in /sys/fs/nilfs2/<device>/mounted_snapshots group. The group contains details about mounted snapshot: (1) inodes_count - show number of inodes for snapshot. (2) blocks_count - show number of blocks for snapshot. Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08nilfs2: add /sys/fs/nilfs2/<device>/mounted_snapshots groupVyacheslav Dubeyko
This patch adds creation of /sys/fs/nilfs2/<device>/mounted_snapshots group. The mounted_snapshots group contains group for every mounted snapshot. Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08nilfs2: add /sys/fs/nilfs2/<device>/checkpoints groupVyacheslav Dubeyko
This patch adds creation of /sys/fs/nilfs2/<device>/checkpoints group. The checkpoints group contains attributes that describe details about volume's checkpoints: (1) checkpoints_number - show number of checkpoints on volume. (2) snapshots_number - show number of snapshots on volume. (3) last_seg_checkpoint - show checkpoint number of the latest segment. (4) next_checkpoint - show next checkpoint number. Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08nilfs2: add /sys/fs/nilfs2/<device>/segments groupVyacheslav Dubeyko
This patch adds creation of /sys/fs/nilfs2/<device>/segments group. The segments group contains attributes that describe details about volume's segments: (1) segments_number - show number of segments on volume. (2) blocks_per_segment - show number of blocks in segment. (3) clean_segments - show count of clean segments. (4) dirty_segments - show count of dirty segments. Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08nilfs2: add /sys/fs/nilfs2/<device>/segctor groupVyacheslav Dubeyko
This patch adds creation of /sys/fs/nilfs2/<device>/segctor group. The segctor group contains attributes that describe segctor thread activity details: (1) last_pseg_block - show start block number of the latest segment. (2) last_seg_sequence - show sequence value of the latest segment. (3) last_seg_checkpoint - show checkpoint number of the latest segment. (4) current_seg_sequence - show segment sequence counter. (5) current_last_full_seg - show index number of the latest full segment. (6) next_full_seg - show index number of the full segment index to be used next. (7) next_pseg_offset - show offset of next partial segment in the current full segment. (8) next_checkpoint - show next checkpoint number. (9) last_seg_write_time - show write time of the last segment in human-readable format. (10) last_seg_write_time_secs - show write time of the last segment in seconds. (11) last_nongc_write_time - show write time of the last segment not for cleaner operation in human-readable format. (12) last_nongc_write_time_secs - show write time of the last segment not for cleaner operation in seconds. (13) dirty_data_blocks_count - show number of dirty data blocks. Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>