summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-12-13xfs: format log items write directly into the linear CIL bufferChristoph Hellwig
Instead of setting up pointers to memory locations in iop_format which then get copied into the CIL linear buffer after return move the copy into the individual inode items. This avoids the need to always have a memory block in the exact same layout that gets written into the log around, and allow the log items to be much more flexible in their in-memory layouts. The only caveat is that we need to properly align the data for each iovec so that don't have structures misaligned in subsequent iovecs. Note that all log item format routines now need to be careful to modify the copy of the item that was placed into the CIL after calls to xlog_copy_iovec instead of the in-memory copy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13xfs: introduce xlog_copy_iovecChristoph Hellwig
Add a helper to abstract out filling the log iovecs in the log item format handlers. This will allow us to change the way we do the log item formatting more easily. The copy in the name is a bit confusing for now as it just assigns a pointer and lets the CIL code perform the copy, but that will change soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13xfs: refactor xfs_inode_item_formatChristoph Hellwig
Split out a function to handle the data and attr fork, as well as a helper for the really old v1 inodes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13xfs: refactor xfs_inode_item_sizeChristoph Hellwig
Split out two helpers to size the data and attribute to make the function more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13xfs: refactor xfs_buf_item_format_segmentChristoph Hellwig
Add two helpers to make the code more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13xfs: remove duplicate code in xlog_cil_insert_format_itemsChristoph Hellwig
Share code that was previously duplicated in two branches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-11xfs: align initial file allocations correctlyDave Chinner
The function xfs_bmap_isaeof() is used to indicate that an allocation is occurring at or past the end of file, and as such should be aligned to the underlying storage geometry if possible. Commit 27a3f8f ("xfs: introduce xfs_bmap_last_extent") changed the behaviour of this function for empty files - it turned off allocation alignment for this case accidentally. Hence large initial allocations from direct IO are not getting correctly aligned to the underlying geometry, and that is cause write performance to drop in alignment sensitive configurations. Fix it by considering allocation into empty files as requiring aligned allocation again. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-11xfs: fix calculation of freed inode cluster blocksBen Myers
rec.ir_startino is an agino rather than an ino. Use the correct macro when dealing with it in xfs_difree. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2013-12-11xfs: xfs_dir2_block_to_sf temp buffer allocation failsDave Chinner
If we are using a large directory block size, and memory becomes fragmented, we can get memory allocation failures trying to kmem_alloc(64k) for a temporary buffer. However, there is not need for a directory buffer sized allocation, as the end result ends up in the inode literal area. This is, at most, slightly less than 2k of space, and hence we don't need an allocation larger than that fora temporary buffer. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-09xfs: fix infinite loop by detaching the group/project hints from user dquotJie Liu
xfs_quota(8) will hang up if trying to turn group/project quota off before the user quota is off, this could be 100% reproduced by: # mount -ouquota,gquota /dev/sda7 /xfs # mkdir /xfs/test # xfs_quota -xc 'off -g' /xfs <-- hangs up # echo w > /proc/sysrq-trigger # dmesg SysRq : Show Blocked State task PC stack pid father xfs_quota D 0000000000000000 0 27574 2551 0x00000000 [snip] Call Trace: [<ffffffff81aaa21d>] schedule+0xad/0xc0 [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0 [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0 [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0 [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs] [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30 [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs] [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs] [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs] [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs] [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs] [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs] [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs] [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs] [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs] [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0 [<ffffffff812fba4b>] ? iput+0x5b/0x90 [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0 [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b It's fine if we turn user quota off at first, then turn off other kind of quotas if they are enabled since the group/project dquot refcount is decreased to zero once the user quota if off. Otherwise, those dquots refcount is non-zero due to the user dquot might refer to them as hint(s). Hence, above operation cause an infinite loop at xfs_qm_dquot_walk() while trying to purge dquot cache. This problem has been around since Linux 3.4, it was introduced by: [ b84a3a9675 xfs: remove the per-filesystem list of dquots ] Originally we will release the group dquot pointers because the user dquots maybe carrying around as a hint via xfs_qm_detach_gdquots(). However, with above change, there is no such work to be done before purging group/project dquot cache. In order to solve this problem, this patch introduces a special routine xfs_qm_dqpurge_hints(), and it would release the group/project dquot pointers the user dquots maybe carrying around as a hint, and then it will proceed to purge the user dquot cache if requested. Cc: stable@vger.kernel.org Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-09xfs: fix assertion failure at xfs_setattr_nonsizeJie Liu
For CRC enabled v5 super block, change a file's ownership can simply trigger an ASSERT failure at xfs_setattr_nonsize() if both group and project quota are enabled, i.e, [ 305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621 [ 305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable] [ 305.383939] Call Trace: [ 305.385536] [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs] [ 305.387142] [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs] [ 305.388727] [<ffffffff811ca388>] notify_change+0x1a8/0x350 [ 305.390298] [<ffffffff811ac39d>] chown_common+0xfd/0x110 [ 305.391868] [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110 [ 305.393440] [<ffffffff811ad760>] SyS_lchown+0x20/0x30 [ 305.394995] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f [ 305.399870] RIP [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs] This fix adjust the assertion to check if the super block support both quota inodes or not. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-06xfs: add xfs_setattr_timeChristoph Hellwig
Split out a xfs_setattr_time helper to share code between truncate and regular setattr similar to xfs_setattr_mode. I might also have another caller growing for this in the near future. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-06xfs: tiny xfs_setattr_mode cleanupChristoph Hellwig
Remove the pointless tp argument, and properly align the local variable declarations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-06xfs: fix false assertion at xfs_qm_vop_create_dqattachJie Liu
After the previous fix, there still has another ASSERT failure if turning off any type of quota while fsstress is running at the same time. Backtrace in this case: [ 50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118 [ 50.867924] ------------[ cut here ]------------ ... <snip> [ 50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable] [ 50.867999] invalid opcode: 0000 [#1] SMP [ 50.869407] Call Trace: [ 50.869446] [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs] [ 50.869512] [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs] [ 50.869564] [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs] [ 50.869615] [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs] [ 50.869655] [<ffffffff811becd5>] vfs_mkdir+0x95/0x130 [ 50.869689] [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0 [ 50.869723] [<ffffffff811bf689>] SyS_mkdir+0x19/0x20 [ 50.869757] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f [ 50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip> [ 50.870003] RIP [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs] [ 50.870050] RSP <ffff88002941fd60> [ 50.879251] ---[ end trace c93a2b342341c65b ]--- We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(), however the assertion itself is not right IMHO. While performing quota off, we firstly clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking any special locks, see xfs_qm_scall_quotaoff(). Hence there is no guarantee that the desired quota is still active. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-06xfs: integrate xfs_quota_priv header file to xfs_qmJie Liu
The xfs_quota_priv header file is only included by xfs_qm header and there is no much users for its contents, hence we can move those stuff to xfs_qm header file and kill it. This patch also remove an unused macro DQFLAGTO_TYPESTR. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-06xfs: make quota metadata truncation behavior consistent to user spaceJie Liu
In xfs_qm_scall_trunc_qfiles(), we ignore the error if failed to remove the users quota metadata and proceed to remove groups and projects if they are being there. However, in user space, the remove operation will break and return if failed to remove any kind of quota. Also for v5 super block, we can enabled both group and project quota at the same time, in this case the current error handling will cover the group error with projects but they might failed due to different reasons. It seems we'd better the error handling consistent to the user space and don't trying to remove another kind of quota metadata if the previous operation is failed. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-05xfs: fix memory leak in xfs_dir2_node_removenameMark Tinguely
Fix the leak of kernel memory in xfs_dir2_node_removename() when xfs_dir2_leafn_remove() returns an error code. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-05xfs: free the list of recovery items on errorMark Tinguely
Recovery builds a list of items on the transaction's r_itemq head. Normally these items are committed and freed. But in the event of a recovery error, these allocations are leaked. If the error occurs during item reordering, then reconstruct the r_itemq list before deleting the list to avoid leaking the entries that were on one of the temporary lists. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-05fs: fix iversion handlingChristoph Hellwig
Currently notify_change directly updates i_version for size updates, which not only is counter to how all other fields are updated through struct iattr, but also breaks XFS, which need inode updates to happen under its own lock, and synchronized to the structure that gets written to the log. Remove the update in the common code, and it to btrfs and ext4, XFS already does a proper updaste internally and currently gets a double update with the existing code. IMHO this is 3.13 and -stable material and should go in through the XFS tree. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Acked-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-05xfs: growfs overruns AGFL buffer on V4 filesystemsDave Chinner
This loop in xfs_growfs_data_private() is incorrect for V4 superblocks filesystems: for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++) agfl->agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK); For V4 filesystems, we don't have a agfl header structure, and so XFS_AGFL_SIZE() returns an entire sector's worth of entries, which we then index from an offset into the sector. Hence: buffer overrun. This problem was introduced in 3.10 by commit 77c95bba ("xfs: add CRC checks to the AGFL") which changed the AGFL structure but failed to update the growfs code to handle the different structures. Fix it by using the correct offset into the buffer for both V4 and V5 filesystems. Cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04xfs: don't perform discard if the given range length is less than block sizeJie Liu
For discard operation, we should return EINVAL if the given range length is less than a block size, otherwise it will go through the file system to discard data blocks as the end range might be evaluated to -1, e.g, # fstrim -v -o 0 -l 100 /xfs7 /xfs7: 9811378176 bytes were trimmed This issue can be triggered via xfstests/generic/288. Also, it seems to get the request queue pointer via bdev_get_queue() instead of the hard code pointer dereference is not a bad thing. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04xfs: fix the comment explaining xfs_trans_dqlockedjoinChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04xfs: underflow bug in xfs_attrlist_by_handle()Dan Carpenter
If we allocate less than sizeof(struct attrlist) then we end up corrupting memory or doing a ZERO_PTR_SIZE dereference. This can only be triggered with CAP_SYS_ADMIN. Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04xfs: remove unused FI_ flagsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com.> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04xfs: simplify xfs_setsize_buftarg callchain; remove unused argEric Sandeen
The "verbose" argument to xfs_setsize_buftarg_flags() has been unused since: ffe37436 xfs: stop using the page cache to back the buffer cache Remove it, and fold the function into xfs_setsize_buftarg() now that there's no need for different types of callers. Fix inconsistent comment spacing while we're at it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-11-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs dentry reference count fix from Al Viro. This fixes a possible inode_permission NULL pointer dereference (and other problems) that were due to the root dentry count being decremented too much. In commit 48a066e72d97 ("RCU'd vfsmounts") the placement of clearing the LOOKUP_RCU bit changed, and we then returned failure of incrementing the lockref on the parent dentry with LOOKUP_RCU cleared. But that meant we needed to go through the same cleanup routines that the later failures did wrt LOOKUP_ROOT and nd->root. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix bogus path_put() of nd->root after some unlazy_walk() failures
2013-11-29fix bogus path_put() of nd->root after some unlazy_walk() failuresAl Viro
Failure to grab reference to parent dentry should go through the same cleanup as nd->seq mismatch. As it is, we might end up with caller thinking it needs to path_put() nd->root, with obvious nasty results once we'd hit that bug enough times to drive the refcount of root dentry all the way to zero... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-28Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "SMB3 "validate negotiate" is needed to prevent certain types of downgrade attacks. Also changes SMB2/SMB3 copy offload from using the BTRFS copy ioctl (BTRFS_IOC_CLONE) to a cifs specific ioctl (CIFS_IOC_COPYCHUNK_FILE) to address Christoph's comment that there are semantic differences between requesting copy offload in which copy-on-write is mandatory (as in the BTRFS ioctl) and optional in the SMB2/SMB3 case. Also fixes SMB2/SMB3 copychunk for large files" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: [CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offload Check SMB3 dialects against downgrade attacks Removed duplicated (and unneeded) goto CIFS: Fix SMB2/SMB3 Copy offload support (refcopy) for large files
2013-11-27Merge tag 'driver-core-3.13-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are 3 patches for sysfs issues that have been reported. Well, 1 patch really, the first one is reverted as it's not really needed (the correct fix is coming in through the different driver subsystems instead) But that 1 sysfs fix is needed, so this is still a good thing to pull in now" Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> * tag 'driver-core-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: Revert "sysfs: handle duplicate removal attempts in sysfs_remove_group()" sysfs: use a separate locking class for open files depending on mmap sysfs: handle duplicate removal attempts in sysfs_remove_group()
2013-11-27remove obsolete references to powertweakDave Jones
This tool hasn't been maintained in over a decade, and is pretty much useless these days. Let's pretend it never happened. Also remove a long-dead email address. Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-27Revert "sysfs: handle duplicate removal attempts in sysfs_remove_group()"Greg Kroah-Hartman
This reverts commit 54d71145a4548330313ca664a4a009772fe8b7dd. The root cause of these "inverted" sysfs removals have now been found, so there is no need for this patch. Keep this functionality around so that this type of error doesn't show up in driver code again. Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-26Merge branch 'for-linus-bugs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph bug-fixes from Sage Weil: "These include a couple fixes to the new fscache code that went in during the last cycle (which will need to go stable@ shortly as well), a couple client-side directory fragmentation fixes, a fix for a race in the cap release queuing path, and a couple race fixes in the request abort and resend code. Obviously some of this could have gone into 3.12 final, but I preferred to overtest rather than send things in for a late -rc, and then my travel schedule intervened" * 'for-linus-bugs' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: allocate non-zero page to fscache in readpage() ceph: wake up 'safe' waiters when unregistering request ceph: cleanup aborted requests when re-sending requests. ceph: handle race between cap reconnect and cap release ceph: set caps count after composing cap reconnect message ceph: queue cap release in __ceph_remove_cap() ceph: handle frag mismatch between readdir request and reply ceph: remove outdated frag information ceph: hung on ceph fscache invalidate in some cases
2013-11-25[CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offloadSteve French
Change cifs.ko to using CIFS_IOCTL_COPYCHUNK instead of BTRFS_IOC_CLONE to avoid confusion about whether copy-on-write is required or optional for this operation. SMB2/SMB3 copyoffload had used the BTRFS_IOC_CLONE ioctl since they both speed up copy by offloading the copy rather than passing many read and write requests back and forth and both have identical syntax (passing file handles), but for SMB2/SMB3 CopyChunk the server is not required to use copy-on-write to make a copy of the file (although some do), and Christoph has commented that since CopyChunk does not require copy-on-write we should not reuse BTRFS_IOC_CLONE. This patch renames the ioctl to use a cifs specific IOCTL CIFS_IOCTL_COPYCHUNK. This ioctl is particularly important for SMB2/SMB3 since large file copy over the network otherwise can be very slow, and with this is often more than 100 times faster putting less load on server and client. Note that if a copy syscall is ever introduced, depending on its requirements/format it could end up using one of the other three methods that CIFS/SMB2/SMB3 can do for copy offload, but this method is particularly useful for file copy and broadly supported (not just by Samba server). Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: David Disseldorp <ddiss@samba.org>
2013-11-23ceph: allocate non-zero page to fscache in readpage()Li Wang
ceph_osdc_readpages() returns number of bytes read, currently, the code only allocate full-zero page into fscache, this patch fixes this. Signed-off-by: Li Wang <liwang@ubuntukylin.com> Reviewed-by: Milosz Tanski <milosz@adfin.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-11-23ceph: wake up 'safe' waiters when unregistering requestYan, Zheng
We also need to wake up 'safe' waiters if error occurs or request aborted. Otherwise sync(2)/fsync(2) may hang forever. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com>
2013-11-23ceph: cleanup aborted requests when re-sending requests.Yan, Zheng
Aborted requests usually get cleared when the reply is received. If MDS crashes, no reply will be received. So we need to cleanup aborted requests when re-sending requests. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
2013-11-23ceph: handle race between cap reconnect and cap releaseYan, Zheng
When a cap get released while composing the cap reconnect message. We should skip queuing the release message if the cap hasn't been added to the cap reconnect message. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-11-23ceph: set caps count after composing cap reconnect messageYan, Zheng
It's possible that some caps get released while composing the cap reconnect message. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-11-23ceph: queue cap release in __ceph_remove_cap()Yan, Zheng
call __queue_cap_release() in __ceph_remove_cap(), this avoids acquiring s_cap_lock twice. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-11-23sysfs: use a separate locking class for open files depending on mmapTejun Heo
The following two commits implemented mmap support in the regular file path and merged bin file support into the regular path. 73d9714627ad ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c") 3124eb1679b2 ("sysfs: merge regular and bin file handling") After the merge, the following commands trigger a spurious lockdep warning. "test-mmap-read" simply mmaps the file and dumps the content. $ cat /sys/block/sda/trace/act_mask $ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096 ====================================================== [ INFO: possible circular locking dependency detected ] 3.12.0-work+ #378 Not tainted ------------------------------------------------------- test-mmap-read/567 is trying to acquire lock: (&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120 but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++++}: ... -> #2 (sr_mutex){+.+.+.}: ... -> #1 (&bdev->bd_mutex){+.+.+.}: ... -> #0 (&of->mutex){+.+.+.}: ... other info that might help us debug this: Chain exists of: &of->mutex --> sr_mutex --> &mm->mmap_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(sr_mutex); lock(&mm->mmap_sem); lock(&of->mutex); *** DEADLOCK *** 1 lock held by test-mmap-read/567: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0 stack backtrace: CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80 ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208 0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0 Call Trace: [<ffffffff81611ad2>] dump_stack+0x4e/0x7a [<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf [<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60 [<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0 [<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0 [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120 [<ffffffff8115d363>] mmap_region+0x3b3/0x5b0 [<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0 [<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0 [<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250 [<ffffffff81008282>] SyS_mmap+0x22/0x30 [<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b This happens because one file nests sr_mutex, which nests mm->mmap_sem under it, under of->mutex while mmap implementation naturally nests of->mutex under mm->mmap_sem. The warning is false positive as of->mutex is per open-file and the two paths belong to two different files. This warning didn't trigger before regular and bin file supports were merged because only bin file supported mmap and the other side of locking happened only on regular files which used equivalent but separate locking. It'd be best if we give separate locking classes per file but we can't easily do that. Let's differentiate on ->mmap() for now. Later we'll add explicit file operations struct and can add per-ops lockdep key there. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-23sysfs: handle duplicate removal attempts in sysfs_remove_group()Mika Westerberg
Commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive) changed the behavior so that directory removals will be done recursively. This means that the sysfs group might already be removed if its parent directory has been removed. The current code outputs warnings similar to following log snippet when it detects that there is no group for the given kobject: WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0() sysfs group ffffffff81c6f1e0 not found for kobject 'host7' Modules linked in: CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13 Hardware name: /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013 Workqueue: kacpi_hotplug acpi_hotplug_work_fn 0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8 ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0 ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48 Call Trace: [<ffffffff817daab1>] dump_stack+0x45/0x56 [<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0 [<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50 [<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70 [<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0 [<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50 [<ffffffff8142a0d0>] device_del+0x40/0x1b0 [<ffffffff8142a24d>] device_unregister+0xd/0x20 [<ffffffff8144131a>] scsi_remove_host+0xba/0x110 [<ffffffff8145f526>] ata_host_detach+0xc6/0x100 [<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20 [<ffffffff812e8f48>] pci_device_remove+0x28/0x60 [<ffffffff8142d854>] __device_release_driver+0x64/0xd0 [<ffffffff8142d8de>] device_release_driver+0x1e/0x30 [<ffffffff8142d257>] bus_remove_device+0xf7/0x140 [<ffffffff8142a1b1>] device_del+0x121/0x1b0 [<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0 [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0 [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0 [<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20 [<ffffffff812fc743>] trim_stale_devices+0x73/0xe0 [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0 [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0 [<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0 [<ffffffff812fd90d>] hotplug_event+0xcd/0x160 [<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60 [<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22 [<ffffffff8105cf3a>] process_one_work+0x17a/0x430 [<ffffffff8105db29>] worker_thread+0x119/0x390 [<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0 [<ffffffff81063a5d>] kthread+0xcd/0xf0 [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180 [<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0 [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180 On this particular machine I see ~16 of these message during Thunderbolt hot-unplug. Fix this in similar way that was done for sysfs_remove_one() by checking if the parent directory has already been removed and bailing out early. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-22Merge tag 'ecryptfs-3.13-rc1-quiet-checkers' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull minor eCryptfs fix from Tyler Hicks: "Quiet static checkers by removing unneeded conditionals" * tag 'ecryptfs-3.13-rc1-quiet-checkers' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: file->private_data is always valid
2013-11-22Merge git://git.kvack.org/~bcrl/aio-nextLinus Torvalds
Pull aio fixes from Benjamin LaHaise. * git://git.kvack.org/~bcrl/aio-next: aio: nullify aio->ring_pages after freeing it aio: prevent double free in ioctx_alloc aio: Fix a trinity splat
2013-11-22Merge branch 'for-3.13' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd bugfixes from Bruce Fields: "A couple nfsd bugfixes" * 'for-3.13' of git://linux-nfs.org/~bfields/linux: nfsd4: fix xdr decoding of large non-write compounds nfsd: make sure to balance get/put_write_access nfsd: split up nfsd_setattr
2013-11-22Merge tag 'gfs2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes Pull GFS2 fixes from Steven Whitehouse: "A couple of small, but important bug fixes for GFS2. The first one fixes a possible NULL pointer dereference, and the second one resolves a reference counting issue in one of the lesser used paths through atomic_open" * tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes: GFS2: Fix ref count bug relating to atomic_open GFS2: fix potential NULL pointer dereference
2013-11-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Almost all of these are bug fixes. Dave Sterba's documentation update is the big exception because he removed our promises to set any machine running Btrfs on fire" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Documentation: filesystems: update btrfs tools section Documentation: filesystems: add new btrfs mount options btrfs: update kconfig help text btrfs: fix bio_size_ok() for max_sectors > 0xffff btrfs: Use trace condition for get_extent tracepoint btrfs: fix typo in the log message Btrfs: fix list delete warning when removing ordered root from the list Btrfs: print bytenr instead of page pointer in check-int Btrfs: remove dead codes from ctree.h Btrfs: don't wait for ordered data outside desired range Btrfs: fix lockdep error in async commit Btrfs: avoid heavy operations in btrfs_commit_super Btrfs: fix __btrfs_start_workers retval Btrfs: disable online raid-repair on ro mounts Btrfs: do not inc uncorrectable_errors counter on ro scrubs Btrfs: only drop modified extents if we logged the whole inode Btrfs: make sure to copy everything if we rename Btrfs: don't BUG_ON() if we get an error walking backrefs
2013-11-22Merge tag 'xfs-for-linus-v3.13-rc1-2' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull second xfs update from Ben Myers: "There are a couple of patches that I wasn't quite sure about in time for our initial 3.13 pull request, a bugfix, and an update to add Dave to MAINTAINERS: Here we have a performance fix for inode iversion, increased inode cluster size for v5 superblock filesystems, a fix for error handling in xfs_bmap_add_attrfork, and a MAINTAINERS update to add Dave" * tag 'xfs-for-linus-v3.13-rc1-2' of git://oss.sgi.com/xfs/xfs: xfs: open code inc_inode_iversion when logging an inode xfs: increase inode cluster size for v5 filesystems xfs: fix unlock in xfs_bmap_add_attrfork xfs: update maintainers
2013-11-21Merge branch 'akpm' (fixes from Andrew)Linus Torvalds
Merge patches from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: place page->pmd_huge_pte to right union MAINTAINERS: add keyboard driver to Hyper-V file list x86, mm: do not leak page->ptl for pmd page tables ipc,shm: correct error return value in shmctl (SHM_UNLOCK) mm, mempolicy: silence gcc warning block/partitions/efi.c: fix bound check ARM: drivers/rtc/rtc-at91rm9200.c: disable interrupts at shutdown mm: hugetlbfs: fix hugetlbfs optimization kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS cleanly ipc,shm: fix shm_file deletion races mm: thp: give transparent hugepage code a separate copy_page checkpatch: fix "Use of uninitialized value" warnings configfs: fix race between dentry put and lookup
2013-11-21Merge git://git.infradead.org/users/eparis/auditLinus Torvalds
Pull audit updates from Eric Paris: "Nothing amazing. Formatting, small bug fixes, couple of fixes where we didn't get records due to some old VFS changes, and a change to how we collect execve info..." Fixed conflict in fs/exec.c as per Eric and linux-next. * git://git.infradead.org/users/eparis/audit: (28 commits) audit: fix type of sessionid in audit_set_loginuid() audit: call audit_bprm() only once to add AUDIT_EXECVE information audit: move audit_aux_data_execve contents into audit_context union audit: remove unused envc member of audit_aux_data_execve audit: Kill the unused struct audit_aux_data_capset audit: do not reject all AUDIT_INODE filter types audit: suppress stock memalloc failure warnings since already managed audit: log the audit_names record type audit: add child record before the create to handle case where create fails audit: use given values in tty_audit enable api audit: use nlmsg_len() to get message payload length audit: use memset instead of trying to initialize field by field audit: fix info leak in AUDIT_GET requests audit: update AUDIT_INODE filter rule to comparator function audit: audit feature to set loginuid immutable audit: audit feature to only allow unsetting the loginuid audit: allow unsetting the loginuid (with priv) audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE audit: loginuid functions coding style selinux: apply selinux checks on new audit message types ...
2013-11-21configfs: fix race between dentry put and lookupJunxiao Bi
A race window in configfs, it starts from one dentry is UNHASHED and end before configfs_d_iput is called. In this window, if a lookup happen, since the original dentry was UNHASHED, so a new dentry will be allocated, and then in configfs_attach_attr(), sd->s_dentry will be updated to the new dentry. Then in configfs_d_iput(), BUG_ON(sd->s_dentry != dentry) will be triggered and system panic. sys_open: sys_close: ... fput dput dentry_kill __d_drop <--- dentry unhashed here, but sd->dentry still point to this dentry. lookup_real configfs_lookup configfs_attach_attr---> update sd->s_dentry to new allocated dentry here. d_kill configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry) triggered here. To fix it, change configfs_d_iput to not update sd->s_dentry if sd->s_count > 2, that means there are another dentry is using the sd beside the one that is going to be put. Use configfs_dirent_lock in configfs_attach_attr to sync with configfs_d_iput. With the following steps, you can reproduce the bug. 1. enable ocfs2, this will mount configfs at /sys/kernel/config and fill configure in it. 2. run the following script. while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done & while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done & Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>