summaryrefslogtreecommitdiff
path: root/fs/inode.c
AgeCommit message (Collapse)Author
2019-06-22Merge 4.4.183 into android-4.4-pGreg Kroah-Hartman
Changes in 4.4.183 fs/fat/file.c: issue flush after the writeback of FAT sysctl: return -EINVAL if val violates minmax ipc: prevent lockup on alloc_msg and free_msg hugetlbfs: on restore reserve error path retain subpool reservation mm/cma.c: fix crash on CMA allocation if bitmap allocation fails mm/cma_debug.c: fix the break condition in cma_maxchunk_get() kernel/sys.c: prctl: fix false positive in validate_prctl_map() mfd: intel-lpss: Set the device in reset state when init mfd: twl6040: Fix device init errors for ACCCTL register perf/x86/intel: Allow PEBS multi-entry in watermark mode drm/bridge: adv7511: Fix low refresh rate selection ntp: Allow TAI-UTC offset to be set to zero f2fs: fix to avoid panic in do_recover_data() f2fs: fix to do sanity check on valid block count of segment iommu/vt-d: Set intel_iommu_gfx_mapped correctly ALSA: hda - Register irq handler after the chip initialization nvmem: core: fix read buffer in place fuse: retrieve: cap requested size to negotiated max_write nfsd: allow fh_want_write to be called twice x86/PCI: Fix PCI IRQ routing table memory leak platform/chrome: cros_ec_proto: check for NULL transfer function soc: mediatek: pwrap: Zero initialize rdata in pwrap_init_cipher clk: rockchip: Turn on "aclk_dmac1" for suspend on rk3288 ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ahb" clock to SDMA ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ipg" clock to SDMA ARM: dts: imx6qdl: Specify IMX6QDL_CLK_IPG as "ipg" clock to SDMA PCI: rpadlpar: Fix leaked device_node references in add/remove paths PCI: rcar: Fix a potential NULL pointer dereference video: hgafb: fix potential NULL pointer dereference video: imsttfb: fix potential NULL pointer dereferences PCI: xilinx: Check for __get_free_pages() failure gpio: gpio-omap: add check for off wake capable gpios dmaengine: idma64: Use actual device for DMA transfers pwm: tiehrpwm: Update shadow register for disabling PWMs ARM: dts: exynos: Always enable necessary APIO_1V8 and ABB_1V8 regulators on Arndale Octa pwm: Fix deadlock warning when removing PWM device ARM: exynos: Fix undefined instruction during Exynos5422 resume futex: Fix futex lock the wrong page Revert "Bluetooth: Align minimum encryption key size for LE and BR/EDR connections" ALSA: seq: Cover unsubscribe_port() in list_mutex libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node fs/ocfs2: fix race in ocfs2_dentry_attach_lock() signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO ptrace: restore smp_rmb() in __ptrace_may_access() i2c: acorn: fix i2c warning bcache: fix stack corruption by PRECEDING_KEY() cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css() ASoC: cs42xx8: Add regcache mask dirty Drivers: misc: fix out-of-bounds access in function param_set_kgdbts_var scsi: lpfc: add check for loss of ndlp when sending RRQ scsi: bnx2fc: fix incorrect cast to u64 on shift operation usbnet: ipheth: fix racing condition KVM: x86/pmu: do not mask the value that is written to fixed PMUs KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION drm/vmwgfx: integer underflow in vmw_cmd_dx_set_shader() leading to an invalid read drm/vmwgfx: NULL pointer dereference from vmw_cmd_dx_view_define() USB: Fix chipmunk-like voice when using Logitech C270 for recording audio. USB: usb-storage: Add new ID to ums-realtek USB: serial: pl2303: add Allied Telesis VT-Kit3 USB: serial: option: add support for Simcom SIM7500/SIM7600 RNDIS mode USB: serial: option: add Telit 0x1260 and 0x1261 compositions ax25: fix inconsistent lock state in ax25_destroy_timer be2net: Fix number of Rx queues used for flow hashing ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero lapb: fixed leak of control-blocks. neigh: fix use-after-free read in pneigh_get_next sunhv: Fix device naming inconsistency between sunhv_console and sunhv_reg mISDN: make sure device name is NUL terminated x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor perf/ring_buffer: Fix exposing a temporarily decreased data_head perf/ring_buffer: Add ordering to rb->nest increment gpio: fix gpio-adp5588 build errors net: tulip: de4x5: Drop redundant MODULE_DEVICE_TABLE() i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr configfs: Fix use-after-free when accessing sd->s_dentry ia64: fix build errors by exporting paddr_to_nid() KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list net: sh_eth: fix mdio access in sh_eth_close() for R-Car Gen2 and RZ/A1 SoCs scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route() scsi: libsas: delete sas port if expander discover failed Revert "crypto: crypto4xx - properly set IV after de- and encrypt" coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping Abort file_remove_privs() for non-reg. files Linux 4.4.183 Change-Id: I26e2772a587b1dcf557adede5bcff66962f72432 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-22Abort file_remove_privs() for non-reg. filesAlexander Lochmann
commit f69e749a49353d96af1a293f56b5b56de59c668a upstream. file_remove_privs() might be called for non-regular files, e.g. blkdev inode. There is no reason to do its job on things like blkdev inodes, pipes, or cdevs. Hence, abort if file does not refer to a regular inode. AV: more to the point, for devices there might be any number of inodes refering to given device. Which one to strip the permissions from, even if that made any sense in the first place? All of them will be observed with contents modified, after all. Found by LockDoc (Alexander Lochmann, Horst Schirmeier and Olaf Spinczyk) Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de> Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Zubin Mithra <zsm@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03Merge 4.4.178 into android-4.4-pGreg Kroah-Hartman
Changes in 4.4.178 mmc: pxamci: fix enum type confusion drm/vmwgfx: Don't double-free the mode stored in par->set_mode udf: Fix crash on IO error during truncate mips: loongson64: lemote-2f: Add IRQF_NO_SUSPEND to "cascade" irqaction. MIPS: Fix kernel crash for R6 in jump label branch function futex: Ensure that futex address is aligned in handle_futex_death() ext4: fix NULL pointer dereference while journal is aborted ext4: fix data corruption caused by unaligned direct AIO ext4: brelse all indirect buffer in ext4_ind_remove_space() mmc: tmio_mmc_core: don't claim spurious interrupts media: v4l2-ctrls.c/uvc: zero v4l2_event locking/lockdep: Add debug_locks check in __lock_downgrade() ALSA: hda - Record the current power state before suspend/resume calls ALSA: hda - Enforces runtime_resume after S3 and S4 for each codec mmc: pwrseq_simple: Make reset-gpios optional to match doc mmc: debugfs: Add a restriction to mmc debugfs clock setting mmc: make MAN_BKOPS_EN message a debug mmc: sanitize 'bus width' in debug output mmc: core: shut up "voltage-ranges unspecified" pr_info() usb: dwc3: gadget: Fix suspend/resume during device mode arm64: mm: Add trace_irqflags annotations to do_debug_exception() mmc: core: fix using wrong io voltage if mmc_select_hs200 fails mm/rmap: replace BUG_ON(anon_vma->degree) with VM_WARN_ON extcon: usb-gpio: Don't miss event during suspend/resume kbuild: setlocalversion: print error to STDERR usb: gadget: composite: fix dereference after null check coverify warning usb: gadget: Add the gserial port checking in gs_start_tx() tcp/dccp: drop SYN packets if accept queue is full serial: sprd: adjust TIMEOUT to a big value Hang/soft lockup in d_invalidate with simultaneous calls arm64: traps: disable irq in die() usb: renesas_usbhs: gadget: fix unused-but-set-variable warning serial: sprd: clear timeout interrupt only rather than all interrupts lib/int_sqrt: optimize small argument USB: core: only clean up what we allocated rtc: Fix overflow when converting time64_t to rtc_time ath10k: avoid possible string overflow Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task() mmc: block: Allow more than 8 partitions per card arm64: fix COMPAT_SHMLBA definition for large pages efi: stub: define DISABLE_BRANCH_PROFILING for all architectures ARM: 8458/1: bL_switcher: add GIC dependency ARM: 8494/1: mm: Enable PXN when running non-LPAE kernel on LPAE processor android: unconditionally remove callbacks in sync_fence_free() vmstat: make vmstat_updater deferrable again and shut down on idle hid-sensor-hub.c: fix wrong do_div() usage arm64: hide __efistub_ aliases from kallsyms perf: Synchronously free aux pages in case of allocation failure net: diag: support v4mapped sockets in inet_diag_find_one_icsk() Revert "mmc: block: don't use parameter prefix if built as module" writeback: initialize inode members that track writeback history coresight: fixing lockdep error coresight: coresight_unregister() function cleanup coresight: release reference taken by 'bus_find_device()' coresight: remove csdev's link from topology stm class: Fix locking in unbinding policy path stm class: Fix link list locking stm class: Prevent user-controllable allocations stm class: Support devices with multiple instances stm class: Fix unlocking braino in the error path stm class: Guard output assignment against concurrency stm class: Fix unbalanced module/device refcounting stm class: Fix a race in unlinking coresight: "DEVICE_ATTR_RO" should defined as static. coresight: etm4x: Check every parameter used by dma_xx_coherent. asm-generic: Fix local variable shadow in __set_fixmap_offset staging: ashmem: Avoid deadlock with mmap/shrink staging: ashmem: Add missing include staging: ion: Set minimum carveout heap allocation order to PAGE_SHIFT staging: goldfish: audio: fix compiliation on arm ARM: 8510/1: rework ARM_CPU_SUSPEND dependencies arm64/kernel: fix incorrect EL0 check in inv_entry macro mac80211: fix "warning: ‘target_metric’ may be used uninitialized" perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops arm64: kernel: Include _AC definition in page.h PM / Hibernate: Call flush_icache_range() on pages restored in-place stm class: Do not leak the chrdev in error path stm class: Fix stm device initialization order ipv6: fix endianness error in icmpv6_err usb: gadget: configfs: add mutex lock before unregister gadget usb: gadget: rndis: free response queue during REMOTE_NDIS_RESET_MSG cpu/hotplug: Handle unbalanced hotplug enable/disable video: fbdev: Set pixclock = 0 in goldfishfb arm64: kconfig: drop CONFIG_RTC_LIB dependency mmc: mmc: fix switch timeout issue caused by jiffies precision cfg80211: size various nl80211 messages correctly stmmac: copy unicast mac address to MAC registers dccp: do not use ipv6 header for ipv4 flow mISDN: hfcpci: Test both vendor & device ID for Digium HFC4S net/packet: Set __GFP_NOWARN upon allocation in alloc_pg_vec net: rose: fix a possible stack overflow Add hlist_add_tail_rcu() (Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net) packets: Always register packet sk in the same order tcp: do not use ipv6 header for ipv4 flow vxlan: Don't call gro_cells_destroy() before device is unregistered sctp: get sctphdr by offset in sctp_compute_cksum mac8390: Fix mmio access size probe btrfs: remove WARN_ON in log_dir_items btrfs: raid56: properly unmap parity page in finish_parity_scrub() ARM: imx6q: cpuidle: fix bug that CPU might not wake up at expected time ALSA: compress: add support for 32bit calls in a 64bit kernel ALSA: rawmidi: Fix potential Spectre v1 vulnerability ALSA: seq: oss: Fix Spectre v1 vulnerability ALSA: pcm: Fix possible OOB access in PCM oss plugins ALSA: pcm: Don't suspend stream in unrecoverable PCM state scsi: sd: Fix a race between closing an sd device and sd I/O scsi: zfcp: fix rport unblock if deleted SCSI devices on Scsi_Host scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devices tty: atmel_serial: fix a potential NULL pointer dereference staging: vt6655: Remove vif check from vnt_interrupt staging: vt6655: Fix interrupt race condition on device start up. serial: max310x: Fix to avoid potential NULL pointer dereference serial: sh-sci: Fix setting SCSCR_TIE while transferring data USB: serial: cp210x: add new device id USB: serial: ftdi_sio: add additional NovaTech products USB: serial: mos7720: fix mos_parport refcount imbalance on error path USB: serial: option: set driver_info for SIM5218 and compatibles USB: serial: option: add Olicard 600 Disable kgdboc failed by echo space to /sys/module/kgdboc/parameters/kgdboc fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links gpio: adnp: Fix testing wrong value in adnp_gpio_direction_input perf intel-pt: Fix TSC slip x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y KVM: Reject device ioctls from processes other than the VM's creator xhci: Fix port resume done detection for SS ports with LPM enabled Revert "USB: core: only clean up what we allocated" arm64: support keyctl() system call in 32-bit mode coresight: removing bind/unbind options from sysfs stm class: Hide STM-specific options if STM is disabled Linux 4.4.178 Change-Id: Iac01be124213731798a36b20d80ea3a8e911d025 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-03writeback: initialize inode members that track writeback historyTahsin Erdogan
[ Upstream commit 3d65ae4634ed8350aee98a4e6f4e41fe40c7d282 ] inode struct members that track cgroup writeback information should be reinitialized when inode gets allocated from kmem_cache. Otherwise, their values remain and get used by the new inode. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Acked-by: Tejun Heo <tj@kernel.org> Fixes: d10c80955265 ("writeback: implement foreign cgroup inode bdi_writeback switching") Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-07-17Merge 4.4.141 into android-4.4Greg Kroah-Hartman
Changes in 4.4.141 MIPS: Fix ioremap() RAM check ibmasm: don't write out of bounds in read handler vmw_balloon: fix inflation with batching ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS USB: serial: ch341: fix type promotion bug in ch341_control_in() USB: serial: cp210x: add another USB ID for Qivicon ZigBee stick USB: serial: keyspan_pda: fix modem-status error handling USB: yurex: fix out-of-bounds uaccess in read handler USB: serial: mos7840: fix status-register error handling usb: quirks: add delay quirks for Corsair Strafe xhci: xhci-mem: off by one in xhci_stream_id_to_ring() HID: usbhid: add quirk for innomedia INNEX GENESIS/ATARI adapter Fix up non-directory creation in SGID directories tools build: fix # escaping in .cmd files for future Make iw_cxgb4: correctly enforce the max reg_mr depth x86/cpufeature: Move some of the scattered feature bits to x86_capability x86/cpufeature: Cleanup get_cpu_cap() x86/cpu: Provide a config option to disable static_cpu_has x86/fpu: Add an XSTATE_OP() macro x86/fpu: Get rid of xstate_fault() x86/headers: Don't include asm/processor.h in asm/atomic.h x86/cpufeature: Carve out X86_FEATURE_* x86/cpufeature: Replace the old static_cpu_has() with safe variant x86/cpufeature: Get rid of the non-asm goto variant x86/alternatives: Add an auxilary section x86/alternatives: Discard dynamic check after init x86/vdso: Use static_cpu_has() x86/boot: Simplify kernel load address alignment check x86/cpufeature: Speed up cpu_feature_enabled() x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitions x86/mm/pkeys: Fix mismerge of protection keys CPUID bits x86/cpu: Add detection of AMD RAS Capabilities x86/cpufeature, x86/mm/pkeys: Fix broken compile-time disabling of pkeys x86/cpufeature: Update cpufeaure macros x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated x86/cpufeature: Add helper macro for mask check macros uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn() netfilter: nf_queue: augment nfqa_cfg_policy netfilter: x_tables: initialise match/target check parameter struct loop: add recursion validation to LOOP_CHANGE_FD PM / hibernate: Fix oops at snapshot_write() RDMA/ucm: Mark UCM interface as BROKEN loop: remember whether sysfs_create_group() was done Linux 4.4.141 Change-Id: I777b39a0ede95b58638add97756d6beaf4a9d154 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-07-17Fix up non-directory creation in SGID directoriesLinus Torvalds
commit 0fa3ecd87848c9c93c2c828ef4c3a8ca36ce46c7 upstream. sgid directories have special semantics, making newly created files in the directory belong to the group of the directory, and newly created subdirectories will also become sgid. This is historically used for group-shared directories. But group directories writable by non-group members should not imply that such non-group members can magically join the group, so make sure to clear the sgid bit on non-directories for non-members (but remember that sgid without group execute means "mandatory locking", just to confuse things even more). Reported-by: Jann Horn <jannh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-20Merge 4.4.116 into android-4.4Greg Kroah-Hartman
Changes in 4.4.116 powerpc/bpf/jit: Disable classic BPF JIT on ppc64le powerpc/64: Fix flush_(d|i)cache_range() called from modules powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC powerpc: Simplify module TOC handling powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper powerpc/64: Add macros for annotating the destination of rfid/hrfid powerpc/64s: Simple RFI macro conversions powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL powerpc/64s: Add support for RFI flush of L1-D cache powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti powerpc/pseries: Query hypervisor for RFI flush settings powerpc/powernv: Check device-tree for RFI flush settings powerpc/64s: Wire up cpu_show_meltdown() powerpc/64s: Allow control of RFI flush via debugfs ASoC: pcm512x: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE usbip: vhci_hcd: clear just the USB_PORT_STAT_POWER bit usbip: fix 3eee23c3ec14 tcp_socket address still in the status file net: cdc_ncm: initialize drvflags before usage ASoC: simple-card: Fix misleading error message ASoC: rsnd: don't call free_irq() on Parent SSI ASoC: rsnd: avoid duplicate free_irq() drm: rcar-du: Use the VBK interrupt for vblank events drm: rcar-du: Fix race condition when disabling planes at CRTC stop x86/asm: Fix inline asm call constraints for GCC 4.4 ip6mr: fix stale iterator net: igmp: add a missing rcu locking section qlcnic: fix deadlock bug r8169: fix RTL8168EP take too long to complete driver initialization. tcp: release sk_frag.page in tcp_disconnect vhost_net: stop device during reset owner media: soc_camera: soc_scale_crop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE KEYS: encrypted: fix buffer overread in valid_master_desc() don't put symlink bodies in pagecache into highmem crypto: tcrypt - fix S/G table for test_aead_speed() x86/microcode/AMD: Do not load when running on a hypervisor x86/microcode: Do the family check first powerpc/pseries: include linux/types.h in asm/hvcall.h cifs: Fix missing put_xid in cifs_file_strict_mmap cifs: Fix autonegotiate security settings mismatch CIFS: zero sensitive data when freeing dmaengine: dmatest: fix container_of member in dmatest_callback x86/kaiser: fix build error with KASAN && !FUNCTION_GRAPH_TRACER kaiser: fix compile error without vsyscall netfilter: nf_queue: Make the queue_handler pernet posix-timer: Properly check sigevent->sigev_notify usb: gadget: uvc: Missing files for configfs interface sched/rt: Use container_of() to get root domain in rto_push_irq_work_func() sched/rt: Up the root domain ref count when passing it around via IPIs dccp: CVE-2017-8824: use-after-free in DCCP code media: dvb-usb-v2: lmedm04: Improve logic checking of warm start media: dvb-usb-v2: lmedm04: move ts2020 attach to dm04_lme2510_tuner mtd: cfi: convert inline functions to macros mtd: nand: brcmnand: Disable prefetch by default mtd: nand: Fix nand_do_read_oob() return value mtd: nand: sunxi: Fix ECC strength choice ubi: block: Fix locking for idr_alloc/idr_remove nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds NFS: Add a cond_resched() to nfs_commit_release_pages() NFS: commit direct writes even if they fail partially NFS: reject request for id_legacy key without auxdata kernfs: fix regression in kernfs_fop_write caused by wrong type ahci: Annotate PCI ids for mobile Intel chipsets as such ahci: Add PCI ids for Intel Bay Trail, Cherry Trail and Apollo Lake AHCI ahci: Add Intel Cannon Lake PCH-H PCI ID crypto: hash - introduce crypto_hash_alg_has_setkey() crypto: cryptd - pass through absence of ->setkey() crypto: poly1305 - remove ->setkey() method nsfs: mark dentry with DCACHE_RCUACCESS media: v4l2-ioctl.c: don't copy back the result for -ENOTTY vb2: V4L2_BUF_FLAG_DONE is set after DQBUF media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF media: v4l2-compat-ioctl32.c: fix the indentation media: v4l2-compat-ioctl32.c: move 'helper' functions to __get/put_v4l2_format32 media: v4l2-compat-ioctl32.c: avoid sizeof(type) media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32 media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs media: v4l2-compat-ioctl32: Copy v4l2_window->global_alpha media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32 media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type media: v4l2-compat-ioctl32.c: don't copy back the result for certain errors media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic crypto: caam - fix endless loop when DECO acquire fails arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2 watchdog: imx2_wdt: restore previous timeout after suspend+resume media: ts2020: avoid integer overflows on 32 bit machines media: cxusb, dib0700: ignore XC2028_I2C_FLUSH kernel/async.c: revert "async: simplify lowest_in_progress()" HID: quirks: Fix keyboard + touchpad on Toshiba Click Mini not working Bluetooth: btsdio: Do not bind to non-removable BCM43341 Revert "Bluetooth: btusb: fix QCA Rome suspend/resume" Bluetooth: btusb: Restore QCA Rome suspend/resume fix with a "rewritten" version signal/openrisc: Fix do_unaligned_access to send the proper signal signal/sh: Ensure si_signo is initialized in do_divide_error alpha: fix crash if pthread_create races with signal delivery alpha: fix reboot on Avanti platform xtensa: fix futex_atomic_cmpxchg_inatomic EDAC, octeon: Fix an uninitialized variable warning pktcdvd: Fix pkt_setup_dev() error path btrfs: Handle btrfs_set_extent_delalloc failure in fixup worker nvme: Fix managing degraded controllers ACPI: sbshc: remove raw pointer from printk() message ovl: fix failure to fsync lower dir mn10300/misalignment: Use SIGSEGV SEGV_MAPERR to report a failed user copy ftrace: Remove incorrect setting of glob search field Linux 4.4.116 Change-Id: Id000cb8d59b74de063902e9ad24dd07fe1b1694b Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-02-16don't put symlink bodies in pagecache into highmemAl Viro
commit 21fc61c73c3903c4c312d0802da01ec2b323d174 upstream. kmap() in page_follow_link_light() needed to go - allowing to hold an arbitrary number of kmaps for long is a great way to deadlocking the system. new helper (inode_nohighmem(inode)) needs to be used for pagecache symlinks inodes; done for all in-tree cases. page_follow_link_light() instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jin Qian <jinqian@google.com> Signed-off-by: Jin Qian <jinqian@android.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-15Merge remote-tracking branch 'common/android-4.4' into android-4.4.yDmitry Shmidt
Change-Id: Icf907f5067fb6da5935ab0d3271df54b8d5df405
2017-01-26ANDROID: vfs: Add setattr2 for filesystems with per mount permissionsDaniel Rosenberg
This allows filesystems to use their mount private data to influence the permssions they use in setattr2. It has been separated into a new call to avoid disrupting current setattr users. Change-Id: I19959038309284448f1b7f232d579674ef546385 Signed-off-by: Daniel Rosenberg <drosen@google.com>
2016-08-10vfs: fix deadlock in file_remove_privs() on overlayfsMiklos Szeredi
commit c1892c37769cf89c7e7ba57528ae2ccb5d153c9b upstream. file_remove_privs() is called with inode lock on file_inode(), which proceeds to calling notify_change() on file->f_path.dentry. Which triggers the WARN_ON_ONCE(!inode_is_locked(inode)) in addition to deadlocking later when ovl_setattr tries to lock the underlying inode again. Fix this mess by not mixing the layers, but doing everything on underlying dentry/inode. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 07a2daab49c5 ("ovl: Copy up underlying inode's ->i_mode to overlay inode") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-09fs/inode.c: fix kernel-doc warningRandy Dunlap
Fix kernel-doc warning in fs/inode.c: ../fs/inode.c:1606: warning: No description found for parameter 'inode' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-18inode: don't softlockup when evicting inodesJosef Bacik
On a box with a lot of ram (148gb) I can make the box softlockup after running an fs_mark job that creates hundreds of millions of empty files. This is because we never generate enough memory pressure to keep the number of inodes on our unused list low, so when we go to unmount we have to evict ~100 million inodes. This makes one processor a very unhappy person, so add a cond_resched() in dispose_list() and if we need a resched when processing the s_inodes list do that and run dispose_list() on what we've currently culled. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz>
2015-08-17inode: rename i_wb_list to i_io_listDave Chinner
There's a small consistency problem between the inode and writeback naming. Writeback calls the "for IO" inode queues b_io and b_more_io, but the inode calls these the "writeback list" or i_wb_list. This makes it hard to an new "under writeback" list to the inode, or call it an "under IO" list on the bdi because either way we'll have writeback on IO and IO on writeback and it'll just be confusing. I'm getting confused just writing this! So, rename the inode "for IO" list variable to i_io_list so we can add a new "writeback list" in a subsequent patch. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com>
2015-08-17inode: convert inode_sb_list_lock to per-sbDave Chinner
The process of reducing contention on per-superblock inode lists starts with moving the locking to match the per-superblock inode list. This takes the global lock out of the picture and reduces the contention problems to within a single filesystem. This doesn't get rid of contention as the locks still have global CPU scope, but it does isolate operations on different superblocks form each other. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com>
2015-07-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...
2015-06-30vfs: avoid creation of inode number 0 in get_next_inoCarlos Maiolino
currently, get_next_ino() is able to create inodes with inode number = 0. This have a bad impact in the filesystems relying in this function to generate inode numbers. While there is no problem at all in having inodes with number 0, userspace tools which handle file management tasks can have problems handling these files, like for example, the impossiblity of users to delete these files, since glibc will ignore them. So, I believe the best way is kernel to avoid creating them. This problem has been raised previously, but the old thread didn't have any other update for a year+, and I've seen too many users hitting the same issue regarding the impossibility to delete files while using filesystems relying on this function. So, I'm starting the thread again, with the same patch that I believe is enough to address this problem. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-25Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull cgroup writeback support from Jens Axboe: "This is the big pull request for adding cgroup writeback support. This code has been in development for a long time, and it has been simmering in for-next for a good chunk of this cycle too. This is one of those problems that has been talked about for at least half a decade, finally there's a solution and code to go with it. Also see last weeks writeup on LWN: http://lwn.net/Articles/648292/" * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits) writeback, blkio: add documentation for cgroup writeback support vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB writeback: do foreign inode detection iff cgroup writeback is enabled v9fs: fix error handling in v9fs_session_init() bdi: fix wrong error return value in cgwb_create() buffer: remove unusued 'ret' variable writeback: disassociate inodes from dying bdi_writebacks writeback: implement foreign cgroup inode bdi_writeback switching writeback: add lockdep annotation to inode_to_wb() writeback: use unlocked_inode_to_wb transaction in inode_congested() writeback: implement unlocked_inode_to_wb transaction and use it for stat updates writeback: implement [locked_]inode_to_wb_and_lock_list() writeback: implement foreign cgroup inode detection writeback: make writeback_control track the inode being written back writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb() mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use writeback: implement memcg writeback domain based throttling writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes writeback: implement memcg wb_domain writeback: update wb_over_bg_thresh() to use wb_domain aware operations ...
2015-06-23fs: Call security_ops->inode_killpriv on truncateJan Kara
Comment in include/linux/security.h says that ->inode_killpriv() should be called when setuid bit is being removed and that similar security labels (in fact this applies only to file capabilities) should be removed at this time as well. However we don't call ->inode_killpriv() when we remove suid bit on truncate. We fix the problem by calling ->inode_need_killpriv() and subsequently ->inode_killpriv() on truncate the same way as we do it on file write. After this patch there's only one user of should_remove_suid() - ocfs2 - and indeed it's buggy because it doesn't call ->inode_killpriv() on write. However fixing it is difficult because of special locking constraints. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23fs: Provide function telling whether file_remove_privs() will do anythingJan Kara
Provide function telling whether file_remove_privs() will do anything. Currently we only have should_remove_suid() and that does something slightly different. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23fs: Rename file_remove_suid() to file_remove_privs()Jan Kara
file_remove_suid() is a misnomer since it removes also file capabilities stored in xattrs and sets S_NOSEC flag. Also should_remove_suid() tells something else than whether file_remove_suid() call is necessary which leads to bugs. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23fs: Fix S_NOSEC handlingJan Kara
file_remove_suid() could mistakenly set S_NOSEC inode bit when root was modifying the file. As a result following writes to the file by ordinary user would avoid clearing suid or sgid bits. Fix the bug by checking actual mode bits before setting S_NOSEC. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-02writeback: make backing_dev_info host cgroup-specific bdi_writebacksTejun Heo
For the planned cgroup writeback support, on each bdi (backing_dev_info), each memcg will be served by a separate wb (bdi_writeback). This patch updates bdi so that a bdi can host multiple wbs (bdi_writebacks). On the default hierarchy, blkcg implicitly enables memcg. This allows using memcg's page ownership for attributing writeback IOs, and every memcg - blkcg combination can be served by its own wb by assigning a dedicated wb to each memcg. This means that there may be multiple wb's of a bdi mapped to the same blkcg. As congested state is per blkcg - bdi combination, those wb's should share the same congested state. This is achieved by tracking congested state via bdi_writeback_congested structs which are keyed by blkcg. bdi->wb remains unchanged and will keep serving the root cgroup. cgwb's (cgroup wb's) for non-root cgroups are created on-demand or looked up while dirtying an inode according to the memcg of the page being dirtied or current task. Each cgwb is indexed on bdi->cgwb_tree by its memcg id. Once an inode is associated with its wb, it can be retrieved using inode_to_wb(). Currently, none of the filesystems has FS_CGROUP_WRITEBACK and all pages will keep being associated with bdi->wb. v3: inode_attach_wb() in account_page_dirtied() moved inside mapping_cap_account_dirty() block where it's known to be !NULL. Also, an unnecessary NULL check before kfree() removed. Both detected by the kbuild bot. v2: Updated so that wb association is per inode and wb is per memcg rather than blkcg. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: kbuild test robot <fengguang.wu@intel.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-15VFS/namei: make the use of touch_atime() in get_link() RCU-safe.NeilBrown
touch_atime is not RCU-safe, and so cannot be called on an RCU walk. However, in situations where RCU-walk makes a difference, the symlink will likely to accessed much more often than it is useful to update the atime. So split out the test of "Does the atime actually need to be updated" into atime_needs_update(), and have get_link() unlazy if it finds that it will need to do that update. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-10libfs: simple_follow_link()Al Viro
let "fast" symlinks store the pointer to the body into ->i_link and use simple_follow_link for ->follow_link() Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-24direct-io: only inc/dec inode->i_dio_count for file systemsJens Axboe
do_blockdev_direct_IO() increments and decrements the inode ->i_dio_count for each IO operation. It does this to protect against truncate of a file. Block devices don't need this sort of protection. For a capable multiqueue setup, this atomic int is the only shared state between applications accessing the device for O_DIRECT, and it presents a scaling wall for that. In my testing, as much as 30% of system time is spent incrementing and decrementing this value. A mixed read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with better latencies too. Before: clat percentiles (usec): | 1.00th=[ 33], 5.00th=[ 34], 10.00th=[ 34], 20.00th=[ 34], | 30.00th=[ 34], 40.00th=[ 34], 50.00th=[ 35], 60.00th=[ 35], | 70.00th=[ 35], 80.00th=[ 35], 90.00th=[ 37], 95.00th=[ 80], | 99.00th=[ 98], 99.50th=[ 151], 99.90th=[ 155], 99.95th=[ 155], | 99.99th=[ 165] After: clat percentiles (usec): | 1.00th=[ 95], 5.00th=[ 108], 10.00th=[ 129], 20.00th=[ 149], | 30.00th=[ 155], 40.00th=[ 161], 50.00th=[ 167], 60.00th=[ 171], | 70.00th=[ 177], 80.00th=[ 185], 90.00th=[ 201], 95.00th=[ 270], | 99.00th=[ 390], 99.50th=[ 398], 99.90th=[ 418], 99.95th=[ 422], | 99.99th=[ 438] In other setups, Robert Elliott reported seeing good performance improvements: https://lkml.org/lkml/2015/4/3/557 The more applications accessing the device, the worse it gets. Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells do_blockdev_direct_IO() that it need not worry about incrementing or decrementing the inode i_dio_count for this caller. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Elliott, Robert (Server Storage) <elliott@hp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15VFS: fs/inode.c helpers: d_inode() annotationsDavid Howells
these should be used on objects already in top layer Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-17Merge branch 'lazytime' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull lazytime mount option support from Al Viro: "Lazytime stuff from tytso" * 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ext4: add optimization for the lazytime mount option vfs: add find_inode_nowait() function vfs: add support for a lazytime mount option
2015-02-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge third set of updates from Andrew Morton: - the rest of MM [ This includes getting rid of the numa hinting bits, in favor of just generic protnone logic. Yay. - Linus ] - core kernel - procfs - some of lib/ (lots of lib/ material this time) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (104 commits) lib/lcm.c: replace include lib/percpu_ida.c: remove redundant includes lib/strncpy_from_user.c: replace module.h include lib/stmp_device.c: replace module.h include lib/sort.c: move include inside #if 0 lib/show_mem.c: remove redundant include lib/radix-tree.c: change to simpler include lib/plist.c: remove redundant include lib/nlattr.c: remove redundant include lib/kobject_uevent.c: remove redundant include lib/llist.c: remove redundant include lib/md5.c: simplify include lib/list_sort.c: rearrange includes lib/genalloc.c: remove redundant include lib/idr.c: remove redundant include lib/halfmd4.c: simplify includes lib/dynamic_queue_limits.c: simplify includes lib/sort.c: use simpler includes lib/interval_tree.c: simplify includes hexdump: make it return number of bytes placed in buffer ...
2015-02-12list_lru: add helpers to isolate itemsVladimir Davydov
Currently, the isolate callback passed to the list_lru_walk family of functions is supposed to just delete an item from the list upon returning LRU_REMOVED or LRU_REMOVED_RETRY, while nr_items counter is fixed by __list_lru_walk_one after the callback returns. Since the callback is allowed to drop the lock after removing an item (it has to return LRU_REMOVED_RETRY then), the nr_items can be less than the actual number of elements on the list even if we check them under the lock. This makes it difficult to move items from one list_lru_one to another, which is required for per-memcg list_lru reparenting - we can't just splice the lists, we have to move entries one by one. This patch therefore introduces helpers that must be used by callback functions to isolate items instead of raw list_del/list_move. These are list_lru_isolate and list_lru_isolate_move. They not only remove the entry from the list, but also fix the nr_items counter, making sure nr_items always reflects the actual number of elements on the list if checked under the appropriate lock. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12list_lru: introduce list_lru_shrink_{count,walk}Vladimir Davydov
Kmem accounting of memcg is unusable now, because it lacks slab shrinker support. That means when we hit the limit we will get ENOMEM w/o any chance to recover. What we should do then is to call shrink_slab, which would reclaim old inode/dentry caches from this cgroup. This is what this patch set is intended to do. Basically, it does two things. First, it introduces the notion of per-memcg slab shrinker. A shrinker that wants to reclaim objects per cgroup should mark itself as SHRINKER_MEMCG_AWARE. Then it will be passed the memory cgroup to scan from in shrink_control->memcg. For such shrinkers shrink_slab iterates over the whole cgroup subtree under the target cgroup and calls the shrinker for each kmem-active memory cgroup. Secondly, this patch set makes the list_lru structure per-memcg. It's done transparently to list_lru users - everything they have to do is to tell list_lru_init that they want memcg-aware list_lru. Then the list_lru will automatically distribute objects among per-memcg lists basing on which cgroup the object is accounted to. This way to make FS shrinkers (icache, dcache) memcg-aware we only need to make them use memcg-aware list_lru, and this is what this patch set does. As before, this patch set only enables per-memcg kmem reclaim when the pressure goes from memory.limit, not from memory.kmem.limit. Handling memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and it is still unclear whether we will have this knob in the unified hierarchy. This patch (of 9): NUMA aware slab shrinkers use the list_lru structure to distribute objects coming from different NUMA nodes to different lists. Whenever such a shrinker needs to count or scan objects from a particular node, it issues commands like this: count = list_lru_count_node(lru, sc->nid); freed = list_lru_walk_node(lru, sc->nid, isolate_func, isolate_arg, &sc->nr_to_scan); where sc is an instance of the shrink_control structure passed to it from vmscan. To simplify this, let's add special list_lru functions to be used by shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which consolidate the nid and nr_to_scan arguments in the shrink_control structure. This will also allow us to avoid patching shrinkers that use list_lru when we make shrink_slab() per-memcg - all we will have to do is extend the shrink_control structure to include the target memcg and make list_lru_shrink_{count,walk} handle this appropriately. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Suggested-by: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Greg Thelen <gthelen@google.com> Cc: Glauber Costa <glommer@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull backing device changes from Jens Axboe: "This contains a cleanup of how the backing device is handled, in preparation for a rework of the life time rules. In this part, the most important change is to split the unrelated nommu mmap flags from it, but also removing a backing_dev_info pointer from the address_space (and inode), and a cleanup of other various minor bits. Christoph did all the work here, I just fixed an oops with pages that have a swap backing. Arnd fixed a missing export, and Oleg killed the lustre backing_dev_info from staging. Last patch was from Al, unexporting parts that are now no longer needed outside" * 'for-3.20/bdi' of git://git.kernel.dk/linux-block: Make super_blocks and sb_lock static mtd: export new mtd_mmap_capabilities fs: make inode_to_bdi() handle NULL inode staging/lustre/llite: get rid of backing_dev_info fs: remove default_backing_dev_info fs: don't reassign dirty inodes to default_backing_dev_info nfs: don't call bdi_unregister ceph: remove call to bdi_unregister fs: remove mapping->backing_dev_info fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info nilfs2: set up s_bdi like the generic mount_bdev code block_dev: get bdev inode bdi directly from the block device block_dev: only write bdev inode on close fs: introduce f_op->mmap_capabilities for nommu mmap support fs: kill BDI_CAP_SWAP_BACKED fs: deduplicate noop_backing_dev_info
2015-02-10Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: "Bite-sized chunks this time, to avoid the MTA ratelimiting woes. - fs/notify updates - ocfs2 - some of MM" That laconic "some MM" is mainly the removal of remap_file_pages(), which is a big simplification of the VM, and which gets rid of a *lot* of random cruft and special cases because we no longer support the non-linear mappings that it used. From a user interface perspective, nothing has changed, because the remap_file_pages() syscall still exists, it's just done by emulating the old behavior by creating a lot of individual small mappings instead of one non-linear one. The emulation is slower than the old "native" non-linear mappings, but nobody really uses or cares about remap_file_pages(), and simplifying the VM is a big advantage. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits) memcg: zap memcg_slab_caches and memcg_slab_mutex memcg: zap memcg_name argument of memcg_create_kmem_cache memcg: zap __memcg_{charge,uncharge}_slab mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check mm: hugetlb: fix type of hugetlb_treat_as_movable variable mm, hugetlb: remove unnecessary lower bound on sysctl handlers"? mm: memory: merge shared-writable dirtying branches in do_wp_page() mm: memory: remove ->vm_file check on shared writable vmas xtensa: drop _PAGE_FILE and pte_file()-related helpers x86: drop _PAGE_FILE and pte_file()-related helpers unicore32: drop pte_file()-related helpers um: drop _PAGE_FILE and pte_file()-related helpers tile: drop pte_file()-related helpers sparc: drop pte_file()-related helpers sh: drop _PAGE_FILE and pte_file()-related helpers score: drop _PAGE_FILE and pte_file()-related helpers s390: drop pte_file()-related helpers parisc: drop _PAGE_FILE and pte_file()-related helpers openrisc: drop _PAGE_FILE and pte_file()-related helpers nios2: drop _PAGE_FILE and pte_file()-related helpers ...
2015-02-10rmap: drop support of non-linear mappingsKirill A. Shutemov
We don't create non-linear mappings anymore. Let's drop code which handles them in rmap. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-05vfs: add find_inode_nowait() functionTheodore Ts'o
Add a new function find_inode_nowait() which is an even more general version of ilookup5_nowait(). It is designed for callers which need very fine grained control over when the function is allowed to block or increment the inode's reference count. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-05vfs: add support for a lazytime mount optionTheodore Ts'o
Add a new mount option which enables a new "lazytime" mode. This mode causes atime, mtime, and ctime updates to only be made to the in-memory version of the inode. The on-disk times will only get updated when (a) if the inode needs to be updated for some non-time related change, (b) if userspace calls fsync(), syncfs() or sync(), or (c) just before an undeleted inode is evicted from memory. This is OK according to POSIX because there are no guarantees after a crash unless userspace explicitly requests via a fsync(2) call. For workloads which feature a large number of random write to a preallocated file, the lazytime mount option significantly reduces writes to the inode table. The repeated 4k writes to a single block will result in undesirable stress on flash devices and SMR disk drives. Even on conventional HDD's, the repeated writes to the inode table block will trigger Adjacent Track Interference (ATI) remediation latencies, which very negatively impact long tail latencies --- which is a very big deal for web serving tiers (for example). Google-Bug-Id: 18297052 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-20fs: remove mapping->backing_dev_infoChristoph Hellwig
Now that we never use the backing_dev_info pointer in struct address_space we can simply remove it and save 4 to 8 bytes in every inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-16locks: add a new struct file_locking_context pointer to struct inodeJeff Layton
The current scheme of using the i_flock list is really difficult to manage. There is also a legitimate desire for a per-inode spinlock to manage these lists that isn't the i_lock. Start conversion to a new scheme to eventually replace the old i_flock list with a new "file_lock_context" object. We start by adding a new i_flctx to struct inode. For now, it lives in parallel with i_flock list, but will eventually replace it. The idea is to allocate a structure to sit in that pointer and act as a locus for all things file locking. We allocate a file_lock_context for an inode when the first lock is added to it, and it's only freed when the inode is freed. We use the i_lock to protect the assignment, but afterward it should mostly be accessed locklessly. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>
2014-12-16Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile #2 from Al Viro: "Next pile (and there'll be one or two more). The large piece in this one is getting rid of /proc/*/ns/* weirdness; among other things, it allows to (finally) make nameidata completely opaque outside of fs/namei.c, making for easier further cleanups in there" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coda_venus_readdir(): use file_inode() fs/namei.c: fold link_path_walk() call into path_init() path_init(): don't bother with LOOKUP_PARENT in argument fs/namei.c: new helper (path_cleanup()) path_init(): store the "base" pointer to file in nameidata itself make default ->i_fop have ->open() fail with ENXIO make nameidata completely opaque outside of fs/namei.c kill proc_ns completely take the targets of /proc/*/ns/* symlinks to separate fs bury struct proc_ns in fs/proc copy address of proc_ns_ops into ns_common new helpers: ns_alloc_inum/ns_free_inum make proc_ns_operations work with struct ns_common * instead of void * switch the rest of proc_ns_operations to working with &...->ns netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns common object embedded into various struct ....ns
2014-12-13mm: convert i_mmap_mutex to rwsemDavidlohr Bueso
The i_mmap_mutex is a close cousin of the anon vma lock, both protecting similar data, one for file backed pages and the other for anon memory. To this end, this lock can also be a rwsem. In addition, there are some important opportunities to share the lock when there are no tree modifications. This conversion is straightforward. For now, all users take the write lock. [sfr@canb.auug.org.au: update fremap.c] Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10make default ->i_fop have ->open() fail with ENXIOAl Viro
As it is, default ->i_fop has NULL ->open() (along with all other methods). The only case where it matters is reopening (via procfs symlink) a file that didn't get its ->f_op from ->i_fop - anything else will have ->i_fop assigned to something sane (default would fail on read/write/ioctl/etc.). Unfortunately, such case exists - alloc_file() users, especially anon_get_file() ones. There we have tons of opened files of very different kinds sharing the same inode. As the result, attempt to reopen those via procfs succeeds and you get a descriptor you can't do anything with. Moreover, in case of sockets we set ->i_fop that will only be used on such reopen attempts - and put a failing ->open() into it to make sure those do not succeed. It would be simpler to put such ->open() into default ->i_fop and leave it unchanged both for anon inode (as we do anyway) and for socket ones. Result: * everything going through do_dentry_open() works as it used to * sock_no_open() kludge is gone * attempts to reopen anon-inode files fail as they really ought to * ditto for aio_private_file() * ditto for perfmon - this one actually tried to imitate sock_no_open() trick, but failed to set ->i_fop, so in the current tree reopens succeed and yield completely useless descriptor. Intent clearly had been to fail with -ENXIO on such reopens; now it actually does. * everything else that used alloc_file() keeps working - it has ->i_fop set for its inodes anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-10vfs: Remove i_dquot field from inodeJan Kara
All filesystems using VFS quotas are now converted to use their private i_dquot fields. Remove the i_dquot field from generic inode structure. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2014-08-08mm: allow drivers to prevent new writable mappingsDavid Herrmann
This patch (of 6): The i_mmap_writable field counts existing writable mappings of an address_space. To allow drivers to prevent new writable mappings, make this counter signed and prevent new writable mappings if it is negative. This is modelled after i_writecount and DENYWRITE. This will be required by the shmem-sealing infrastructure to prevent any new writable mappings after the WRITE seal has been set. In case there exists a writable mapping, this operation will fail with EBUSY. Note that we rely on the fact that iff you already own a writable mapping, you can increase the counter without using the helpers. This is the same that we do for i_writecount. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ryan Lortie <desrt@desrt.ca> Cc: Lennart Poettering <lennart@poettering.net> Cc: Daniel Mack <zonque@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-16sched: Remove proliferation of wait_on_bit() action functionsNeilBrown
The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-10fs,userns: Change inode_capable to capable_wrt_inode_uidgidAndy Lutomirski
The kernel has no concept of capabilities with respect to inodes; inodes exist independently of namespaces. For example, inode_capable(inode, CAP_LINUX_IMMUTABLE) would be nonsense. This patch changes inode_capable to check for uid and gid mappings and renames it to capable_wrt_inode_uidgid, which should make it more obvious what it does. Fixes CVE-2014-4014. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Chinner <david@fromorbit.com> Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06fs: convert use of typedef ctl_table to struct ctl_tableJoe Perches
This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-04Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Major changes for 3.14 include support for the newly added ZERO_RANGE and COLLAPSE_RANGE fallocate operations, and scalability improvements in the jbd2 layer and in xattr handling when the extended attributes spill over into an external block. Other than that, the usual clean ups and minor bug fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits) ext4: fix premature freeing of partial clusters split across leaf blocks ext4: remove unneeded test of ret variable ext4: fix comment typo ext4: make ext4_block_zero_page_range static ext4: atomically set inode->i_flags in ext4_set_inode_flags() ext4: optimize Hurd tests when reading/writing inodes ext4: kill i_version support for Hurd-castrated file systems ext4: each filesystem creates and uses its own mb_cache fs/mbcache.c: doucple the locking of local from global data fs/mbcache.c: change block and index hash chain to hlist_bl_node ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate ext4: refactor ext4_fallocate code ext4: Update inode i_size after the preallocation ext4: fix partial cluster handling for bigalloc file systems ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents ext4: only call sync_filesystm() when remounting read-only fs: push sync_filesystem() down to the file system's remount_fs() jbd2: improve error messages for inconsistent journal heads jbd2: minimize region locked by j_list_lock in jbd2_journal_forget() jbd2: minimize region locked by j_list_lock in journal_get_create_access() ...
2014-04-04Merge branch 'cross-rename' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull renameat2 system call from Miklos Szeredi: "This adds a new syscall, renameat2(), which is the same as renameat() but with a flags argument. The purpose of extending rename is to add cross-rename, a symmetric variant of rename, which exchanges the two files. This allows interesting things, which were not possible before, for example atomically replacing a directory tree with a symlink, etc... This also allows overlayfs and friends to operate on whiteouts atomically. Andy Lutomirski also suggested a "noreplace" flag, which disables the overwriting behavior of rename. These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only implemented for ext4 as an example and for testing" * 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ext4: add cross rename support ext4: rename: split out helper functions ext4: rename: move EMLINK check up ext4: rename: create ext4_renament structure for local vars vfs: add cross-rename vfs: lock_two_nondirectories: allow directory args security: add flags to rename hooks vfs: add RENAME_NOREPLACE flag vfs: add renameat2 syscall vfs: rename: use common code for dir and non-dir vfs: rename: move d_move() up vfs: add d_is_dir()
2014-04-03mm + fs: store shadow entries in page cacheJohannes Weiner
Reclaim will be leaving shadow entries in the page cache radix tree upon evicting the real page. As those pages are found from the LRU, an iput() can lead to the inode being freed concurrently. At this point, reclaim must no longer install shadow pages because the inode freeing code needs to ensure the page tree is really empty. Add an address_space flag, AS_EXITING, that the inode freeing code sets under the tree lock before doing the final truncate. Reclaim will check for this flag before installing shadow pages. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-01vfs: lock_two_nondirectories: allow directory argsJ. Bruce Fields
lock_two_nondirectories warned if either of its args was a directory. Instead just ignore the directory args. This is needed for locking in cross rename. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>