Age | Commit message (Collapse) | Author |
|
* refs/heads/tmp-b1c4836
Linux 4.4.129
writeback: safer lock nesting
fanotify: fix logic of events on child
ext4: bugfix for mmaped pages in mpage_release_unused_pages()
mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
mm: allow GFP_{FS,IO} for page_cache_read page cache allocation
autofs: mount point create should honour passed in mode
Don't leak MNT_INTERNAL away from internal mounts
rpc_pipefs: fix double-dput()
hypfs_kill_super(): deal with failed allocations
jffs2_kill_sb(): deal with failed allocations
powerpc/lib: Fix off-by-one in alternate feature patching
powerpc/eeh: Fix enabling bridge MMIO windows
MIPS: memset.S: Fix clobber of v1 in last_fixup
MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
MIPS: memset.S: EVA & fault support for small_memset
MIPS: uaccess: Add micromips clobbers to bzero invocation
HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
ALSA: hda - New VIA controller suppor no-snoop path
ALSA: rawmidi: Fix missing input substream checks in compat ioctls
ALSA: line6: Use correct endpoint type for midi output
ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()
ext4: fix crashes in dioread_nolock mode
drm/radeon: Fix PCIe lane width calculation
ext4: don't allow r/w mounts if metadata blocks overlap the superblock
vfio/pci: Virtualize Maximum Read Request Size
vfio/pci: Virtualize Maximum Payload Size
vfio-pci: Virtualize PCIe & AF FLR
ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation
ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
ALSA: pcm: Avoid potential races between OSS ioctls and read/write
ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation
ALSA: oss: consolidate kmalloc/memset 0 call to kzalloc
watchdog: f71808e_wdt: Fix WD_EN register read
thermal: imx: Fix race condition in imx_thermal_probe()
clk: bcm2835: De-assert/assert PLL reset signal when appropriate
clk: mvebu: armada-38x: add support for missing clocks
clk: mvebu: armada-38x: add support for 1866MHz variants
mmc: jz4740: Fix race condition in IRQ mask update
iommu/vt-d: Fix a potential memory leak
um: Use POSIX ucontext_t instead of struct ucontext
dmaengine: at_xdmac: fix rare residue corruption
IB/srp: Fix completion vector assignment algorithm
IB/srp: Fix srp_abort()
ALSA: pcm: Fix UAF at PCM release via PCM timer access
RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
ext4: fail ext4_iget for root directory if unallocated
ext4: don't update checksum of new initialized bitmaps
jbd2: if the journal is aborted then don't allow update of the log tail
random: use a tighter cap in credit_entropy_bits_safe()
thunderbolt: Resume control channel after hibernation image is created
ASoC: ssm2602: Replace reg_default_raw with reg_default
HID: core: Fix size as type u32
HID: Fix hid_report_len usage
powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
HID: i2c-hid: fix size check and type usage
usb: dwc3: pci: Properly cleanup resource
USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()
ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E
regmap: Fix reversed bounds check in regmap_raw_write()
xen-netfront: Fix hang on device removal
ARM: dts: at91: sama5d4: fix pinctrl compatible string
ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property
usb: musb: gadget: misplaced out of bounds check
mm, slab: reschedule cache_reap() on the same CPU
ipc/shm: fix use-after-free of shm file via remap_file_pages()
resource: fix integer overflow at reallocation
fs/reiserfs/journal.c: add missing resierfs_warning() arg
ubi: Reject MLC NAND
ubi: Fix error for write access
ubi: fastmap: Don't flush fastmap work on detach
ubifs: Check ubifs_wbuf_sync() return code
tty: make n_tty_read() always abort if hangup is in progress
x86/hweight: Don't clobber %rdi
x86/hweight: Get rid of the special calling convention
lan78xx: Correctly indicate invalid OTP
slip: Check if rstate is initialized before uncompressing
cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
hwmon: (ina2xx) Fix access to uninitialized mutex
rtl8187: Fix NULL pointer dereference in priv->conf_mutex
getname_kernel() needs to make sure that ->name != ->iname in long case
s390/ipl: ensure loadparm valid flag is set
s390/qdio: don't merge ERROR output buffers
s390/qdio: don't retry EQBS after CCQ 96
block/loop: fix deadlock after loop_set_status
Revert "perf tests: Decompress kernel module before objdump"
radeon: hide pointless #warning when compile testing
perf intel-pt: Fix timestamp following overflow
perf intel-pt: Fix error recovery from missing TIP packet
perf intel-pt: Fix sync_switch
perf intel-pt: Fix overlap detection to identify consecutive buffers correctly
parisc: Fix out of array access in match_pci_device()
media: v4l2-compat-ioctl32: don't oops on overlay
f2fs: check cap_resource only for data blocks
Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer"
f2fs: clear PageError on writepage
UPSTREAM: timer: Export destroy_hrtimer_on_stack()
BACKPORT: dm verity: add 'check_at_most_once' option to only validate hashes once
f2fs: call unlock_new_inode() before d_instantiate()
f2fs: refactor read path to allow multiple postprocessing steps
fscrypt: allow synchronous bio decryption
Change-Id: I45f4ac10734d92023b53118d83dcd6c83974a283
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
|
|
Changes in 4.4.129
media: v4l2-compat-ioctl32: don't oops on overlay
parisc: Fix out of array access in match_pci_device()
perf intel-pt: Fix overlap detection to identify consecutive buffers correctly
perf intel-pt: Fix sync_switch
perf intel-pt: Fix error recovery from missing TIP packet
perf intel-pt: Fix timestamp following overflow
radeon: hide pointless #warning when compile testing
Revert "perf tests: Decompress kernel module before objdump"
block/loop: fix deadlock after loop_set_status
s390/qdio: don't retry EQBS after CCQ 96
s390/qdio: don't merge ERROR output buffers
s390/ipl: ensure loadparm valid flag is set
getname_kernel() needs to make sure that ->name != ->iname in long case
rtl8187: Fix NULL pointer dereference in priv->conf_mutex
hwmon: (ina2xx) Fix access to uninitialized mutex
cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
slip: Check if rstate is initialized before uncompressing
lan78xx: Correctly indicate invalid OTP
x86/hweight: Get rid of the special calling convention
x86/hweight: Don't clobber %rdi
tty: make n_tty_read() always abort if hangup is in progress
ubifs: Check ubifs_wbuf_sync() return code
ubi: fastmap: Don't flush fastmap work on detach
ubi: Fix error for write access
ubi: Reject MLC NAND
fs/reiserfs/journal.c: add missing resierfs_warning() arg
resource: fix integer overflow at reallocation
ipc/shm: fix use-after-free of shm file via remap_file_pages()
mm, slab: reschedule cache_reap() on the same CPU
usb: musb: gadget: misplaced out of bounds check
ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property
ARM: dts: at91: sama5d4: fix pinctrl compatible string
xen-netfront: Fix hang on device removal
regmap: Fix reversed bounds check in regmap_raw_write()
ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E
ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()
USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
usb: dwc3: pci: Properly cleanup resource
HID: i2c-hid: fix size check and type usage
powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
HID: Fix hid_report_len usage
HID: core: Fix size as type u32
ASoC: ssm2602: Replace reg_default_raw with reg_default
thunderbolt: Resume control channel after hibernation image is created
random: use a tighter cap in credit_entropy_bits_safe()
jbd2: if the journal is aborted then don't allow update of the log tail
ext4: don't update checksum of new initialized bitmaps
ext4: fail ext4_iget for root directory if unallocated
RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
ALSA: pcm: Fix UAF at PCM release via PCM timer access
IB/srp: Fix srp_abort()
IB/srp: Fix completion vector assignment algorithm
dmaengine: at_xdmac: fix rare residue corruption
um: Use POSIX ucontext_t instead of struct ucontext
iommu/vt-d: Fix a potential memory leak
mmc: jz4740: Fix race condition in IRQ mask update
clk: mvebu: armada-38x: add support for 1866MHz variants
clk: mvebu: armada-38x: add support for missing clocks
clk: bcm2835: De-assert/assert PLL reset signal when appropriate
thermal: imx: Fix race condition in imx_thermal_probe()
watchdog: f71808e_wdt: Fix WD_EN register read
ALSA: oss: consolidate kmalloc/memset 0 call to kzalloc
ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation
ALSA: pcm: Avoid potential races between OSS ioctls and read/write
ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation
vfio-pci: Virtualize PCIe & AF FLR
vfio/pci: Virtualize Maximum Payload Size
vfio/pci: Virtualize Maximum Read Request Size
ext4: don't allow r/w mounts if metadata blocks overlap the superblock
drm/radeon: Fix PCIe lane width calculation
ext4: fix crashes in dioread_nolock mode
ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()
ALSA: line6: Use correct endpoint type for midi output
ALSA: rawmidi: Fix missing input substream checks in compat ioctls
ALSA: hda - New VIA controller suppor no-snoop path
HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
MIPS: uaccess: Add micromips clobbers to bzero invocation
MIPS: memset.S: EVA & fault support for small_memset
MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
MIPS: memset.S: Fix clobber of v1 in last_fixup
powerpc/eeh: Fix enabling bridge MMIO windows
powerpc/lib: Fix off-by-one in alternate feature patching
jffs2_kill_sb(): deal with failed allocations
hypfs_kill_super(): deal with failed allocations
rpc_pipefs: fix double-dput()
Don't leak MNT_INTERNAL away from internal mounts
autofs: mount point create should honour passed in mode
mm: allow GFP_{FS,IO} for page_cache_read page cache allocation
mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
ext4: bugfix for mmaped pages in mpage_release_unused_pages()
fanotify: fix logic of events on child
writeback: safer lock nesting
Linux 4.4.129
Change-Id: I8806d2cc92fe512f27a349e8f630ced0cac9a8d7
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
|
commit 16a34adb9392b2fe4195267475ab5b472e55292c upstream.
We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
their copies. As it is, creating a deep stack of bindings of /proc/*/ns/*
somewhere in a new namespace and exiting yields a stack overflow.
Cc: stable@kernel.org
Reported-by: Alexander Aring <aring@mojatatu.com>
Bisected-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
* refs/heads/tmp-29d0b65
Linux 4.4.88
xfs: XFS_IS_REALTIME_INODE() should be false if no rt device present
NFS: Fix 2 use after free issues in the I/O code
ARM: 8692/1: mm: abort uaccess retries upon fatal signal
Bluetooth: Properly check L2CAP config option output buffer length
ALSA: msnd: Optimize / harden DSP and MIDI loops
locktorture: Fix potential memory leak with rw lock test
btrfs: resume qgroup rescan on rw remount
drm/bridge: adv7511: Re-write the i2c address before EDID probing
drm/bridge: adv7511: Switch to using drm_kms_helper_hotplug_event()
drm/bridge: adv7511: Use work_struct to defer hotplug handing to out of irq context
drm/bridge: adv7511: Fix mutex deadlock when interrupts are disabled
drm: adv7511: really enable interrupts for EDID detection
scsi: sg: recheck MMAP_IO request length with lock held
scsi: sg: protect against races between mmap() and SG_SET_RESERVED_SIZE
cs5536: add support for IDE controller variant
workqueue: Fix flag collision
drm/nouveau/pci/msi: disable MSI on big-endian platforms by default
mwifiex: correct channel stat buffer overflows
dlm: avoid double-free on error path in dlm_device_{register,unregister}
Bluetooth: Add support of 13d3:3494 RTL8723BE device
rtlwifi: rtl_pci_probe: Fix fail path of _rtl_pci_find_adapter
Input: trackpoint - assume 3 buttons when buttons detection fails
ath10k: fix memory leak in rx ring buffer allocation
intel_th: pci: Add Cannon Lake PCH-LP support
intel_th: pci: Add Cannon Lake PCH-H support
driver core: bus: Fix a potential double free
staging/rts5208: fix incorrect shift to extract upper nybble
USB: core: Avoid race of async_completed() w/ usbdev_release()
usb:xhci:Fix regression when ATI chipsets detected
usb: Add device quirk for Logitech HD Pro Webcam C920-C
USB: serial: option: add support for D-Link DWM-157 C1
usb: quirks: add delay init quirk for Corsair Strafe RGB keyboard
ANDROID: sdcardfs: Add missing break
ANDROID: Sdcardfs: Move gid derivation under flag
ANDROID: mnt: Fix freeing of mount data
drivers: cpufreq: checks to avoid kernel crash in cpufreq_interactive
ANDROID: Use sk_uid to replace uid get from socket file
ANDROID: nf: xt_qtaguid: fix handling for cases where tunnels are used.
Revert "ANDROID: Use sk_uid to replace uid get from socket file"
ANDROID: fiq_debugger: Fix minor bug in code
Conflicts:
drivers/cpufreq/cpufreq_interactive.c
drivers/net/wireless/ath/ath10k/core.c
drivers/staging/android/fiq_debugger/fiq_debugger.c
net/netfilter/xt_qtaguid.c
Change-Id: I49c67ff84d4bee0799691cc1ee0a023e2dd13e66
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
|
|
Fix double free on error paths
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I1c25a175e87e5dd5cafcdcf9d78bf4c0dc3f88ef
Bug: 65386954
Fixes: 6b42d02561d3 ("ANDROID: mnt: Add filesystem private data to mount points")
|
|
|
|
Changes in 4.4.78
net_sched: fix error recovery at qdisc creation
net: sched: Fix one possible panic when no destroy callback
net/phy: micrel: configure intterupts after autoneg workaround
ipv6: avoid unregistering inet6_dev for loopback
net: dp83640: Avoid NULL pointer dereference.
tcp: reset sk_rx_dst in tcp_disconnect()
net: prevent sign extension in dev_get_stats()
bpf: prevent leaking pointer via xadd on unpriviledged
net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
ipv6: dad: don't remove dynamic addresses if link is down
net: ipv6: Compare lwstate in detecting duplicate nexthops
vrf: fix bug_on triggered by rx when destroying a vrf
rds: tcp: use sock_create_lite() to create the accept socket
brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE
cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES
cfg80211: Check if PMKID attribute is of expected size
irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
parisc: Report SIGSEGV instead of SIGBUS when running out of stack
parisc: use compat_sys_keyctl()
parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs
parisc/mm: Ensure IRQs are off in switch_mm()
tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth
kernel/extable.c: mark core_kernel_text notrace
mm/list_lru.c: fix list_lru_count_node() to be race free
fs/dcache.c: fix spin lockup issue on nlru->lock
checkpatch: silence perl 5.26.0 unescaped left brace warnings
binfmt_elf: use ELF_ET_DYN_BASE only for PIE
arm: move ELF_ET_DYN_BASE to 4MB
arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
s390: reduce ELF_ET_DYN_BASE
exec: Limit arg stack to at most 75% of _STK_LIM
vt: fix unchecked __put_user() in tioclinux ioctls
mnt: In umount propagation reparent in a separate pass
mnt: In propgate_umount handle visiting mounts in any order
mnt: Make propagate_umount less slow for overlapping mount propagation trees
selftests/capabilities: Fix the test_execve test
tpm: Get rid of chip->pdev
tpm: Provide strong locking for device removal
Add "shutdown" to "struct class".
tpm: Issue a TPM2_Shutdown for TPM2 devices.
mm: fix overflow check in expand_upwards()
crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD
crypto: atmel - only treat EBUSY as transient if backlog
crypto: sha1-ssse3 - Disable avx2
crypto: caam - fix signals handling
sched/topology: Fix overlapping sched_group_mask
sched/topology: Optimize build_group_mask()
PM / wakeirq: Convert to SRCU
PM / QoS: return -EINVAL for bogus strings
tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
KVM: x86: disable MPX if host did not enable MPX XSAVE features
kvm: vmx: Do not disable intercepts for BNDCFGS
kvm: x86: Guest BNDCFGS requires guest MPX support
kvm: vmx: Check value written to IA32_BNDCFGS
kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
Linux 4.4.78
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
|
commit 99b19d16471e9c3faa85cad38abc9cbbe04c6d55 upstream.
While investigating some poor umount performance I realized that in
the case of overlapping mount trees where some of the mounts are locked
the code has been failing to unmount all of the mounts it should
have been unmounting.
This failure to unmount all of the necessary
mounts can be reproduced with:
$ cat locked_mounts_test.sh
mount -t tmpfs test-base /mnt
mount --make-shared /mnt
mkdir -p /mnt/b
mount -t tmpfs test1 /mnt/b
mount --make-shared /mnt/b
mkdir -p /mnt/b/10
mount -t tmpfs test2 /mnt/b/10
mount --make-shared /mnt/b/10
mkdir -p /mnt/b/10/20
mount --rbind /mnt/b /mnt/b/10/20
unshare -Urm --propagation unchaged /bin/sh -c 'sleep 5; if [ $(grep test /proc/self/mountinfo | wc -l) -eq 1 ] ; then echo SUCCESS ; else echo FAILURE ; fi'
sleep 1
umount -l /mnt/b
wait %%
$ unshare -Urm ./locked_mounts_test.sh
This failure is corrected by removing the prepass that marks mounts
that may be umounted.
A first pass is added that umounts mounts if possible and if not sets
mount mark if they could be unmounted if they weren't locked and adds
them to a list to umount possibilities. This first pass reconsiders
the mounts parent if it is on the list of umount possibilities, ensuring
that information of umoutability will pass from child to mount parent.
A second pass then walks through all mounts that are umounted and processes
their children unmounting them or marking them for reparenting.
A last pass cleans up the state on the mounts that could not be umounted
and if applicable reparents them to their first parent that remained
mounted.
While a bit longer than the old code this code is much more robust
as it allows information to flow up from the leaves and down
from the trunk making the order in which mounts are encountered
in the umount propgation tree irrelevant.
Fixes: 0c56fe31420c ("mnt: Don't propagate unmounts to locked mounts")
Reviewed-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 570487d3faf2a1d8a220e6ee10f472163123d7da upstream.
It was observed that in some pathlogical cases that the current code
does not unmount everything it should. After investigation it
was determined that the issue is that mnt_change_mntpoint can
can change which mounts are available to be unmounted during mount
propagation which is wrong.
The trivial reproducer is:
$ cat ./pathological.sh
mount -t tmpfs test-base /mnt
cd /mnt
mkdir 1 2 1/1
mount --bind 1 1
mount --make-shared 1
mount --bind 1 2
mount --bind 1/1 1/1
mount --bind 1/1 1/1
echo
grep test-base /proc/self/mountinfo
umount 1/1
echo
grep test-base /proc/self/mountinfo
$ unshare -Urm ./pathological.sh
The expected output looks like:
46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
The output without the fix looks like:
46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
52 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
That last mount in the output was in the propgation tree to be unmounted but
was missed because the mnt_change_mountpoint changed it's parent before the walk
through the mount propagation tree observed it.
Fixes: 1064f874abc0 ("mnt: Tuck mounts under others instead of creating shadow/side mounts.")
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When umount of a partition fails with EBUSY there is
no indication as to what is keeping the mount point busy.
Add support to print a kernel log showing what files
are open on this mount point.
Also add a new new config option CONFIG_FILE_TABLE_DEBUG to
enable this feature.
Change-Id: Id7a3f5e7291b22ffd0f265848ec0a9757f713561
Signed-off-by: Nikhilesh Reddy <reddyn@codeaurora.org>
Signed-off-by: Ankit Jain <ankijain@codeaurora.org>
|
|
Changes in 4.4.65:
tipc: make sure IPv6 header fits in skb headroom
tipc: make dist queue pernet
tipc: re-enable compensation for socket receive buffer double counting
tipc: correct error in node fsm
tty: nozomi: avoid a harmless gcc warning
hostap: avoid uninitialized variable use in hfa384x_get_rid
gfs2: avoid uninitialized variable warning
tipc: fix random link resets while adding a second bearer
tipc: fix socket timer deadlock
mnt: Add a per mount namespace limit on the number of mounts
xc2028: avoid use after free
netfilter: nfnetlink: correctly validate length of batch messages
tipc: check minimum bearer MTU
vfio/pci: Fix integer overflows, bitmask check
staging/android/ion : fix a race condition in the ion driver
ping: implement proper locking
perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
Linux 4.4.65
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
|
commit d29216842a85c7970c536108e093963f02714498 upstream.
CAI Qian <caiqian@redhat.com> pointed out that the semantics
of shared subtrees make it possible to create an exponentially
increasing number of mounts in a mount namespace.
mkdir /tmp/1 /tmp/2
mount --make-rshared /
for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done
Will create create 2^20 or 1048576 mounts, which is a practical problem
as some people have managed to hit this by accident.
As such CVE-2016-6213 was assigned.
Ian Kent <raven@themaw.net> described the situation for autofs users
as follows:
> The number of mounts for direct mount maps is usually not very large because of
> the way they are implemented, large direct mount maps can have performance
> problems. There can be anywhere from a few (likely case a few hundred) to less
> than 10000, plus mounts that have been triggered and not yet expired.
>
> Indirect mounts have one autofs mount at the root plus the number of mounts that
> have been triggered and not yet expired.
>
> The number of autofs indirect map entries can range from a few to the common
> case of several thousand and in rare cases up to between 30000 and 50000. I've
> not heard of people with maps larger than 50000 entries.
>
> The larger the number of map entries the greater the possibility for a large
> number of active mounts so it's not hard to expect cases of a 1000 or somewhat
> more active mounts.
So I am setting the default number of mounts allowed per mount
namespace at 100,000. This is more than enough for any use case I
know of, but small enough to quickly stop an exponential increase
in mounts. Which should be perfect to catch misconfigurations and
malfunctioning programs.
For anyone who needs a higher limit this can be changed by writing
to the new /proc/sys/fs/mount-max sysctl.
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
am: 839d42687d
Change-Id: Id06d474c7f7b46c1ce05056028ca02837cacf215
|
|
commit 1064f874abc0d05eeed8993815f584d847b72486 upstream.
Ever since mount propagation was introduced in cases where a mount in
propagated to parent mount mountpoint pair that is already in use the
code has placed the new mount behind the old mount in the mount hash
table.
This implementation detail is problematic as it allows creating
arbitrary length mount hash chains.
Furthermore it invalidates the constraint maintained elsewhere in the
mount code that a parent mount and a mountpoint pair will have exactly
one mount upon them. Making it hard to deal with and to talk about
this special case in the mount code.
Modify mount propagation to notice when there is already a mount at
the parent mount and mountpoint where a new mount is propagating to
and place that preexisting mount on top of the new mount.
Modify unmount propagation to notice when a mount that is being
unmounted has another mount on top of it (and no other children), and
to replace the unmounted mount with the mount on top of it.
Move the MNT_UMUONT test from __lookup_mnt_last into
__propagate_umount as that is the only call of __lookup_mnt_last where
MNT_UMOUNT may be set on any mount visible in the mount hash table.
These modifications allow:
- __lookup_mnt_last to be removed.
- attach_shadows to be renamed __attach_mnt and its shadow
handling to be removed.
- commit_tree to be simplified
- copy_tree to be simplified
The result is an easier to understand tree of mounts that does not
allow creation of arbitrary length hash chains in the mount hash table.
The result is also a very slight userspace visible difference in semantics.
The following two cases now behave identically, where before order
mattered:
case 1: (explicit user action)
B is a slave of A
mount something on A/a , it will propagate to B/a
and than mount something on B/a
case 2: (tucked mount)
B is a slave of A
mount something on B/a
and than mount something on A/a
Histroically umount A/a would fail in case 1 and succeed in case 2.
Now umount A/a succeeds in both configurations.
This very small change in semantics appears if anything to be a bug
fix to me and my survey of userspace leads me to believe that no programs
will notice or care of this subtle semantic change.
v2: Updated to mnt_change_mountpoint to not call dput or mntput
and instead to decrement the counts directly. It is guaranteed
that there will be other references when mnt_change_mountpoint is
called so this is safe.
v3: Moved put_mountpoint under mount_lock in attach_recursive_mnt
As the locking in fs/namespace.c changed between v2 and v3.
v4: Reworked the logic in propagate_mount_busy and __propagate_umount
that detects when a mount completely covers another mount.
v5: Removed unnecessary tests whose result is alwasy true in
find_topper and attach_recursive_mnt.
v6: Document the user space visible semantic difference.
Fixes: b90fa9ae8f51 ("[PATCH] shared mount handling: bind and rbind")
Tested-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Change-Id: Icf907f5067fb6da5935ab0d3271df54b8d5df405
|
|
Now we pass the vfsmount when mounting and remounting.
This allows the filesystem to actually set up the mount
specific data, although we can't quite do anything with
it yet. show_options is expanded to include data that
lives with the mount.
To avoid changing existing filesystems, these have
been added as new vfs functions.
Change-Id: If80670bfad9f287abb8ac22457e1b034c9697097
Signed-off-by: Daniel Rosenberg <drosen@google.com>
|
|
This starts to add private data associated directly
to mount points. The intent is to give filesystems
a sense of where they have come from, as a means of
letting a filesystem take different actions based on
this information.
Change-Id: Ie769d7b3bb2f5972afe05c1bf16cf88c91647ab2
Signed-off-by: Daniel Rosenberg <drosen@google.com>
|
|
commit 3895dbf8985f656675b5bde610723a29cbce3fa7 upstream.
Protecting the mountpoint hashtable with namespace_sem was sufficient
until a call to umount_mnt was added to mntput_no_expire. At which
point it became possible for multiple calls of put_mountpoint on
the same hash chain to happen on the same time.
Kristen Johansen <kjlx@templeofstupid.com> reported:
> This can cause a panic when simultaneous callers of put_mountpoint
> attempt to free the same mountpoint. This occurs because some callers
> hold the mount_hash_lock, while others hold the namespace lock. Some
> even hold both.
>
> In this submitter's case, the panic manifested itself as a GP fault in
> put_mountpoint() when it called hlist_del() and attempted to dereference
> a m_hash.pprev that had been poisioned by another thread.
Al Viro observed that the simple fix is to switch from using the namespace_sem
to the mount_lock to protect the mountpoint hash table.
I have taken Al's suggested patch moved put_mountpoint in pivot_root
(instead of taking mount_lock an additional time), and have replaced
new_mountpoint with get_mountpoint a function that does the hash table
lookup and addition under the mount_lock. The introduction of get_mounptoint
ensures that only the mount_lock is needed to manipulate the mountpoint
hashtable.
d_set_mounted is modified to only set DCACHE_MOUNTED if it is not
already set. This allows get_mountpoint to use the setting of
DCACHE_MOUNTED to ensure adding a struct mountpoint for a dentry
happens exactly once.
Fixes: ce07d891a089 ("mnt: Honor MNT_LOCKED when detaching mounts")
Reported-by: Krister Johansen <kjlx@templeofstupid.com>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e06b933e6ded42384164d28a2060b7f89243b895 upstream.
- m_start() in fs/namespace.c expects that ns->event is incremented each
time a mount added or removed from ns->list.
- umount_tree() removes items from the list but does not increment event
counter, expecting that it's done before the function is called.
- There are some codepaths that call umount_tree() without updating
"event" counter. e.g. from __detach_mounts().
- When this happens m_start may reuse a cached mount structure that no
longer belongs to ns->list (i.e. use after free which usually leads
to infinite loop).
This change fixes the above problem by incrementing global event counter
before invoking umount_tree().
Change-Id: I622c8e84dcb9fb63542372c5dbf0178ee86bb589
Signed-off-by: Andrey Ulanov <andreyu@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 97c1df3e54e811aed484a036a798b4b25d002ecf upstream.
Add this trivial missing error handling.
Fixes: 1b852bceb0d1 ("mnt: Refactor the logic for mounting sysfs and proc in a user namespace")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 695e9df010e40f407f4830dc11d53dce957710ba upstream.
In rare cases it is possible for s_flags & MS_RDONLY to be set but
MNT_READONLY to be clear. This starting combination can cause
fs_fully_visible to fail to ensure that the new mount is readonly.
Therefore force MNT_LOCK_READONLY in the new mount if MS_RDONLY
is set on the source filesystem of the mount.
In general both MS_RDONLY and MNT_READONLY are set at the same for
mounts so I don't expect any programs to care. Nor do I expect
MS_RDONLY to be set on proc or sysfs in the initial user namespace,
which further decreases the likelyhood of problems.
Which means this change should only affect system configurations by
paranoid sysadmins who should welcome the additional protection
as it keeps people from wriggling out of their policies.
Fixes: 8c6cf9cc829f ("mnt: Modify fs_fully_visible to deal with locked ro nodev and atime")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d71ed6c930ac7d8f88f3cef6624a7e826392d61f upstream.
MNT_LOCKED implies on a child mount implies the child is locked to the
parent. So while looping through the children the children should be
tested (not their parent).
Typically an unshare of a mount namespace locks all mounts together
making both the parent and the slave as locked but there are a few
corner cases where other things work.
Fixes: ceeb0e5d39fc ("vfs: Ignore unlocked mounts in fs_fully_visible")
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace updates from Eric Biederman:
"This finishes up the changes to ensure proc and sysfs do not start
implementing executable files, as the there are application today that
are only secure because such files do not exist.
It akso fixes a long standing misfeature of /proc/<pid>/mountinfo that
did not show the proper source for files bind mounted from
/proc/<pid>/ns/*.
It also straightens out the handling of clone flags related to user
namespaces, fixing an unnecessary failure of unshare(CLONE_NEWUSER)
when files such as /proc/<pid>/environ are read while <pid> is calling
unshare. This winds up fixing a minor bug in unshare flag handling
that dates back to the first version of unshare in the kernel.
Finally, this fixes a minor regression caused by the introduction of
sysfs_create_mount_point, which broke someone's in house application,
by restoring the size of /sys/fs/cgroup to 0 bytes. Apparently that
application uses the directory size to determine if a tmpfs is mounted
on /sys/fs/cgroup.
The bind mount escape fixes are present in Al Viros for-next branch.
and I expect them to come from there. The bind mount escape is the
last of the user namespace related security bugs that I am aware of"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
fs: Set the size of empty dirs to 0.
userns,pidns: Force thread group sharing, not signal handler sharing.
unshare: Unsharing a thread does not require unsharing a vm
nsfs: Add a show_path method to fix mountinfo
mnt: fs_fully_visible enforce noexec and nosuid if !SB_I_NOEXEC
vfs: Commit to never having exectuables on proc and sysfs.
|
|
The handling of in detach_mounts of unmounted but connected mounts is
buggy and can lead to an infinite loop.
Correct the handling of unmounted mounts in detach_mount. When the
mountpoint of an unmounted but connected mount is connected to a
dentry, and that dentry is deleted we need to disconnect that mount
from the parent mount and the deleted dentry.
Nothing changes for the unmounted and connected children. They can be
safely ignored.
Cc: stable@vger.kernel.org
Fixes: ce07d891a0891d3c0d0c2d73d577490486b809e1 mnt: Honor MNT_LOCKED when detaching mounts
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
rmdir mntpoint will result in an infinite loop when there is
a mount locked on the mountpoint in another mount namespace.
This is because the logic to test to see if a mount should
be disconnected in umount_tree is buggy.
Move the logic to decide if a mount should remain connected to
it's mountpoint into it's own function disconnect_mount so that
clarity of expression instead of terseness of expression becomes
a virtue.
When the conditions where it is invalid to leave a mount connected
are first ruled out, the logic for deciding if a mount should
be disconnected becomes much clearer and simpler.
Fixes: e0c9c0afd2fc958ffa34b697972721d81df8a56f mnt: Update detach_mounts to leave mounts connected
Fixes: ce07d891a0891d3c0d0c2d73d577490486b809e1 mnt: Honor MNT_LOCKED when detaching mounts
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
The filesystems proc and sysfs do not have executable files do not
have exectuable files today and portions of userspace break if we do
enforce nosuid and noexec consistency of nosuid and noexec flags
between previous mounts and new mounts of proc and sysfs.
Add the code to enforce consistency of the nosuid and noexec flags,
and use the presence of SB_I_NOEXEC to signal that there is no need to
bother.
This results in a completely userspace invisible change that makes it
clear fs_fully_visible can only skip the enforcement of noexec and
nosuid because it is known the filesystems in question do not support
executables.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace updates from Eric Biederman:
"Long ago and far away when user namespaces where young it was realized
that allowing fresh mounts of proc and sysfs with only user namespace
permissions could violate the basic rule that only root gets to decide
if proc or sysfs should be mounted at all.
Some hacks were put in place to reduce the worst of the damage could
be done, and the common sense rule was adopted that fresh mounts of
proc and sysfs should allow no more than bind mounts of proc and
sysfs. Unfortunately that rule has not been fully enforced.
There are two kinds of gaps in that enforcement. Only filesystems
mounted on empty directories of proc and sysfs should be ignored but
the test for empty directories was insufficient. So in my tree
directories on proc, sysctl and sysfs that will always be empty are
created specially. Every other technique is imperfect as an ordinary
directory can have entries added even after a readdir returns and
shows that the directory is empty. Special creation of directories
for mount points makes the code in the kernel a smidge clearer about
it's purpose. I asked container developers from the various container
projects to help test this and no holes were found in the set of mount
points on proc and sysfs that are created specially.
This set of changes also starts enforcing the mount flags of fresh
mounts of proc and sysfs are consistent with the existing mount of
proc and sysfs. I expected this to be the boring part of the work but
unfortunately unprivileged userspace winds up mounting fresh copies of
proc and sysfs with noexec and nosuid clear when root set those flags
on the previous mount of proc and sysfs. So for now only the atime,
read-only and nodev attributes which userspace happens to keep
consistent are enforced. Dealing with the noexec and nosuid
attributes remains for another time.
This set of changes also addresses an issue with how open file
descriptors from /proc/<pid>/ns/* are displayed. Recently readlink of
/proc/<pid>/fd has been triggering a WARN_ON that has not been
meaningful since it was added (as all of the code in the kernel was
converted) and is not now actively wrong.
There is also a short list of issues that have not been fixed yet that
I will mention briefly.
It is possible to rename a directory from below to above a bind mount.
At which point any directory pointers below the renamed directory can
be walked up to the root directory of the filesystem. With user
namespaces enabled a bind mount of the bind mount can be created
allowing the user to pick a directory whose children they can rename
to outside of the bind mount. This is challenging to fix and doubly
so because all obvious solutions must touch code that is in the
performance part of pathname resolution.
As mentioned above there is also a question of how to ensure that
developers by accident or with purpose do not introduce exectuable
files on sysfs and proc and in doing so introduce security regressions
in the current userspace that will not be immediately obvious and as
such are likely to require breaking userspace in painful ways once
they are recognized"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
vfs: Remove incorrect debugging WARN in prepend_path
mnt: Update fs_fully_visible to test for permanently empty directories
sysfs: Create mountpoints with sysfs_create_mount_point
sysfs: Add support for permanently empty directories to serve as mount points.
kernfs: Add support for always empty directories.
proc: Allow creating permanently empty directories that serve as mount points
sysctl: Allow creating permanently empty directories that serve as mountpoints.
fs: Add helper functions for permanently empty directories.
vfs: Ignore unlocked mounts in fs_fully_visible
mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
mnt: Refactor the logic for mounting sysfs and proc in a user namespace
|
|
fs_fully_visible attempts to make fresh mounts of proc and sysfs give
the mounter no more access to proc and sysfs than if they could have
by creating a bind mount. One aspect of proc and sysfs that makes
this particularly tricky is that there are other filesystems that
typically mount on top of proc and sysfs. As those filesystems are
mounted on empty directories in practice it is safe to ignore them.
However testing to ensure filesystems are mounted on empty directories
has not been something the in kernel data structures have supported so
the current test for an empty directory which checks to see
if nlink <= 2 is a bit lacking.
proc and sysfs have recently been modified to use the new empty_dir
infrastructure to create all of their dedicated mount points. Instead
of testing for S_ISDIR(inode->i_mode) && i_nlink <= 2 to see if a
directory is empty, test for is_empty_dir_inode(inode). That small
change guaranteess mounts found on proc and sysfs really are safe to
ignore, because the directories are not only empty but nothing can
ever be added to them. This guarantees there is nothing to worry
about when mounting proc and sysfs.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Limit the mounts fs_fully_visible considers to locked mounts.
Unlocked can always be unmounted so considering them adds hassle
but no security benefit.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
A patchset to remove support for passing pre-allocated struct seq_file to
seq_open(). Such feature is undocumented and prone to error.
In particular, if seq_release() is used in release handler, it will
kfree() a pointer which was not allocated by seq_open().
So this patchset drops support for pre-allocated struct seq_file: it's
only of use in proc_namespace.c and can be easily replaced by using
seq_open_private()/seq_release_private().
Additionally, it documents the use of file->private_data to hold pointer
to struct seq_file by seq_open().
This patch (of 3):
Since patch described below, from v2.6.15-rc1, seq_open() could use a
struct seq_file already allocated by the caller if the pointer to the
structure is stored in file->private_data before calling the function.
Commit 1abe77b0fc4b485927f1f798ae81a752677e1d05
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Mon Nov 7 17:15:34 2005 -0500
[PATCH] allow callers of seq_open do allocation themselves
Allow caller of seq_open() to kmalloc() seq_file + whatever else they
want and set ->private_data to it. seq_open() will then abstain from
doing allocation itself.
Such behavior is only used by mounts_open_common().
In order to drop support for such uncommon feature, proc_mounts is
converted to use seq_open_private(), which take care of allocating the
proc_mounts structure, making it available through ->private in struct
seq_file.
Conversely, proc_mounts is converted to use seq_release_private(), in
order to release the private structure allocated by seq_open_private().
Then, ->private is used directly instead of proc_mounts() macro to access
to the proc_mounts structure.
Link: http://lkml.kernel.org/r/cover.1433193673.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Ignore an existing mount if the locked readonly, nodev or atime
attributes are less permissive than the desired attributes
of the new mount.
On success ensure the new mount locks all of the same readonly, nodev and
atime attributes as the old mount.
The nosuid and noexec attributes are not checked here as this change
is destined for stable and enforcing those attributes causes a
regression in lxc and libvirt-lxc where those applications will not
start and there are no known executables on sysfs or proc and no known
way to create exectuables without code modifications
Cc: stable@vger.kernel.org
Fixes: e51db73532955 ("userns: Better restrictions on when proc and sysfs can be mounted")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Fresh mounts of proc and sysfs are a very special case that works very
much like a bind mount. Unfortunately the current structure can not
preserve the MNT_LOCK... mount flags. Therefore refactor the logic
into a form that can be modified to preserve those lock bits.
Add a new filesystem flag FS_USERNS_VISIBLE that requires some mount
of the filesystem be fully visible in the current mount namespace,
before the filesystem may be mounted.
Move the logic for calling fs_fully_visible from proc and sysfs into
fs/namespace.c where it has greater access to mount namespace state.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
same as legitimize_mnt(), except that it does *not* drop and regain
rcu_read_lock; return values are
0 => grabbed a reference, we are fine
1 => failed, just go away
-1 => failed, go away and mntput(bastard) when outside of rcu_read_lock
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This fixes a dumb bug in fs_fully_visible that allows proc or sys to
be mounted if there is a bind mount of part of /proc/ or /sys/ visible.
Cc: stable@vger.kernel.org
Reported-by: Eric Windisch <ewindisch@docker.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Now that it is possible to lazily unmount an entire mount tree and
leave the individual mounts connected to each other add a new flag
UMOUNT_CONNECTED to umount_tree to force this behavior and use
this flag in detach_mounts.
This closes a bug where the deletion of a file or directory could
trigger an unmount and reveal data under a mount point.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
lookup_mountpoint can return either NULL or an error value.
Update the test in __detach_mounts to test for an error value
to avoid pathological cases causing a NULL pointer dereferences.
The callers of __detach_mounts should prevent it from ever being
called on an unlinked dentry but don't take any chances.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Modify umount(MNT_DETACH) to keep mounts in the hash table that are
locked to their parent mounts, when the parent is lazily unmounted.
In mntput_no_expire detach the children from the hash table, depending
on mnt_pin_kill in cleanup_mnt to decrement the mnt_count of the children.
In __detach_mounts if there are any mounts that have been unmounted
but still are on the list of mounts of a mountpoint, remove their
children from the mount hash table and those children to the unmounted
list so they won't linger potentially indefinitely waiting for their
final mntput, now that the mounts serve no purpose.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
For future use factor out a function umount_mnt from umount_tree.
This function unhashes a mount and remembers where the mount
was mounted so that eventually when the code makes it to a
sleeping context the mountpoint can be dput.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Create a function unhash_mnt that contains the common code between
detach_mnt and umount_tree, and use unhash_mnt in place of the common
code. This add a unncessary list_del_init(mnt->mnt_child) into
umount_tree but given that mnt_child is already empty this extra
line is a noop.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
The only users of collect_mounts are in audit_tree.c
In audit_trim_trees and audit_add_tree_rule the path passed into
collect_mounts is generated from kern_path passed an audit_tree
pathname which is guaranteed to be an absolute path. In those cases
collect_mounts is obviously intended to work on mounted paths and
if a race results in paths that are unmounted when collect_mounts
it is reasonable to fail early.
The paths passed into audit_tag_tree don't have the absolute path
check. But are used to play with fsnotify and otherwise interact with
the audit_trees, so again operating only on mounted paths appears
reasonable.
Avoid having to worry about what happens when we try and audit
unmounted filesystems by restricting collect_mounts to mounts
that appear in the mount tree.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
A prerequisite of calling umount_tree is that the point where the tree
is mounted at is valid to unmount.
If we are propagating the effect of the unmount clear MNT_LOCKED in
every instance where the same filesystem is mounted on the same
mountpoint in the mount tree, as we know (by virtue of the fact
that umount_tree was called) that it is safe to reveal what
is at that mountpoint.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
- Modify __lookup_mnt_hash_last to ignore mounts that have MNT_UMOUNTED set.
- Don't remove mounts from the mount hash table in propogate_umount
- Don't remove mounts from the mount hash table in umount_tree before
the entire list of mounts to be umounted is selected.
- Remove mounts from the mount hash table as the last thing that
happens in the case where a mount has a parent in umount_tree.
Mounts without parents are not hashed (by definition).
This paves the way for delaying removal from the mount hash table even
farther and fixing the MNT_LOCKED vs MNT_DETACH issue.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
In some instances it is necessary to know if the the unmounting
process has begun on a mount. Add MNT_UMOUNT to make that reliably
testable.
This fix gets used in fixing locked mounts in MNT_DETACH
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
umount_tree builds a list of mounts that need to be unmounted.
Utilize mnt_list for this purpose instead of mnt_hash. This begins to
allow keeping a mount on the mnt_hash after it is unmounted, which is
necessary for a properly functioning MNT_LOCKED implementation.
The fact that mnt_list is an ordinary list makding available list_move
is nice bonus.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Invoking mount propagation from __detach_mounts is inefficient and
wrong.
It is inefficient because __detach_mounts already walks the list of
mounts that where something needs to be done, and mount propagation
walks some subset of those mounts again.
It is actively wrong because if the dentry that is passed to
__detach_mounts is not part of the path to a mount that mount should
not be affected.
change_mnt_propagation(p,MS_PRIVATE) modifies the mount propagation
tree of a master mount so it's slaves are connected to another master
if possible. Which means even removing a mount from the middle of a
mount tree with __detach_mounts will not deprive any mount propagated
mount events.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
- Remove the unneeded declaration from pnode.h
- Mark umount_tree static as it has no callers outside of namespace.c
- Define an enumeration of umount_tree's flags.
- Pass umount_tree's flags in by name
This removes the magic numbers 0, 1 and 2 making the code a little
clearer and makes it possible for there to be lazy unmounts that don't
propagate. Which is what __detach_mounts actually wants for example.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Small cleanup to make the code more readable and maintainable.
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
|
|
Convert the following where appropriate:
(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.
In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).
Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.
However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.
There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.
The following perl+coccinelle script was used:
use strict;
my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}
my @cocci = (
'@@',
'expression E;',
'@@',
'',
'- S_ISLNK(E->d_inode->i_mode)',
'+ d_is_symlink(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISDIR(E->d_inode->i_mode)',
'+ d_is_dir(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISREG(E->d_inode->i_mode)',
'+ d_is_reg(E)' );
my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);
foreach my $file (@callers) {
chomp $file;
print "Processing ", $file, "\n";
system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
die "spatch failed";
}
[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc VFS updates from Al Viro:
"This cycle a lot of stuff sits on topical branches, so I'll be sending
more or less one pull request per branch.
This is the first pile; more to follow in a few. In this one are
several misc commits from early in the cycle (before I went for
separate branches), plus the rework of mntput/dput ordering on umount,
switching to use of fs_pin instead of convoluted games in
namespace_unlock()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch the IO-triggering parts of umount to fs_pin
new fs_pin killing logics
allow attaching fs_pin to a group not associated with some superblock
get rid of the second argument of acct_kill()
take count and rcu_head out of fs_pin
dcache: let the dentry count go down to zero without taking d_lock
pull bumping refcount into ->kill()
kill pin_put()
mode_t whack-a-mole: chelsio
file->f_path.dentry is pinned down for as long as the file is open...
get rid of lustre_dump_dentry()
gut proc_register() a bit
kill d_validate()
ncpfs: get rid of d_validate() nonsense
selinuxfs: don't open-code d_genocide()
|
|
VFS frequently performs duplication of strings located in read-only memory
section. Replacing kstrdup by kstrdup_const allows to avoid such
operations.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|