summaryrefslogtreecommitdiff
path: root/fs/pnode.c
AgeCommit message (Collapse)Author
2020-05-03ANDROID: mnt: Propagate remount correctlyDaniel Rosenberg
This switches over to propagation_next to respect namepsace semantics. Test: Remounting to change the options of a fs with mount based options should propagate to all shared copies of that mount, and the slaves/indirect slaves of those. Bug: 122428178 Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: Ic35cd2782a646435689f5bedfa1f218fe4ab8254
2020-05-02Merge 4.4.221 into android-4.4-pGreg Kroah-Hartman
Changes in 4.4.221 ext4: fix extent_status fragmentation for plain files ALSA: hda - Fix incorrect usage of IS_REACHABLE() net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg() net: ipv4: avoid unused variable warning for sysctl crypto: mxs-dcp - make symbols 'sha1_null_hash' and 'sha256_null_hash' static vti4: removed duplicate log message. scsi: lpfc: Fix kasan slab-out-of-bounds error in lpfc_unreg_login ceph: return ceph_mdsc_do_request() errors from __get_parent() ceph: don't skip updating wanted caps when cap is stale pwm: rcar: Fix late Runtime PM enablement scsi: iscsi: Report unbind session event when the target has been removed ASoC: Intel: atom: Take the drv->lock mutex before calling sst_send_slot_map() kernel/gcov/fs.c: gcov_seq_next() should increase position index ipc/util.c: sysvipc_find_ipc() should increase position index s390/cio: avoid duplicated 'ADD' uevents pwm: renesas-tpu: Fix late Runtime PM enablement pwm: bcm2835: Dynamically allocate base ipv6: fix restrict IPV6_ADDRFORM operation macvlan: fix null dereference in macvlan_device_event() net: netrom: Fix potential nr_neigh refcnt leak in nr_add_node net/x25: Fix x25_neigh refcnt leak when receiving frame tcp: cache line align MAX_TCP_HEADER team: fix hang in team_mode_get() xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish ALSA: hda: Remove ASUS ROG Zenith from the blacklist iio: xilinx-xadc: Fix ADC-B powerdown iio: xilinx-xadc: Fix clearing interrupt when enabling trigger iio: xilinx-xadc: Fix sequencer configuration for aux channels in simultaneous mode fs/namespace.c: fix mountpoint reference counter race USB: sisusbvga: Change port variable from signed to unsigned USB: Add USB_QUIRK_DELAY_CTRL_MSG and USB_QUIRK_DELAY_INIT for Corsair K70 RGB RAPIDFIRE drivers: usb: core: Don't disable irqs in usb_sg_wait() during URB submit. drivers: usb: core: Minimize irq disabling in usb_sg_cancel() USB: core: Fix free-while-in-use bug in the USB S-Glibrary USB: hub: Fix handling of connect changes during sleep ALSA: usx2y: Fix potential NULL dereference ALSA: usb-audio: Fix usb audio refcnt leak when getting spdif ALSA: usb-audio: Filter out unsupported sample rates on Focusrite devices KVM: Check validity of resolved slot when searching memslots KVM: VMX: Enable machine check support for 32bit targets tty: hvc: fix buffer overflow during hvc_alloc(). tty: rocket, avoid OOB access usb-storage: Add unusual_devs entry for JMicron JMS566 audit: check the length of userspace generated audit records ASoC: dapm: fixup dapm kcontrol widget ARM: imx: provide v7_cpu_resume() only on ARM_CPU_SUSPEND=y staging: comedi: dt2815: fix writing hi byte of analog output staging: comedi: Fix comedi_device refcnt leak in comedi_open staging: vt6656: Fix drivers TBTT timing counter. staging: vt6656: Power save stop wake_up_count wrap around. UAS: no use logging any details in case of ENODEV UAS: fix deadlock in error handling and PM flushing work usb: f_fs: Clear OS Extended descriptor counts to zero in ffs_data_reset() remoteproc: Fix wrong rvring index computation sctp: use right member as the param of list_for_each_entry fuse: fix possibly missed wake-up after abort mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer usb: gadget: udc: bdc: Remove unnecessary NULL checks in bdc_req_complete net/cxgb4: Check the return from t4_query_params properly perf/core: fix parent pid/tid in task exit events bpf, x86: Fix encoding for lower 8-bit registers in BPF_STX BPF_B scsi: target: fix PR IN / READ FULL STATUS for FC xen/xenbus: ensure xenbus_map_ring_valloc() returns proper grant status ext4: convert BUG_ON's to WARN_ON's in mballoc.c ext4: avoid declaring fs inconsistent due to invalid file handles ext4: protect journal inode's blocks using block_validity ext4: don't perform block validity checks on the journal inode ext4: fix block validity checks for journal inodes using indirect blocks ext4: unsigned int compared against zero propagate_one(): mnt_set_mountpoint() needs mount_lock Linux 4.4.221 Change-Id: I95cadd4206a7c89541de002faacea3a28e7b1ac3 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-05-02propagate_one(): mnt_set_mountpoint() needs mount_lockAl Viro
commit b0d3869ce9eeacbb1bbd541909beeef4126426d5 upstream. ... to protect the modification of mp->m_count done by it. Most of the places that modify that thing also have namespace_lock held, but not all of them can do so, so we really need mount_lock here. Kudos to Piotr Krysiuk <piotras@gmail.com>, who'd spotted a related bug in pivot_root(2) (fixed unnoticed in 5.3); search for other similar turds has caught out this one. Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21Merge 4.4.78 into android-4.4Greg Kroah-Hartman
Changes in 4.4.78 net_sched: fix error recovery at qdisc creation net: sched: Fix one possible panic when no destroy callback net/phy: micrel: configure intterupts after autoneg workaround ipv6: avoid unregistering inet6_dev for loopback net: dp83640: Avoid NULL pointer dereference. tcp: reset sk_rx_dst in tcp_disconnect() net: prevent sign extension in dev_get_stats() bpf: prevent leaking pointer via xadd on unpriviledged net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish() ipv6: dad: don't remove dynamic addresses if link is down net: ipv6: Compare lwstate in detecting duplicate nexthops vrf: fix bug_on triggered by rx when destroying a vrf rds: tcp: use sock_create_lite() to create the accept socket brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx() cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES cfg80211: Check if PMKID attribute is of expected size irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity parisc: Report SIGSEGV instead of SIGBUS when running out of stack parisc: use compat_sys_keyctl() parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs parisc/mm: Ensure IRQs are off in switch_mm() tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth kernel/extable.c: mark core_kernel_text notrace mm/list_lru.c: fix list_lru_count_node() to be race free fs/dcache.c: fix spin lockup issue on nlru->lock checkpatch: silence perl 5.26.0 unescaped left brace warnings binfmt_elf: use ELF_ET_DYN_BASE only for PIE arm: move ELF_ET_DYN_BASE to 4MB arm64: move ELF_ET_DYN_BASE to 4GB / 4MB powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB s390: reduce ELF_ET_DYN_BASE exec: Limit arg stack to at most 75% of _STK_LIM vt: fix unchecked __put_user() in tioclinux ioctls mnt: In umount propagation reparent in a separate pass mnt: In propgate_umount handle visiting mounts in any order mnt: Make propagate_umount less slow for overlapping mount propagation trees selftests/capabilities: Fix the test_execve test tpm: Get rid of chip->pdev tpm: Provide strong locking for device removal Add "shutdown" to "struct class". tpm: Issue a TPM2_Shutdown for TPM2 devices. mm: fix overflow check in expand_upwards() crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD crypto: atmel - only treat EBUSY as transient if backlog crypto: sha1-ssse3 - Disable avx2 crypto: caam - fix signals handling sched/topology: Fix overlapping sched_group_mask sched/topology: Optimize build_group_mask() PM / wakeirq: Convert to SRCU PM / QoS: return -EINVAL for bogus strings tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results KVM: x86: disable MPX if host did not enable MPX XSAVE features kvm: vmx: Do not disable intercepts for BNDCFGS kvm: x86: Guest BNDCFGS requires guest MPX support kvm: vmx: Check value written to IA32_BNDCFGS kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS Linux 4.4.78 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-21mnt: Make propagate_umount less slow for overlapping mount propagation treesEric W. Biederman
commit 296990deb389c7da21c78030376ba244dc1badf5 upstream. Andrei Vagin pointed out that time to executue propagate_umount can go non-linear (and take a ludicrious amount of time) when the mount propogation trees of the mounts to be unmunted by a lazy unmount overlap. Make the walk of the mount propagation trees nearly linear by remembering which mounts have already been visited, allowing subsequent walks to detect when walking a mount propgation tree or a subtree of a mount propgation tree would be duplicate work and to skip them entirely. Walk the list of mounts whose propgatation trees need to be traversed from the mount highest in the mount tree to mounts lower in the mount tree so that odds are higher that the code will walk the largest trees first, allowing later tree walks to be skipped entirely. Add cleanup_umount_visitation to remover the code's memory of which mounts have been visited. Add the functions last_slave and skip_propagation_subtree to allow skipping appropriate parts of the mount propagation tree without needing to change the logic of the rest of the code. A script to generate overlapping mount propagation trees: $ cat runs.h set -e mount -t tmpfs zdtm /mnt mkdir -p /mnt/1 /mnt/2 mount -t tmpfs zdtm /mnt/1 mount --make-shared /mnt/1 mkdir /mnt/1/1 iteration=10 if [ -n "$1" ] ; then iteration=$1 fi for i in $(seq $iteration); do mount --bind /mnt/1/1 /mnt/1/1 done mount --rbind /mnt/1 /mnt/2 TIMEFORMAT='%Rs' nr=$(( ( 2 ** ( $iteration + 1 ) ) + 1 )) echo -n "umount -l /mnt/1 -> $nr " time umount -l /mnt/1 nr=$(cat /proc/self/mountinfo | grep zdtm | wc -l ) time umount -l /mnt/2 $ for i in $(seq 9 19); do echo $i; unshare -Urm bash ./run.sh $i; done Here are the performance numbers with and without the patch: mhash | 8192 | 8192 | 1048576 | 1048576 mounts | before | after | before | after ------------------------------------------------ 1025 | 0.040s | 0.016s | 0.038s | 0.019s 2049 | 0.094s | 0.017s | 0.080s | 0.018s 4097 | 0.243s | 0.019s | 0.206s | 0.023s 8193 | 1.202s | 0.028s | 1.562s | 0.032s 16385 | 9.635s | 0.036s | 9.952s | 0.041s 32769 | 60.928s | 0.063s | 44.321s | 0.064s 65537 | | 0.097s | | 0.097s 131073 | | 0.233s | | 0.176s 262145 | | 0.653s | | 0.344s 524289 | | 2.305s | | 0.735s 1048577 | | 7.107s | | 2.603s Andrei Vagin reports fixing the performance problem is part of the work to fix CVE-2016-6213. Fixes: a05964f3917c ("[PATCH] shared mounts handling: umount") Reported-by: Andrei Vagin <avagin@openvz.org> Reviewed-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21mnt: In propgate_umount handle visiting mounts in any orderEric W. Biederman
commit 99b19d16471e9c3faa85cad38abc9cbbe04c6d55 upstream. While investigating some poor umount performance I realized that in the case of overlapping mount trees where some of the mounts are locked the code has been failing to unmount all of the mounts it should have been unmounting. This failure to unmount all of the necessary mounts can be reproduced with: $ cat locked_mounts_test.sh mount -t tmpfs test-base /mnt mount --make-shared /mnt mkdir -p /mnt/b mount -t tmpfs test1 /mnt/b mount --make-shared /mnt/b mkdir -p /mnt/b/10 mount -t tmpfs test2 /mnt/b/10 mount --make-shared /mnt/b/10 mkdir -p /mnt/b/10/20 mount --rbind /mnt/b /mnt/b/10/20 unshare -Urm --propagation unchaged /bin/sh -c 'sleep 5; if [ $(grep test /proc/self/mountinfo | wc -l) -eq 1 ] ; then echo SUCCESS ; else echo FAILURE ; fi' sleep 1 umount -l /mnt/b wait %% $ unshare -Urm ./locked_mounts_test.sh This failure is corrected by removing the prepass that marks mounts that may be umounted. A first pass is added that umounts mounts if possible and if not sets mount mark if they could be unmounted if they weren't locked and adds them to a list to umount possibilities. This first pass reconsiders the mounts parent if it is on the list of umount possibilities, ensuring that information of umoutability will pass from child to mount parent. A second pass then walks through all mounts that are umounted and processes their children unmounting them or marking them for reparenting. A last pass cleans up the state on the mounts that could not be umounted and if applicable reparents them to their first parent that remained mounted. While a bit longer than the old code this code is much more robust as it allows information to flow up from the leaves and down from the trunk making the order in which mounts are encountered in the umount propgation tree irrelevant. Fixes: 0c56fe31420c ("mnt: Don't propagate unmounts to locked mounts") Reviewed-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21mnt: In umount propagation reparent in a separate passEric W. Biederman
commit 570487d3faf2a1d8a220e6ee10f472163123d7da upstream. It was observed that in some pathlogical cases that the current code does not unmount everything it should. After investigation it was determined that the issue is that mnt_change_mntpoint can can change which mounts are available to be unmounted during mount propagation which is wrong. The trivial reproducer is: $ cat ./pathological.sh mount -t tmpfs test-base /mnt cd /mnt mkdir 1 2 1/1 mount --bind 1 1 mount --make-shared 1 mount --bind 1 2 mount --bind 1/1 1/1 mount --bind 1/1 1/1 echo grep test-base /proc/self/mountinfo umount 1/1 echo grep test-base /proc/self/mountinfo $ unshare -Urm ./pathological.sh The expected output looks like: 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 The output without the fix looks like: 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 That last mount in the output was in the propgation tree to be unmounted but was missed because the mnt_change_mountpoint changed it's parent before the walk through the mount propagation tree observed it. Fixes: 1064f874abc0 ("mnt: Tuck mounts under others instead of creating shadow/side mounts.") Acked-by: Andrei Vagin <avagin@virtuozzo.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-29ANDROID: mnt: Fix next_descendentDaniel Rosenberg
next_descendent did not properly handle the case where the initial mount had no slaves. In this case, we would look for the next slave, but since don't have a master, the check for wrapping around to the start of the list will always fail. Instead, we check for this case, and ensure that we end the iteration when we come back to the root. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 62094374 Change-Id: I43dfcee041aa3730cb4b9a1161418974ef84812e
2017-04-30Merge 4.4.65 into android-4.4Greg Kroah-Hartman
Changes in 4.4.65: tipc: make sure IPv6 header fits in skb headroom tipc: make dist queue pernet tipc: re-enable compensation for socket receive buffer double counting tipc: correct error in node fsm tty: nozomi: avoid a harmless gcc warning hostap: avoid uninitialized variable use in hfa384x_get_rid gfs2: avoid uninitialized variable warning tipc: fix random link resets while adding a second bearer tipc: fix socket timer deadlock mnt: Add a per mount namespace limit on the number of mounts xc2028: avoid use after free netfilter: nfnetlink: correctly validate length of batch messages tipc: check minimum bearer MTU vfio/pci: Fix integer overflows, bitmask check staging/android/ion : fix a race condition in the ion driver ping: implement proper locking perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race Linux 4.4.65 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-04-30mnt: Add a per mount namespace limit on the number of mountsEric W. Biederman
commit d29216842a85c7970c536108e093963f02714498 upstream. CAI Qian <caiqian@redhat.com> pointed out that the semantics of shared subtrees make it possible to create an exponentially increasing number of mounts in a mount namespace. mkdir /tmp/1 /tmp/2 mount --make-rshared / for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done Will create create 2^20 or 1048576 mounts, which is a practical problem as some people have managed to hit this by accident. As such CVE-2016-6213 was assigned. Ian Kent <raven@themaw.net> described the situation for autofs users as follows: > The number of mounts for direct mount maps is usually not very large because of > the way they are implemented, large direct mount maps can have performance > problems. There can be anywhere from a few (likely case a few hundred) to less > than 10000, plus mounts that have been triggered and not yet expired. > > Indirect mounts have one autofs mount at the root plus the number of mounts that > have been triggered and not yet expired. > > The number of autofs indirect map entries can range from a few to the common > case of several thousand and in rare cases up to between 30000 and 50000. I've > not heard of people with maps larger than 50000 entries. > > The larger the number of map entries the greater the possibility for a large > number of active mounts so it's not hard to expect cases of a 1000 or somewhat > more active mounts. So I am setting the default number of mounts allowed per mount namespace at 100,000. This is more than enough for any use case I know of, but small enough to quickly stop an exponential increase in mounts. Which should be perfect to catch misconfigurations and malfunctioning programs. For anyone who needs a higher limit this can be changed by writing to the new /proc/sys/fs/mount-max sysctl. Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15mnt: Tuck mounts under others instead of creating shadow/side mounts.Eric W. Biederman
am: 839d42687d Change-Id: Id06d474c7f7b46c1ce05056028ca02837cacf215
2017-03-15mnt: Tuck mounts under others instead of creating shadow/side mounts.Eric W. Biederman
commit 1064f874abc0d05eeed8993815f584d847b72486 upstream. Ever since mount propagation was introduced in cases where a mount in propagated to parent mount mountpoint pair that is already in use the code has placed the new mount behind the old mount in the mount hash table. This implementation detail is problematic as it allows creating arbitrary length mount hash chains. Furthermore it invalidates the constraint maintained elsewhere in the mount code that a parent mount and a mountpoint pair will have exactly one mount upon them. Making it hard to deal with and to talk about this special case in the mount code. Modify mount propagation to notice when there is already a mount at the parent mount and mountpoint where a new mount is propagating to and place that preexisting mount on top of the new mount. Modify unmount propagation to notice when a mount that is being unmounted has another mount on top of it (and no other children), and to replace the unmounted mount with the mount on top of it. Move the MNT_UMUONT test from __lookup_mnt_last into __propagate_umount as that is the only call of __lookup_mnt_last where MNT_UMOUNT may be set on any mount visible in the mount hash table. These modifications allow: - __lookup_mnt_last to be removed. - attach_shadows to be renamed __attach_mnt and its shadow handling to be removed. - commit_tree to be simplified - copy_tree to be simplified The result is an easier to understand tree of mounts that does not allow creation of arbitrary length hash chains in the mount hash table. The result is also a very slight userspace visible difference in semantics. The following two cases now behave identically, where before order mattered: case 1: (explicit user action) B is a slave of A mount something on A/a , it will propagate to B/a and than mount something on B/a case 2: (tucked mount) B is a slave of A mount something on B/a and than mount something on A/a Histroically umount A/a would fail in case 1 and succeed in case 2. Now umount A/a succeeds in both configurations. This very small change in semantics appears if anything to be a bug fix to me and my survey of userspace leads me to believe that no programs will notice or care of this subtle semantic change. v2: Updated to mnt_change_mountpoint to not call dput or mntput and instead to decrement the counts directly. It is guaranteed that there will be other references when mnt_change_mountpoint is called so this is safe. v3: Moved put_mountpoint under mount_lock in attach_recursive_mnt As the locking in fs/namespace.c changed between v2 and v3. v4: Reworked the logic in propagate_mount_busy and __propagate_umount that detects when a mount completely covers another mount. v5: Removed unnecessary tests whose result is alwasy true in find_topper and attach_recursive_mnt. v6: Document the user space visible semantic difference. Fixes: b90fa9ae8f51 ("[PATCH] shared mount handling: bind and rbind") Tested-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-15Merge remote-tracking branch 'common/android-4.4' into android-4.4.yDmitry Shmidt
Change-Id: Icf907f5067fb6da5935ab0d3271df54b8d5df405
2017-01-26ANDROID: mnt: remount should propagate to slaves of slavesDaniel Rosenberg
propagate_remount was not accounting for the slave mounts of other slave mounts, leading to some namespaces not recieving the remount information. bug:33731928 Change-Id: Idc9e8c2ed126a4143229fc23f10a959c2d0a3854 Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-01-26ANDROID: mnt: Add filesystem private data to mount pointsDaniel Rosenberg
This starts to add private data associated directly to mount points. The intent is to give filesystems a sense of where they have come from, as a means of letting a filesystem take different actions based on this information. Change-Id: Ie769d7b3bb2f5972afe05c1bf16cf88c91647ab2 Signed-off-by: Daniel Rosenberg <drosen@google.com>
2016-05-11propogate_mnt: Handle the first propogated copy being a slaveEric W. Biederman
commit 5ec0811d30378ae104f250bfc9b3640242d81e3f upstream. When the first propgated copy was a slave the following oops would result: > BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 > IP: [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > PGD bacd4067 PUD bac66067 PMD 0 > Oops: 0000 [#1] SMP > Modules linked in: > CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523 > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 > task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000 > RIP: 0010:[<ffffffff811fba4e>] [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > RSP: 0018:ffff8800bac3fd38 EFLAGS: 00010283 > RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010 > RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480 > RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000 > R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00 > FS: 00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0 > Stack: > ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85 > ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40 > 0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0 > Call Trace: > [<ffffffff811fbf85>] propagate_mnt+0x105/0x140 > [<ffffffff811f1da0>] attach_recursive_mnt+0x120/0x1e0 > [<ffffffff811f1ec3>] graft_tree+0x63/0x70 > [<ffffffff811f1f6b>] do_add_mount+0x9b/0x100 > [<ffffffff811f2c1a>] do_mount+0x2aa/0xdf0 > [<ffffffff8117efbe>] ? strndup_user+0x4e/0x70 > [<ffffffff811f3a45>] SyS_mount+0x75/0xc0 > [<ffffffff8100242b>] do_syscall_64+0x4b/0xa0 > [<ffffffff81988f3c>] entry_SYSCALL64_slow_path+0x25/0x25 > Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 <48> 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30 > RIP [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > RSP <ffff8800bac3fd38> > CR2: 0000000000000010 > ---[ end trace 2725ecd95164f217 ]--- This oops happens with the namespace_sem held and can be triggered by non-root users. An all around not pleasant experience. To avoid this scenario when finding the appropriate source mount to copy stop the walk up the mnt_master chain when the first source mount is encountered. Further rewrite the walk up the last_source mnt_master chain so that it is clear what is going on. The reason why the first source mount is special is that it it's mnt_parent is not a mount in the dest_mnt propagation tree, and as such termination conditions based up on the dest_mnt mount propgation tree do not make sense. To avoid other kinds of confusion last_dest is not changed when computing last_source. last_dest is only used once in propagate_one and that is above the point of the code being modified, so changing the global variable is meaningless and confusing. fixes: f2ebb3a921c1ca1e2ddd9242e95a1989a50c4c68 ("smarter propagate_mnt()") Reported-by: Tycho Andersen <tycho.andersen@canonical.com> Reviewed-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11fs/pnode.c: treat zero mnt_group_id-s as unequalMaxim Patlasov
commit 7ae8fd0351f912b075149a1e03a017be8b903b9a upstream. propagate_one(m) calculates "type" argument for copy_tree() like this: > if (m->mnt_group_id == last_dest->mnt_group_id) { > type = CL_MAKE_SHARED; > } else { > type = CL_SLAVE; > if (IS_MNT_SHARED(m)) > type |= CL_MAKE_SHARED; > } The "type" argument then governs clone_mnt() behavior with respect to flags and mnt_master of new mount. When we iterate through a slave group, it is possible that both current "m" and "last_dest" are not shared (although, both are slaves, i.e. have non-NULL mnt_master-s). Then the comparison above erroneously makes new mount shared and sets its mnt_master to last_source->mnt_master. The patch fixes the problem by handling zero mnt_group_id-s as though they are unequal. The similar problem exists in the implementation of "else" clause above when we have to ascend upward in the master/slave tree by calling: > last_source = last_source->mnt_master; > last_dest = last_source->mnt_parent; proper number of times. The last step is governed by "n->mnt_group_id != last_dest->mnt_group_id" condition that may lie if both are zero. The patch fixes this case in the same way as the former one. [AV: don't open-code an obvious helper...] Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-02mnt: Don't propagate unmounts to locked mountsEric W. Biederman
If the first mount in shared subtree is locked don't unmount the shared subtree. This is ensured by walking through the mounts parents before children and marking a mount as unmountable if it is not locked or it is locked but it's parent is marked. This allows recursive mount detach to propagate through a set of mounts when unmounting them would not reveal what is under any locked mount. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-04-02mnt: On an unmount propagate clearing of MNT_LOCKEDEric W. Biederman
A prerequisite of calling umount_tree is that the point where the tree is mounted at is valid to unmount. If we are propagating the effect of the unmount clear MNT_LOCKED in every instance where the same filesystem is mounted on the same mountpoint in the mount tree, as we know (by virtue of the fact that umount_tree was called) that it is safe to reveal what is at that mountpoint. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-04-02mnt: Delay removal from the mount hash.Eric W. Biederman
- Modify __lookup_mnt_hash_last to ignore mounts that have MNT_UMOUNTED set. - Don't remove mounts from the mount hash table in propogate_umount - Don't remove mounts from the mount hash table in umount_tree before the entire list of mounts to be umounted is selected. - Remove mounts from the mount hash table as the last thing that happens in the case where a mount has a parent in umount_tree. Mounts without parents are not hashed (by definition). This paves the way for delaying removal from the mount hash table even farther and fixing the MNT_LOCKED vs MNT_DETACH issue. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-04-02mnt: Add MNT_UMOUNT flagEric W. Biederman
In some instances it is necessary to know if the the unmounting process has begun on a mount. Add MNT_UMOUNT to make that reliably testable. This fix gets used in fixing locked mounts in MNT_DETACH Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-04-02mnt: In umount_tree reuse mnt_list instead of mnt_hashEric W. Biederman
umount_tree builds a list of mounts that need to be unmounted. Utilize mnt_list for this purpose instead of mnt_hash. This begins to allow keeping a mount on the mnt_hash after it is unmounted, which is necessary for a properly functioning MNT_LOCKED implementation. The fact that mnt_list is an ordinary list makding available list_move is nice bonus. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-12-02mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.Eric W. Biederman
Clear MNT_LOCKED in the callers of copy_tree except copy_mnt_ns, and collect_mounts. In copy_mnt_ns it is necessary to create an exact copy of a mount tree, so not clearing MNT_LOCKED is important. Similarly collect_mounts is used to take a snapshot of the mount tree for audit logging purposes and auditing using a faithful copy of the tree is important. This becomes particularly significant when we start setting MNT_LOCKED on rootfs to prevent it from being unmounted. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-08-30get rid of propagate_umount() mistakenly treating slaves as busy.Al Viro
The check in __propagate_umount() ("has somebody explicitly mounted something on that slave?") is done *before* taking the already doomed victims out of the child lists. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01smarter propagate_mnt()Al Viro
The current mainline has copies propagated to *all* nodes, then tears down the copies we made for nodes that do not contain counterparts of the desired mountpoint. That sets the right propagation graph for the copies (at teardown time we move the slaves of removed node to a surviving peer or directly to master), but we end up paying a fairly steep price in useless allocations. It's fairly easy to create a situation where N calls of mount(2) create exactly N bindings, with O(N^2) vfsmounts allocated and freed in process. Fortunately, it is possible to avoid those allocations/freeings. The trick is to create copies in the right order and find which one would've eventually become a master with the current algorithm. It turns out to be possible in O(nodes getting propagation) time and with no extra allocations at all. One part is that we need to make sure that eventual master will be created before its slaves, so we need to walk the propagation tree in a different order - by peer groups. And iterate through the peers before dealing with the next group. Another thing is finding the (earlier) copy that will be a master of one we are about to create; to do that we are (temporary) marking the masters of mountpoints we are attaching the copies to. Either we are in a peer of the last mountpoint we'd dealt with, or we have the following situation: we are attaching to mountpoint M, the last copy S_0 had been attached to M_0 and there are sequences S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i}, S_{i} mounted on M{i} and we need to create a slave of the first S_{k} such that M is getting propagation from M_{k}. It means that the master of M_{k} will be among the sequence of masters of M. On the other hand, the nearest marked node in that sequence will either be the master of M_{k} or the master of M_{k-1} (the latter - in the case if M_{k-1} is a slave of something M gets propagation from, but in a wrong peer group). So we go through the sequence of masters of M until we find a marked one (P). Let N be the one before it. Then we go through the sequence of masters of S_0 until we find one (say, S) mounted on a node D that has P as master and check if D is a peer of N. If it is, S will be the master of new copy, if not - the master of S will be. That's it for the hard part; the rest is fairly simple. Iterator is in next_group(), handling of one prospective mountpoint is propagate_one(). It seems to survive all tests and gives a noticably better performance than the current mainline for setups that are seriously using shared subtrees. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-30switch mnt_hash to hlistAl Viro
fixes RCU bug - walking through hlist is safe in face of element moves, since it's self-terminating. Cyclic lists are not - if we end up jumping to another hash chain, we'll loop infinitely without ever hitting the original list head. [fix for dumb braino folded] Spotted by: Max Kellermann <mk@cm4all.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24split __lookup_mnt() in two functionsAl Viro
Instead of passing the direction as argument (and checking it on every step through the hash chain), just have separate __lookup_mnt() and __lookup_mnt_last(). And use the standard iterators... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24new helpers: lock_mount_hash/unlock_mount_hashAl Viro
aka br_write_{lock,unlock} of vfsmount_lock. Inlines in fs/mount.h, vfsmount_lock extern moved over there as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24namespace.c: get rid of mnt_ghostsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31vfs: Fix invalid ida_remove() callTakashi Iwai
When the group id of a shared mount is not allocated, the umount still tries to call mnt_release_group_id(), which eventually hits a kernel warning at ida_remove() spewing a message like: ida_remove called for id=0 which is not allocated. This patch fixes the bug simply checking the group id in the caller. Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...
2013-04-09switch unlock_mount() to namespace_unlock(), convert all umount_tree() callersAl Viro
which allows to kill the last argument of umount_tree() and make release_mounts() static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09get rid of full-hash scan on detaching vfsmountsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-27vfs: Carefully propogate mounts across user namespacesEric W. Biederman
As a matter of policy MNT_READONLY should not be changable if the original mounter had more privileges than creator of the mount namespace. Add the flag CL_UNPRIVILEGED to note when we are copying a mount from a mount namespace that requires more privileges to a mount namespace that requires fewer privileges. When the CL_UNPRIVILEGED flag is set cause clone_mnt to set MNT_NO_REMOUNT if any of the mnt flags that should never be changed are set. This protects both mount propagation and the initial creation of a less privileged mount namespace. Cc: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-07-14VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errorsDavid Howells
copy_tree() can theoretically fail in a case other than ENOMEM, but always returns NULL which is interpreted by callers as -ENOMEM. Change it to return an explicit error. Also change clone_mnt() for consistency and because union mounts will add new error cases. Thanks to Andreas Gruenbacher <agruen@suse.de> for a bug fix. [AV: folded braino fix by Dan Carpenter] Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Valerie Aurora <valerie.aurora@gmail.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29brlocks/lglocks: API cleanupsAndi Kleen
lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason to not just use common utility functions and put all the data into a common data structure. In preparation, this patch changes the API to look more like normal function calls with pointers, not magic macros. The patch is rather large because I move over all users in one go to keep it bisectable. This impacts the VFS somewhat in terms of lines changed. But no actual behaviour change. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: switch pnode.h macros to struct mount *Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: move the rest of int fields to struct mountAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: mnt_id/mnt_group_id movedAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: mnt_ns moved to struct mountAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: take mnt_share/mnt_slave/mnt_slave_list and mnt_expire to struct mountAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: and now we can make ->mnt_master point to struct mountAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: take mnt_master to struct mountAl Viro
make IS_MNT_SLAVE take struct mount * at the same time Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: spread struct mount - remaining argument of mnt_set_mountpoint()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: spread struct mount - propagate_mnt()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: spread struct mount - shared subtree iteratorsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: spread struct mount - get_dominating_id / do_make_slaveAl Viro
next pile of horrors, similar to mnt_parent one; this time it's mnt_master. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: take mnt_child/mnt_mounts to struct mountAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: spread struct mount - work with countersAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: move mnt_mountpoint to struct mountAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>