summaryrefslogtreecommitdiff
path: root/mm/mmap.c
AgeCommit message (Collapse)Author
2018-02-01Merge android-4.4.114 (fe09418) into msm-4.4Srinivasarao P
* refs/heads/tmp-fe09418 Linux 4.4.114 nfsd: auth: Fix gid sorting when rootsquash enabled net: tcp: close sock if net namespace is exiting flow_dissector: properly cap thoff field ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY net: Allow neigh contructor functions ability to modify the primary_key vmxnet3: repair memory leak sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf sctp: do not allow the v4 socket to bind a v4mapped v6 address r8169: fix memory corruption on retrieval of hardware statistics. pppoe: take ->needed_headroom of lower device into account on xmit net: qdisc_pkt_len_init() should be more robust tcp: __tcp_hdrlen() helper net: igmp: fix source address check for IGMPv3 reports lan78xx: Fix failure in USB Full Speed ipv6: ip6_make_skb() needs to clear cork.base.dst ipv6: fix udpv6 sendmsg crash caused by too small MTU ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state hrtimer: Reset hrtimer cpu base proper on CPU hotplug x86/microcode/intel: Extend BDW late-loading further with LLC size check eventpoll.h: add missing epoll event masks vsyscall: Fix permissions for emulate mode with KAISER/PTI um: link vmlinux with -no-pie usbip: prevent leaking socket pointer address in messages usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input usbip: fix stub_rx: get_pipe() to validate endpoint number usb: usbip: Fix possible deadlocks reported by lockdep Input: trackpoint - force 3 buttons if 0 button is reported Revert "module: Add retpoline tag to VERMAGIC" scsi: libiscsi: fix shifting of DID_REQUEUE host byte fs/fcntl: f_setown, avoid undefined behaviour reiserfs: Don't clear SGID when inheriting ACLs reiserfs: don't preallocate blocks for extended attributes reiserfs: fix race in prealloc discard ext2: Don't clear SGID when inheriting ACLs netfilter: xt_osf: Add missing permission checks netfilter: nfnetlink_cthelper: Add missing permission checks netfilter: fix IS_ERR_VALUE usage netfilter: use fwmark_reflect in nf_send_reset netfilter: nf_conntrack_sip: extend request line validation netfilter: restart search if moved to other chain netfilter: nfnetlink_queue: reject verdict request from different portid netfilter: nf_ct_expect: remove the redundant slash when policy name is empty netfilter: nf_dup_ipv6: set again FLOWI_FLAG_KNOWN_NH at flowi6_flags netfilter: arp_tables: fix invoking 32bit "iptable -P INPUT ACCEPT" failed in 64bit kernel netfilter: x_tables: speed up jump target validation ACPICA: Namespace: fix operand cache leak ACPI / scan: Prefer devices without _HID/_CID for _ADR matching ACPI / processor: Avoid reserving IO regions too early x86/ioapic: Fix incorrect pointers in ioapic_setup_resources() ipc: msg, make msgrcv work with LONG_MIN mm, page_alloc: fix potential false positive in __zone_watermark_ok cma: fix calculation of aligned offset hwpoison, memcg: forcibly uncharge LRU pages mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack fs/select: add vmalloc fallback for select(2) mmc: sdhci-of-esdhc: add/remove some quirks according to vendor version PCI: layerscape: Fix MSG TLP drop setting PCI: layerscape: Add "fsl,ls2085a-pcie" compatible ID drivers: base: cacheinfo: fix boot error message when acpi is enabled drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled Prevent timer value 0 for MWAITX timers: Plug locking race vs. timer migration time: Avoid undefined behaviour in ktime_add_safe() PM / sleep: declare __tracedata symbols as char[] rather than char can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once sched/deadline: Use the revised wakeup rule for suspending constrained dl tasks x86/retpoline: Fill RSB on context switch for affected CPUs x86/cpu/intel: Introduce macros for Intel family numbers x86/microcode/intel: Fix BDW late-loading revision check usbip: Fix potential format overflow in userspace tools usbip: Fix implicit fallthrough warning usbip: prevent vhci_hcd driver from leaking a socket pointer address x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels ANDROID: sched: EAS: check energy_aware() before calling select_energy_cpu_brute() in up-migrate path UPSTREAM: eventpoll.h: add missing epoll event masks ANDROID: xattr: Pass EOPNOTSUPP to permission2 Conflicts: kernel/sched/fair.c Change-Id: I15005cb3bc039f4361d25ed2e22f8175b3d7ca96 Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2018-01-31Merge 4.4.114 into android-4.4Greg Kroah-Hartman
Changes in 4.4.114 x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels usbip: prevent vhci_hcd driver from leaking a socket pointer address usbip: Fix implicit fallthrough warning usbip: Fix potential format overflow in userspace tools x86/microcode/intel: Fix BDW late-loading revision check x86/cpu/intel: Introduce macros for Intel family numbers x86/retpoline: Fill RSB on context switch for affected CPUs sched/deadline: Use the revised wakeup rule for suspending constrained dl tasks can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once PM / sleep: declare __tracedata symbols as char[] rather than char time: Avoid undefined behaviour in ktime_add_safe() timers: Plug locking race vs. timer migration Prevent timer value 0 for MWAITX drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled drivers: base: cacheinfo: fix boot error message when acpi is enabled PCI: layerscape: Add "fsl,ls2085a-pcie" compatible ID PCI: layerscape: Fix MSG TLP drop setting mmc: sdhci-of-esdhc: add/remove some quirks according to vendor version fs/select: add vmalloc fallback for select(2) mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack hwpoison, memcg: forcibly uncharge LRU pages cma: fix calculation of aligned offset mm, page_alloc: fix potential false positive in __zone_watermark_ok ipc: msg, make msgrcv work with LONG_MIN x86/ioapic: Fix incorrect pointers in ioapic_setup_resources() ACPI / processor: Avoid reserving IO regions too early ACPI / scan: Prefer devices without _HID/_CID for _ADR matching ACPICA: Namespace: fix operand cache leak netfilter: x_tables: speed up jump target validation netfilter: arp_tables: fix invoking 32bit "iptable -P INPUT ACCEPT" failed in 64bit kernel netfilter: nf_dup_ipv6: set again FLOWI_FLAG_KNOWN_NH at flowi6_flags netfilter: nf_ct_expect: remove the redundant slash when policy name is empty netfilter: nfnetlink_queue: reject verdict request from different portid netfilter: restart search if moved to other chain netfilter: nf_conntrack_sip: extend request line validation netfilter: use fwmark_reflect in nf_send_reset netfilter: fix IS_ERR_VALUE usage netfilter: nfnetlink_cthelper: Add missing permission checks netfilter: xt_osf: Add missing permission checks ext2: Don't clear SGID when inheriting ACLs reiserfs: fix race in prealloc discard reiserfs: don't preallocate blocks for extended attributes reiserfs: Don't clear SGID when inheriting ACLs fs/fcntl: f_setown, avoid undefined behaviour scsi: libiscsi: fix shifting of DID_REQUEUE host byte Revert "module: Add retpoline tag to VERMAGIC" Input: trackpoint - force 3 buttons if 0 button is reported usb: usbip: Fix possible deadlocks reported by lockdep usbip: fix stub_rx: get_pipe() to validate endpoint number usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input usbip: prevent leaking socket pointer address in messages um: link vmlinux with -no-pie vsyscall: Fix permissions for emulate mode with KAISER/PTI eventpoll.h: add missing epoll event masks x86/microcode/intel: Extend BDW late-loading further with LLC size check hrtimer: Reset hrtimer cpu base proper on CPU hotplug dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL ipv6: fix udpv6 sendmsg crash caused by too small MTU ipv6: ip6_make_skb() needs to clear cork.base.dst lan78xx: Fix failure in USB Full Speed net: igmp: fix source address check for IGMPv3 reports tcp: __tcp_hdrlen() helper net: qdisc_pkt_len_init() should be more robust pppoe: take ->needed_headroom of lower device into account on xmit r8169: fix memory corruption on retrieval of hardware statistics. sctp: do not allow the v4 socket to bind a v4mapped v6 address sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf vmxnet3: repair memory leak net: Allow neigh contructor functions ability to modify the primary_key ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY flow_dissector: properly cap thoff field net: tcp: close sock if net namespace is exiting nfsd: auth: Fix gid sorting when rootsquash enabled Linux 4.4.114 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-31mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stackMichal Hocko
commit 561b5e0709e4a248c67d024d4d94b6e31e3edf2f upstream. Commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas") has introduced a regression in some rust and Java environments which are trying to implement their own stack guard page. They are punching a new MAP_FIXED mapping inside the existing stack Vma. This will confuse expand_{downwards,upwards} into thinking that the stack expansion would in fact get us too close to an existing non-stack vma which is a correct behavior wrt safety. It is a real regression on the other hand. Let's work around the problem by considering PROT_NONE mapping as a part of the stack. This is a gros hack but overflowing to such a mapping would trap anyway an we only can hope that usespace knows what it is doing and handle it propely. Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas") Link: http://lkml.kernel.org/r/20170705182849.GA18027@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Debugged-by: Vlastimil Babka <vbabka@suse.cz> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Willy Tarreau <w@1wt.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-18Merge android-4.4.106 (2fea039) into msm-4.4Srinivasarao P
* refs/heads/tmp-2fea039 Linux 4.4.106 usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one Revert "x86/mm/pat: Ensure cpa->pfn only contains page frame numbers" Revert "x86/efi: Hoist page table switching code into efi_call_virt()" Revert "x86/efi: Build our own page table structures" net/packet: fix a race in packet_bind() and packet_notifier() packet: fix crash in fanout_demux_rollover() sit: update frag_off info rds: Fix NULL pointer dereference in __rds_rdma_map tipc: fix memory leak in tipc_accept_from_sock() more bio_map_user_iov() leak fixes s390: always save and restore all registers on context switch ipmi: Stop timers before cleaning up the module audit: ensure that 'audit=1' actually enables audit for PID 1 ipvlan: fix ipv6 outbound device afs: Connect up the CB.ProbeUuid IB/mlx5: Assign send CQ and recv CQ of UMR QP IB/mlx4: Increase maximal message size under UD QP xfrm: Copy policy family in clone_policy jump_label: Invoke jump_label_test() via early_initcall() atm: horizon: Fix irq release error sctp: use the right sk after waking up from wait_buf sleep sctp: do not free asoc when it is already dead in sctp_sendmsg sparc64/mm: set fields in deferred pages block: wake up all tasks blocked in get_request() sunrpc: Fix rpc_task_begin trace point NFS: Fix a typo in nfs_rename() dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0 lib/genalloc.c: make the avail variable an atomic_long_t route: update fnhe_expires for redirect when the fnhe exists route: also update fnhe_genid when updating a route cache mac80211_hwsim: Fix memory leak in hwsim_new_radio_nl() kbuild: pkg: use --transform option to prefix paths in tar EDAC, i5000, i5400: Fix definition of NRECMEMB register EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested drm/amd/amdgpu: fix console deadlock if late init failed axonram: Fix gendisk handling netfilter: don't track fragmented packets zram: set physical queue limits to avoid array out of bounds accesses i2c: riic: fix restart condition crypto: s5p-sss - Fix completing crypto request in IRQ handler ipv6: reorder icmpv6_init() and ip6_mr_init() bnx2x: do not rollback VF MAC/VLAN filters we did not configure bnx2x: fix possible overrun of VFPF multicast addresses array bnx2x: prevent crash when accessing PTP with interface down spi_ks8995: fix "BUG: key accdaa28 not in .data!" arm64: KVM: Survive unknown traps from guests arm: KVM: Survive unknown traps from guests KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset irqchip/crossbar: Fix incorrect type of register size scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq libata: drop WARN from protocol error in ata_sff_qc_issue() kvm: nVMX: VMCLEAR should not cause the vCPU to shut down USB: gadgetfs: Fix a potential memory leak in 'dev_config()' usb: gadget: configs: plug memory leak HID: chicony: Add support for another ASUS Zen AiO keyboard gpio: altera: Use handle_level_irq when configured as a level_high ARM: OMAP2+: Release device node after it is no longer needed. ARM: OMAP2+: Fix device node reference counts module: set __jump_table alignment to 8 selftest/powerpc: Fix false failures for skipped tests x86/hpet: Prevent might sleep splat on resume ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure vti6: Don't report path MTU below IPV6_MIN_MTU. Revert "s390/kbuild: enable modversions for symbols exported from asm" Revert "spi: SPI_FSL_DSPI should depend on HAS_DMA" Revert "drm/armada: Fix compile fail" mm: drop unused pmdp_huge_get_and_clear_notify() thp: fix MADV_DONTNEED vs. numa balancing race thp: reduce indentation level in change_huge_pmd() scsi: storvsc: Workaround for virtual DVD SCSI version ARM: avoid faulting on qemu ARM: BUG if jumping to usermode address in kernel mode arm64: fpsimd: Prevent registers leaking from dead tasks KVM: VMX: remove I/O port 0x80 bypass on Intel hosts arm64: KVM: fix VTTBR_BADDR_MASK BUG_ON off-by-one media: dvb: i2c transfers over usb cannot be done from stack drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU drm: extra printk() wrapper macros kdb: Fix handling of kallsyms_symbol_next() return value s390: fix compat system call table iommu/vt-d: Fix scatterlist offset handling ALSA: usb-audio: Add check return value for usb_string() ALSA: usb-audio: Fix out-of-bound error ALSA: seq: Remove spurious WARN_ON() at timer check ALSA: pcm: prevent UAF in snd_pcm_info x86/PCI: Make broadcom_postcore_init() check acpi_disabled X.509: reject invalid BIT STRING for subjectPublicKey ASN.1: check for error from ASN1_OP_END__ACT actions ASN.1: fix out-of-bounds read when parsing indefinite length item efi: Move some sysfs files to be read-only by root scsi: libsas: align sata_device's rps_resp on a cacheline isa: Prevent NULL dereference in isa_bus driver callbacks hv: kvp: Avoid reading past allocated blocks from KVP file virtio: release virtio index when fail to device_register can: usb_8dev: cancel urb on -EPIPE and -EPROTO can: esd_usb2: cancel urb on -EPIPE and -EPROTO can: ems_usb: cancel urb on -EPIPE and -EPROTO can: kvaser_usb: cancel urb on -EPIPE and -EPROTO can: kvaser_usb: ratelimit errors if incomplete messages are received can: kvaser_usb: Fix comparison bug in kvaser_usb_read_bulk_callback() can: kvaser_usb: free buf in error paths can: ti_hecc: Fix napi poll return value for repoll BACKPORT: irq: Make the irqentry text section unconditional UPSTREAM: arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections UPSTREAM: x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text UPSTREAM: kasan: make get_wild_bug_type() static UPSTREAM: kasan: separate report parts by empty lines UPSTREAM: kasan: improve double-free report format UPSTREAM: kasan: print page description after stacks UPSTREAM: kasan: improve slab object description UPSTREAM: kasan: change report header UPSTREAM: kasan: simplify address description logic UPSTREAM: kasan: change allocation and freeing stack traces headers UPSTREAM: kasan: unify report headers UPSTREAM: kasan: introduce helper functions for determining bug type BACKPORT: kasan: report only the first error by default UPSTREAM: kasan: fix races in quarantine_remove_cache() UPSTREAM: kasan: resched in quarantine_remove_cache() BACKPORT: kasan, sched/headers: Uninline kasan_enable/disable_current() BACKPORT: kasan: drain quarantine of memcg slab objects UPSTREAM: kasan: eliminate long stalls during quarantine reduction UPSTREAM: kasan: support panic_on_warn UPSTREAM: x86/suspend: fix false positive KASAN warning on suspend/resume UPSTREAM: kasan: support use-after-scope detection UPSTREAM: kasan/tests: add tests for user memory access functions UPSTREAM: mm, kasan: add a ksize() test UPSTREAM: kasan: test fix: warn if the UAF could not be detected in kmalloc_uaf2 UPSTREAM: kasan: modify kmalloc_large_oob_right(), add kmalloc_pagealloc_oob_right() UPSTREAM: lib/stackdepot: export save/fetch stack for drivers UPSTREAM: lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB BACKPORT: kprobes: Unpoison stack in jprobe_return() for KASAN UPSTREAM: kasan: remove the unnecessary WARN_ONCE from quarantine.c UPSTREAM: kasan: avoid overflowing quarantine size on low memory systems UPSTREAM: kasan: improve double-free reports BACKPORT: mm: coalesce split strings BACKPORT: mm/kasan: get rid of ->state in struct kasan_alloc_meta UPSTREAM: mm/kasan: get rid of ->alloc_size in struct kasan_alloc_meta UPSTREAM: mm: kasan: remove unused 'reserved' field from struct kasan_alloc_meta UPSTREAM: mm/kasan, slub: don't disable interrupts when object leaves quarantine UPSTREAM: mm/kasan: don't reduce quarantine in atomic contexts UPSTREAM: mm/kasan: fix corruptions and false positive reports UPSTREAM: lib/stackdepot.c: use __GFP_NOWARN for stack allocations BACKPORT: mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB UPSTREAM: kasan/quarantine: fix bugs on qlist_move_cache() UPSTREAM: mm: mempool: kasan: don't poot mempool objects in quarantine UPSTREAM: kasan: change memory hot-add error messages to info messages BACKPORT: mm/kasan: add API to check memory regions UPSTREAM: mm/kasan: print name of mem[set,cpy,move]() caller in report UPSTREAM: mm: kasan: initial memory quarantine implementation UPSTREAM: lib/stackdepot: avoid to return 0 handle UPSTREAM: lib/stackdepot.c: allow the stack trace hash to be zero UPSTREAM: mm, kasan: fix compilation for CONFIG_SLAB BACKPORT: mm, kasan: stackdepot implementation. Enable stackdepot for SLAB BACKPORT: mm, kasan: add GFP flags to KASAN API UPSTREAM: mm, kasan: SLAB support UPSTREAM: mm/slab: align cache size first before determination of OFF_SLAB candidate UPSTREAM: mm/slab: use more appropriate condition check for debug_pagealloc UPSTREAM: mm/slab: factor out debugging initialization in cache_init_objs() UPSTREAM: mm/slab: remove object status buffer for DEBUG_SLAB_LEAK UPSTREAM: mm/slab: alternative implementation for DEBUG_SLAB_LEAK UPSTREAM: mm/slab: clean up DEBUG_PAGEALLOC processing code UPSTREAM: mm/slab: activate debug_pagealloc in SLAB when it is actually enabled sched: EAS/WALT: Don't take into account of running task's util BACKPORT: schedutil: Reset cached freq if it is not in sync with next_freq UPSTREAM: kasan: add functions to clear stack poison Conflicts: arch/arm/include/asm/kvm_arm.h arch/arm64/kernel/vmlinux.lds.S include/linux/kasan.h kernel/softirq.c lib/Kconfig lib/Kconfig.kasan lib/Makefile lib/stackdepot.c mm/kasan/kasan.c sound/usb/mixer.c Change-Id: If70ced6da5f19be3dd92d10a8d8cd4d5841e5870 Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2017-12-14BACKPORT: mm: coalesce split stringsJoe Perches
Kernel style prefers a single string over split strings when the string is 'user-visible'. Miscellanea: - Add a missing newline - Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Tejun Heo <tj@kernel.org> [percpu] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 64145065 (cherry-picked from 756a025f00091918d9d09ca3229defb160b409c0) Change-Id: I377fb1542980c15d2f306924656227ad17b02b5e Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-08-04Merge android-4.4@59ff2e1 (v4.4.78) into msm-4.4Blagovest Kolenichev
* refs/heads/tmp-59ff2e1 Linux 4.4.78 kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS kvm: vmx: Check value written to IA32_BNDCFGS kvm: x86: Guest BNDCFGS requires guest MPX support kvm: vmx: Do not disable intercepts for BNDCFGS KVM: x86: disable MPX if host did not enable MPX XSAVE features tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results PM / QoS: return -EINVAL for bogus strings PM / wakeirq: Convert to SRCU sched/topology: Optimize build_group_mask() sched/topology: Fix overlapping sched_group_mask crypto: caam - fix signals handling crypto: sha1-ssse3 - Disable avx2 crypto: atmel - only treat EBUSY as transient if backlog crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD mm: fix overflow check in expand_upwards() tpm: Issue a TPM2_Shutdown for TPM2 devices. Add "shutdown" to "struct class". tpm: Provide strong locking for device removal tpm: Get rid of chip->pdev selftests/capabilities: Fix the test_execve test mnt: Make propagate_umount less slow for overlapping mount propagation trees mnt: In propgate_umount handle visiting mounts in any order mnt: In umount propagation reparent in a separate pass vt: fix unchecked __put_user() in tioclinux ioctls exec: Limit arg stack to at most 75% of _STK_LIM s390: reduce ELF_ET_DYN_BASE powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB arm64: move ELF_ET_DYN_BASE to 4GB / 4MB arm: move ELF_ET_DYN_BASE to 4MB binfmt_elf: use ELF_ET_DYN_BASE only for PIE checkpatch: silence perl 5.26.0 unescaped left brace warnings fs/dcache.c: fix spin lockup issue on nlru->lock mm/list_lru.c: fix list_lru_count_node() to be race free kernel/extable.c: mark core_kernel_text notrace tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth parisc/mm: Ensure IRQs are off in switch_mm() parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs parisc: use compat_sys_keyctl() parisc: Report SIGSEGV instead of SIGBUS when running out of stack irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity cfg80211: Check if PMKID attribute is of expected size cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx() rds: tcp: use sock_create_lite() to create the accept socket vrf: fix bug_on triggered by rx when destroying a vrf net: ipv6: Compare lwstate in detecting duplicate nexthops ipv6: dad: don't remove dynamic addresses if link is down net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish() bpf: prevent leaking pointer via xadd on unpriviledged net: prevent sign extension in dev_get_stats() tcp: reset sk_rx_dst in tcp_disconnect() net: dp83640: Avoid NULL pointer dereference. ipv6: avoid unregistering inet6_dev for loopback net/phy: micrel: configure intterupts after autoneg workaround net: sched: Fix one possible panic when no destroy callback net_sched: fix error recovery at qdisc creation ANDROID: android-verity: mark dev as rw for linear target ANDROID: sdcardfs: Remove unnecessary lock ANDROID: binder: don't check prio permissions on restore. Add BINDER_GET_NODE_DEBUG_INFO ioctl UPSTREAM: cpufreq: schedutil: Trace frequency only if it has changed UPSTREAM: cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely UPSTREAM: cpufreq: schedutil: Refactor sugov_next_freq_shared() UPSTREAM: cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start() UPSTREAM: cpufreq: schedutil: Pass sg_policy to get_next_freq() UPSTREAM: cpufreq: schedutil: move cached_raw_freq to struct sugov_policy UPSTREAM: cpufreq: schedutil: Rectify comment in sugov_irq_work() function UPSTREAM: cpufreq: schedutil: irq-work and mutex are only used in slow path UPSTREAM: cpufreq: schedutil: enable fast switch earlier UPSTREAM: cpufreq: schedutil: Avoid indented labels Linux 4.4.77 saa7134: fix warm Medion 7134 EEPROM read x86/mm/pat: Don't report PAT on CPUs that don't support it ext4: check return value of kstrtoull correctly in reserved_clusters_store staging: comedi: fix clean-up of comedi_class in comedi_init() staging: vt6556: vnt_start Fix missing call to vnt_key_init_table. tcp: fix tcp_mark_head_lost to check skb len before fragmenting md: fix super_offset endianness in super_1_rdev_size_change md: fix incorrect use of lexx_to_cpu in does_sb_need_changing perf tools: Use readdir() instead of deprecated readdir_r() again perf tests: Remove wrong semicolon in while loop in CQM test perf trace: Do not process PERF_RECORD_LOST twice perf dwarf: Guard !x86_64 definitions under #ifdef else clause perf pmu: Fix misleadingly indented assignment (whitespace) perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed perf tools: Remove duplicate const qualifier perf script: Use readdir() instead of deprecated readdir_r() perf thread_map: Use readdir() instead of deprecated readdir_r() perf tools: Use readdir() instead of deprecated readdir_r() perf bench numa: Avoid possible truncation when using snprintf() perf tests: Avoid possible truncation with dirent->d_name + snprintf perf scripting perl: Fix compile error with some perl5 versions perf thread_map: Correctly size buffer used with dirent->dt_name perf intel-pt: Use __fallthrough perf top: Use __fallthrough tools strfilter: Use __fallthrough tools string: Use __fallthrough in perf_atoll() tools include: Add a __fallthrough statement mqueue: fix a use-after-free in sys_mq_notify() RDMA/uverbs: Check port number supplied by user verbs cmds KEYS: Fix an error code in request_master_key() ath10k: override CE5 config for QCA9377 x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings x86/tools: Fix gcc-7 warning in relocs.c gfs2: Fix glock rhashtable rcu bug USB: serial: qcserial: new Sierra Wireless EM7305 device ID USB: serial: option: add two Longcheer device ids pinctrl: sh-pfc: Update info pointer after SoC-specific init pinctrl: mxs: atomically switch mux and drive strength config pinctrl: sunxi: Fix SPDIF function name for A83T pinctrl: meson: meson8b: fix the NAND DQS pins pinctrl: sh-pfc: r8a7791: Fix SCIF2 pinmux data sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec sysctl: don't print negative flag for proc_douintvec mac80211_hwsim: Replace bogus hrtimer clockid usb: Fix typo in the definition of Endpoint[out]Request usb: usbip: set buffer pointers to NULL after free Add USB quirk for HVR-950q to avoid intermittent device resets USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick usb: dwc3: replace %p with %pK drm/virtio: don't leak bo on drm_gem_object_init failure tracing/kprobes: Allow to create probe with a module name starting with a digit mm: fix classzone_idx underflow in shrink_zones() bgmac: reset & enable Ethernet core before using it driver core: platform: fix race condition with driver_override fs: completely ignore unknown open flags fs: add a VALID_OPEN_FLAGS ANDROID: binder: add RT inheritance flag to node. ANDROID: binder: improve priority inheritance. ANDROID: binder: add min sched_policy to node. ANDROID: binder: add support for RT prio inheritance. ANDROID: binder: push new transactions to waiting threads. ANDROID: binder: remove proc waitqueue FROMLIST: binder: remove global binder lock FROMLIST: binder: fix death race conditions FROMLIST: binder: protect against stale pointers in print_binder_transaction FROMLIST: binder: protect binder_ref with outer lock FROMLIST: binder: use inner lock to protect thread accounting FROMLIST: binder: protect transaction_stack with inner lock. FROMLIST: binder: protect proc->threads with inner_lock FROMLIST: binder: protect proc->nodes with inner lock FROMLIST: binder: add spinlock to protect binder_node FROMLIST: binder: add spinlocks to protect todo lists FROMLIST: binder: use inner lock to sync work dq and node counts FROMLIST: binder: introduce locking helper functions FROMLIST: binder: use node->tmp_refs to ensure node safety FROMLIST: binder: refactor binder ref inc/dec for thread safety FROMLIST: binder: make sure accesses to proc/thread are safe FROMLIST: binder: make sure target_node has strong ref FROMLIST: binder: guarantee txn complete / errors delivered in-order FROMLIST: binder: refactor binder_pop_transaction FROMLIST: binder: use atomic for transaction_log index FROMLIST: binder: add more debug info when allocation fails. FROMLIST: binder: protect against two threads freeing buffer FROMLIST: binder: remove dead code in binder_get_ref_for_node FROMLIST: binder: don't modify thread->looper from other threads FROMLIST: binder: avoid race conditions when enqueuing txn FROMLIST: binder: refactor queue management in binder_thread_read FROMLIST: binder: add log information for binder transaction failures FROMLIST: binder: make binder_last_id an atomic FROMLIST: binder: change binder_stats to atomics FROMLIST: binder: add protection for non-perf cases FROMLIST: binder: remove binder_debug_no_lock mechanism FROMLIST: binder: move binder_alloc to separate file FROMLIST: binder: separate out binder_alloc functions FROMLIST: binder: remove unneeded cleanup code FROMLIST: binder: separate binder allocator structure from binder proc FROMLIST: binder: Use wake up hint for synchronous transactions. Revert "android: binder: move global binder state into context struct." sched: walt: fix window misalignment when HZ=300 ANDROID: android-base.cfg: remove CONFIG_CGROUP_DEBUG ANDROID: sdcardfs: use mount_nodev and fix a issue in sdcardfs_kill_sb Conflicts: drivers/android/binder.c drivers/net/wireless/ath/ath10k/pci.c Change-Id: Ic6f82c2ec9929733a16a03bb3b745187e002f4f6 Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-07-21Merge 4.4.78 into android-4.4Greg Kroah-Hartman
Changes in 4.4.78 net_sched: fix error recovery at qdisc creation net: sched: Fix one possible panic when no destroy callback net/phy: micrel: configure intterupts after autoneg workaround ipv6: avoid unregistering inet6_dev for loopback net: dp83640: Avoid NULL pointer dereference. tcp: reset sk_rx_dst in tcp_disconnect() net: prevent sign extension in dev_get_stats() bpf: prevent leaking pointer via xadd on unpriviledged net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish() ipv6: dad: don't remove dynamic addresses if link is down net: ipv6: Compare lwstate in detecting duplicate nexthops vrf: fix bug_on triggered by rx when destroying a vrf rds: tcp: use sock_create_lite() to create the accept socket brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx() cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES cfg80211: Check if PMKID attribute is of expected size irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity parisc: Report SIGSEGV instead of SIGBUS when running out of stack parisc: use compat_sys_keyctl() parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs parisc/mm: Ensure IRQs are off in switch_mm() tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth kernel/extable.c: mark core_kernel_text notrace mm/list_lru.c: fix list_lru_count_node() to be race free fs/dcache.c: fix spin lockup issue on nlru->lock checkpatch: silence perl 5.26.0 unescaped left brace warnings binfmt_elf: use ELF_ET_DYN_BASE only for PIE arm: move ELF_ET_DYN_BASE to 4MB arm64: move ELF_ET_DYN_BASE to 4GB / 4MB powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB s390: reduce ELF_ET_DYN_BASE exec: Limit arg stack to at most 75% of _STK_LIM vt: fix unchecked __put_user() in tioclinux ioctls mnt: In umount propagation reparent in a separate pass mnt: In propgate_umount handle visiting mounts in any order mnt: Make propagate_umount less slow for overlapping mount propagation trees selftests/capabilities: Fix the test_execve test tpm: Get rid of chip->pdev tpm: Provide strong locking for device removal Add "shutdown" to "struct class". tpm: Issue a TPM2_Shutdown for TPM2 devices. mm: fix overflow check in expand_upwards() crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD crypto: atmel - only treat EBUSY as transient if backlog crypto: sha1-ssse3 - Disable avx2 crypto: caam - fix signals handling sched/topology: Fix overlapping sched_group_mask sched/topology: Optimize build_group_mask() PM / wakeirq: Convert to SRCU PM / QoS: return -EINVAL for bogus strings tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results KVM: x86: disable MPX if host did not enable MPX XSAVE features kvm: vmx: Do not disable intercepts for BNDCFGS kvm: x86: Guest BNDCFGS requires guest MPX support kvm: vmx: Check value written to IA32_BNDCFGS kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS Linux 4.4.78 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-21mm: fix overflow check in expand_upwards()Helge Deller
commit 37511fb5c91db93d8bd6e3f52f86e5a7ff7cfcdf upstream. Jörn Engel noticed that the expand_upwards() function might not return -ENOMEM in case the requested address is (unsigned long)-PAGE_SIZE and if the architecture didn't defined TASK_SIZE as multiple of PAGE_SIZE. Affected architectures are arm, frv, m68k, blackfin, h8300 and xtensa which all define TASK_SIZE as 0xffffffff, but since none of those have an upwards-growing stack we currently have no actual issue. Nevertheless let's fix this just in case any of the architectures with an upward-growing stack (currently parisc, metag and partly ia64) define TASK_SIZE similar. Link: http://lkml.kernel.org/r/20170702192452.GA11868@p100.box Fixes: bd726c90b6b8 ("Allow stack to grow up to address space limit") Signed-off-by: Helge Deller <deller@gmx.de> Reported-by: Jörn Engel <joern@purestorage.com> Cc: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-28Merge branch 'android-4.4@77ddb50' (v4.4.74) into 'msm-4.4'Blagovest Kolenichev
* refs/heads/tmp-77ddb50: UPSTREAM: usb: gadget: f_fs: avoid out of bounds access on comp_desc Linux 4.4.74 mm: fix new crash in unmapped_area_topdown() Allow stack to grow up to address space limit mm: larger stack guard gap, between vmas alarmtimer: Rate limit periodic intervals MIPS: Fix bnezc/jialc return address calculation usb: dwc3: exynos fix axius clock error path to do cleanup alarmtimer: Prevent overflow of relative timers genirq: Release resources in __setup_irq() error path swap: cond_resched in swap_cgroup_prepare() mm/memory-failure.c: use compound_head() flags for huge pages USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR() usb: r8a66597-hcd: decrease timeout usb: r8a66597-hcd: select a different endpoint on timeout USB: gadget: dummy_hcd: fix hub-descriptor removable fields pvrusb2: reduce stack usage pvr2_eeprom_analyze() usb: core: fix potential memory leak in error path during hcd creation USB: hub: fix SS max number of ports iio: proximity: as3935: recalibrate RCO after resume staging: rtl8188eu: prevent an underflow in rtw_check_beacon_data() mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init() serial: efm32: Fix parity management in 'efm32_uart_console_get_options()' mac80211: fix IBSS presp allocation size mac80211: fix CSA in IBSS mode mac80211/wpa: use constant time memory comparison for MACs mac80211: don't look at the PM bit of BAR frames vb2: Fix an off by one error in 'vb2_plane_vaddr' cpufreq: conservative: Allow down_threshold to take values from 1 to 10 can: gs_usb: fix memory leak in gs_cmd_reset() configfs: Fix race between create_link and configfs_rmdir UPSTREAM: bpf: don't let ldimm64 leak map addresses on unprivileged BACKPORT: ext4: fix data exposure after a crash ANDROID: sdcardfs: remove dead function open_flags_to_access_mode() ANDROID: android-base.cfg: split out arm64-specific configs Linux 4.4.73 sparc64: make string buffers large enough s390/kvm: do not rely on the ILC on kvm host protection fauls xtensa: don't use linux IRQ #0 tipc: ignore requests when the connection state is not CONNECTED proc: add a schedule point in proc_pid_readdir() romfs: use different way to generate fsid for BLOCK or MTD sctp: sctp_addr_id2transport should verify the addr before looking up assoc r8152: avoid start_xmit to schedule napi when napi is disabled r8152: fix rtl8152_post_reset function r8152: re-schedule napi for tx nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED" ravb: unmap descriptors when freeing rings drm/ast: Fixed system hanged if disable P2A drm/nouveau: Don't enabling polling twice on runtime resume parisc, parport_gsc: Fixes for printk continuation lines net: adaptec: starfire: add checks for dma mapping errors pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page net/mlx4_core: Avoid command timeouts during VF driver device shutdown drm/nouveau/fence/g84-: protect against concurrent access to semaphore buffers drm/nouveau: prevent userspace from deleting client object ipv6: fix flow labels when the traffic class is non-0 FS-Cache: Initialise stores_lock in netfs cookie fscache: Clear outstanding writes when disabling a cookie fscache: Fix dead object requeue ethtool: do not vzalloc(0) on registers dump log2: make order_base_2() behave correctly on const input value zero kasan: respect /proc/sys/kernel/traceoff_on_warning jump label: pass kbuild_cflags when checking for asm goto support PM / runtime: Avoid false-positive warnings from might_sleep_if() ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches i2c: piix4: Fix request_region size sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications sierra_net: Skip validating irrelevant fields for IDLE LSIs net: hns: Fix the device being used for dma mapping during TX NET: mkiss: Fix panic NET: Fix /proc/net/arp for AX.25 ipv6: Inhibit IPv4-mapped src address on the wire. ipv6: Handle IPv4-mapped src to in6addr_any dst. net: xilinx_emaclite: fix receive buffer overflow net: xilinx_emaclite: fix freezes due to unordered I/O Call echo service immediately after socket reconnect staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory. ARM: dts: imx6dl: Fix the VDD_ARM_CAP voltage for 396MHz operation partitions/msdos: FreeBSD UFS2 file systems are not recognized s390/vmem: fix identity mapping usb: gadget: f_fs: Fix possibe deadlock Conflicts: drivers/usb/gadget/function/f_fs.c Change-Id: I23106e9fc2c4f2d0b06acce59b781f6c36487fcc Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-06-27Merge 4.4.74 into android-4.4Greg Kroah-Hartman
Changes in 4.4.74 configfs: Fix race between create_link and configfs_rmdir can: gs_usb: fix memory leak in gs_cmd_reset() cpufreq: conservative: Allow down_threshold to take values from 1 to 10 vb2: Fix an off by one error in 'vb2_plane_vaddr' mac80211: don't look at the PM bit of BAR frames mac80211/wpa: use constant time memory comparison for MACs mac80211: fix CSA in IBSS mode mac80211: fix IBSS presp allocation size serial: efm32: Fix parity management in 'efm32_uart_console_get_options()' x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init() mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode staging: rtl8188eu: prevent an underflow in rtw_check_beacon_data() iio: proximity: as3935: recalibrate RCO after resume USB: hub: fix SS max number of ports usb: core: fix potential memory leak in error path during hcd creation pvrusb2: reduce stack usage pvr2_eeprom_analyze() USB: gadget: dummy_hcd: fix hub-descriptor removable fields usb: r8a66597-hcd: select a different endpoint on timeout usb: r8a66597-hcd: decrease timeout drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR() usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks mm/memory-failure.c: use compound_head() flags for huge pages swap: cond_resched in swap_cgroup_prepare() genirq: Release resources in __setup_irq() error path alarmtimer: Prevent overflow of relative timers usb: dwc3: exynos fix axius clock error path to do cleanup MIPS: Fix bnezc/jialc return address calculation alarmtimer: Rate limit periodic intervals mm: larger stack guard gap, between vmas Allow stack to grow up to address space limit mm: fix new crash in unmapped_area_topdown() Linux 4.4.74 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-06-26mm: fix new crash in unmapped_area_topdown()Hugh Dickins
commit f4cb767d76cf7ee72f97dd76f6cfa6c76a5edc89 upstream. Trinity gets kernel BUG at mm/mmap.c:1963! in about 3 minutes of mmap testing. That's the VM_BUG_ON(gap_end < gap_start) at the end of unmapped_area_topdown(). Linus points out how MAP_FIXED (which does not have to respect our stack guard gap intentions) could result in gap_end below gap_start there. Fix that, and the similar case in its alternative, unmapped_area(). Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas") Reported-by: Dave Jones <davej@codemonkey.org.uk> Debugged-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26Allow stack to grow up to address space limitHelge Deller
commit bd726c90b6b8ce87602208701b208a208e6d5600 upstream. Fix expand_upwards() on architectures with an upward-growing stack (parisc, metag and partly IA-64) to allow the stack to reliably grow exactly up to the address space limit given by TASK_SIZE. Signed-off-by: Helge Deller <deller@gmx.de> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26mm: larger stack guard gap, between vmasHugh Dickins
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream. Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Original-patch-by: Oleg Nesterov <oleg@redhat.com> Original-patch-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wt: backport to 4.11: adjust context] [wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide] [wt: backport to 4.4: adjust context ; drop ppc hugetlb_radix changes] Signed-off-by: Willy Tarreau <w@1wt.eu> [gkh: minor build fixes for 4.4] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-24arm64: Add support for app specific settingsSarangdhar Joshi
Add support to provide an interface that can be used from userspace to decide whether app specific settings need to be applied / cleared when particular processes are running. CRs-Fixed: 981519 997757 Change-Id: Id81f8b70de64f291a8586150f4d2c7c8f8b4420f Signed-off-by: Sarangdhar Joshi <spjoshi@codeaurora.org> [satyap@codeaurora.org: trivial merge conflict resolution and pull fixes for CR: 997757] Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org> [ztu@codeaurora.org: Resolved conflicts] Signed-off-by: Zhiqiang Tu <ztu@codeaurora.org>
2016-08-01Merge tag 'v4.4.16' into android-4.4.yDmitry Shmidt
This is the 4.4.16 stable release Change-Id: Ibaf7b7e03695e1acebc654a2ca1a4bfcc48fcea4
2016-04-07mm: Export do_munmapGuenter Roeck
The 0-day build bot reports the following build error, seen if SDCARD_FS is built as module. ERROR: "do_munmap" undefined! Fixes: 84a1b7d3d312 ("Included sdcardfs source code for kernel 3.0") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Guenter Roeck <groeck@chromium.org>
2016-03-29mm: Export do_munmapGuenter Roeck
The 0-day build bot reports the following build error, seen if SDCARD_FS is built as module. ERROR: "do_munmap" undefined! Fixes: 84a1b7d3d312 ("Included sdcardfs source code for kernel 3.0") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Guenter Roeck <groeck@chromium.org>
2016-02-29Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-androidAlex Shi
2016-02-25mm: fix regression in remap_file_pages() emulationKirill A. Shutemov
commit 48f7df329474b49d83d0dffec1b6186647f11976 upstream. Grazvydas Ignotas has reported a regression in remap_file_pages() emulation. Testcase: #define _GNU_SOURCE #include <assert.h> #include <stdlib.h> #include <stdio.h> #include <sys/mman.h> #define SIZE (4096 * 3) int main(int argc, char **argv) { unsigned long *p; long i; p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); if (p == MAP_FAILED) { perror("mmap"); return -1; } for (i = 0; i < SIZE / 4096; i++) p[i * 4096 / sizeof(*p)] = i; if (remap_file_pages(p, 4096, 0, 1, 0)) { perror("remap_file_pages"); return -1; } if (remap_file_pages(p, 4096 * 2, 0, 1, 0)) { perror("remap_file_pages"); return -1; } assert(p[0] == 1); munmap(p, SIZE); return 0; } The second remap_file_pages() fails with -EINVAL. The reason is that remap_file_pages() emulation assumes that the target vma covers whole area we want to over map. That assumption is broken by first remap_file_pages() call: it split the area into two vma. The solution is to check next adjacent vmas, if they map the same file with the same flags. Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Grazvydas Ignotas <notasas@gmail.com> Tested-by: Grazvydas Ignotas <notasas@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25mm: replace vma_lock_anon_vma with anon_vma_lock_read/writeKonstantin Khlebnikov
commit 12352d3cae2cebe18805a91fab34b534d7444231 upstream. Sequence vma_lock_anon_vma() - vma_unlock_anon_vma() isn't safe if anon_vma appeared between lock and unlock. We have to check anon_vma first or call anon_vma_prepare() to be sure that it's here. There are only few users of these legacy helpers. Let's get rid of them. This patch fixes anon_vma lock imbalance in validate_mm(). Write lock isn't required here, read lock is enough. And reorders expand_downwards/expand_upwards: security_mmap_addr() and wrapping-around check don't have to be under anon vma lock. Link: https://lkml.kernel.org/r/CACT4Y+Y908EjM2z=706dv4rV6dWtxTLK9nFg9_7DhRMLppBo2g@mail.gmail.com Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-16FROMLIST: mm: mmap: Add new /proc tunable for mmap_base ASLR.dcashman
(cherry picked from commit https://lkml.org/lkml/2015/12/21/337) ASLR only uses as few as 8 bits to generate the random offset for the mmap base address on 32 bit architectures. This value was chosen to prevent a poorly chosen value from dividing the address space in such a way as to prevent large allocations. This may not be an issue on all platforms. Allow the specification of a minimum number of bits so that platforms desiring greater ASLR protection may determine where to place the trade-off. Bug: 24047224 Signed-off-by: Daniel Cashman <dcashman@android.com> Signed-off-by: Daniel Cashman <dcashman@google.com> Change-Id: Ibf9ed3d4390e9686f5cc34f605d509a20d40e6c2
2016-02-16mm: add a field to store names for private anonymous memoryColin Cross
Userspace processes often have multiple allocators that each do anonymous mmaps to get memory. When examining memory usage of individual processes or systems as a whole, it is useful to be able to break down the various heaps that were allocated by each layer and examine their size, RSS, and physical memory usage. This patch adds a user pointer to the shared union in vm_area_struct that points to a null terminated string inside the user process containing a name for the vma. vmas that point to the same address will be merged, but vmas that point to equivalent strings at different addresses will not be merged. Userspace can set the name for a region of memory by calling prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name); Setting the name to NULL clears it. The names of named anonymous vmas are shown in /proc/pid/maps as [anon:<name>] and in /proc/pid/smaps in a new "Name" field that is only present for named vmas. If the userspace pointer is no longer valid all or part of the name will be replaced with "<fault>". The idea to store a userspace pointer to reduce the complexity within mm (at the expense of the complexity of reading /proc/pid/mem) came from Dave Hansen. This results in no runtime overhead in the mm subsystem other than comparing the anon_name pointers when considering vma merging. The pointer is stored in a union with fieds that are only used on file-backed mappings, so it does not increase memory usage. Includes fix from Jed Davis <jld@mozilla.com> for typo in prctl_set_vma_anon_name, which could attempt to set the name across two vmas at the same time due to a typo, which might corrupt the vma list. Fix it to use tmp instead of end to limit the name setting to a single vma at a time. Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2015-11-05mm: introduce VM_LOCKONFAULTEric B Munson
The cost of faulting in all memory to be locked can be very high when working with large mappings. If only portions of the mapping will be used this can incur a high penalty for locking. For the example of a large file, this is the usage pattern for a large statical language model (probably applies to other statical or graphical models as well). For the security example, any application transacting in data that cannot be swapped out (credit card data, medical records, etc). This patch introduces the ability to request that pages are not pre-faulted, but are placed on the unevictable LRU when they are finally faulted in. The VM_LOCKONFAULT flag will be used together with VM_LOCKED and has no effect when set without VM_LOCKED. Setting the VM_LOCKONFAULT flag for a VMA will cause pages faulted into that VMA to be added to the unevictable LRU when they are faulted or if they are already present, but will not cause any missing pages to be faulted in. Exposing this new lock state means that we cannot overload the meaning of the FOLL_POPULATE flag any longer. Prior to this patch it was used to mean that the VMA for a fault was locked. This means we need the new FOLL_MLOCK flag to communicate the locked state of a VMA. FOLL_POPULATE will now only control if the VMA should be populated and in the case of VM_LOCKONFAULT, it will not be set. Signed-off-by: Eric B Munson <emunson@akamai.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm/mmap.c: change __install_special_mapping() args orderChen Gang
Make __install_special_mapping() args order match the caller, so the caller can pass their register args directly to callee with no touch. For most of architectures, args (at least the first 5th args) are in registers, so this change will have effect on most of architectures. For -O2, __install_special_mapping() may be inlined under most of architectures, but for -Os, it should not. So this change can get a little better performance for -Os, at least. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm/mmap.c: do not initialize retval in mmap_pgoff()Chen Gang
When fget() fails we can return -EBADF directly. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm/mmap.c: remove redundant statement "error = -ENOMEM"Chen Gang
It is still a little better to remove it, although it should be skipped by "-O2". Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>=0A= Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm: add the "struct mm_struct *mm" local intoOleg Nesterov
Cosmetic, but expand_upwards() and expand_downwards() overuse vma->vm_mm, a local variable makes sense imho. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm: fix the racy mm->locked_vm change inOleg Nesterov
"mm->locked_vm += grow" and vm_stat_account() in acct_stack_growth() are not safe; multiple threads using the same ->mm can do this at the same time trying to expans different vma's under down_read(mmap_sem). This means that one of the "locked_vm += grow" changes can be lost and we can miss munlock_vma_pages_all() later. Move this code into the caller(s) under mm->page_table_lock. All other updates to ->locked_vm hold mmap_sem for writing. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm/mmap: use offset_in_page macroAlexander Kuleshov
linux/mm.h provides offset_in_page() macro. Let's use already predefined macro instead of (addr & ~PAGE_MASK). Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm/mmap.c: remove useless statement "vma = NULL" in find_vma()Chen Gang
Before the main loop, vma is already is NULL. There is no need to set it to NULL again. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-22mm, dax: VMA with vm_ops->pfn_mkwrite wants to be write-notifiedKirill A. Shutemov
For VM_PFNMAP and VM_MIXEDMAP we use vm_ops->pfn_mkwrite instead of vm_ops->page_mkwrite to notify abort write access. This means we want vma->vm_page_prot to be write-protected if the VMA provides this vm_ops. A theoretical scenario that will cause these missed events is: On writable mapping with vm_ops->pfn_mkwrite, but without vm_ops->page_mkwrite: read fault followed by write access to the pfn. Writable pte will be set up on read fault and write fault will not be generated. I found it examining Dave's complaint on generic/080: http://lkml.kernel.org/g/20150831233803.GO3902@dastard Although I don't think it's the reason. It shouldn't be a problem for ext2/ext4 as they provide both pfn_mkwrite and page_mkwrite. [akpm@linux-foundation.org: add local vm_ops to avoid 80-cols mess] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yigal Korman <yigal@plexistor.com> Acked-by: Boaz Harrosh <boaz@plexistor.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-17revert "mm: make sure all file VMAs have ->vm_ops set"Andrew Morton
Revert commit 6dc296e7df4c "mm: make sure all file VMAs have ->vm_ops set". Will Deacon reports that it "causes some mmap regressions in LTP, which appears to use a MAP_PRIVATE mmap of /dev/zero as a way to get anonymous pages in some of its tests (specifically mmap10 [1])". William Shuman reports Oracle crashes. So revert the patch while we work out what to do. Reported-by: William Shuman <wshuman3@gmail.com> Reported-by: Will Deacon <will.deacon@arm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10mm: make sure all file VMAs have ->vm_ops setKirill A. Shutemov
We rely on vma->vm_ops == NULL to detect anonymous VMA: see vma_is_anonymous(), but some drivers doesn't set ->vm_ops. As a result we can end up with anonymous page in private file mapping. That should not lead to serious misbehaviour, but nevertheless is wrong. Let's fix by setting up dummy ->vm_ops for file mmapping if f_op->mmap() didn't set its own. The patch also adds sanity check into __vma_link_rb(). It will help catch broken VMAs which inserted directly into mm_struct via insert_vm_struct(). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()Oleg Nesterov
Add the additional "vm_flags_t vm_flags" argument to do_mmap_pgoff(), rename it to do_mmap(), and re-introduce do_mmap_pgoff() as a simple wrapper on top of do_mmap(). Perhaps we should update the callers of do_mmap_pgoff() and kill it later. This way mpx_mmap() can simply call do_mmap(vm_flags => VM_MPX) and do not play with vm internals. After this change mmap_region() has a single user outside of mmap.c, arch/tile/mm/elf.c:arch_setup_additional_pages(). It would be nice to change arch/tile/ and unexport mmap_region(). [kirill@shutemov.name: fix build] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08mm/mmap.c:insert_vm_struct(): check for failure before setting valuesChen Gang
There's no point in initializing vma->vm_pgoff if the insertion attempt will be failing anyway. Run the checks before performing the initialization. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08mm/mmap.c: simplify the failure return working flowChen Gang
__split_vma() doesn't need out_err label, neither need initializing err. copy_vma() can return NULL directly when kmem_cache_alloc() fails. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08mremap: fix the wrong !vma->vm_file check in copy_vma()Oleg Nesterov
Test-case: #define _GNU_SOURCE #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <string.h> #include <sys/mman.h> #include <assert.h> void *find_vdso_vaddr(void) { FILE *perl; char buf[32] = {}; perl = popen("perl -e 'open STDIN,qq|/proc/@{[getppid]}/maps|;" "/^(.*?)-.*vdso/ && print hex $1 while <>'", "r"); fread(buf, sizeof(buf), 1, perl); fclose(perl); return (void *)atol(buf); } #define PAGE_SIZE 4096 void *get_unmapped_area(void) { void *p = mmap(0, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1,0); assert(p != MAP_FAILED); munmap(p, PAGE_SIZE); return p; } char save[2][PAGE_SIZE]; int main(void) { void *vdso = find_vdso_vaddr(); void *page[2]; assert(vdso); memcpy(save, vdso, sizeof (save)); // force another fault on the next check assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0); page[0] = mremap(vdso, PAGE_SIZE, PAGE_SIZE, MREMAP_FIXED | MREMAP_MAYMOVE, get_unmapped_area()); page[1] = mremap(vdso + PAGE_SIZE, PAGE_SIZE, PAGE_SIZE, MREMAP_FIXED | MREMAP_MAYMOVE, get_unmapped_area()); assert(page[0] != MAP_FAILED && page[1] != MAP_FAILED); printf("match: %d %d\n", !memcmp(save[0], page[0], PAGE_SIZE), !memcmp(save[1], page[1], PAGE_SIZE)); return 0; } fails without this patch. Before the previous commit it gets the wrong page, now it segfaults (which is imho better). This is because copy_vma() wrongly assumes that if vma->vm_file == NULL is irrelevant until the first fault which will use do_anonymous_page(). This is obviously wrong for the special mapping. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08mmap: fix the usage of ->vm_pgoff in special_mapping pathsOleg Nesterov
Test-case: #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <string.h> #include <sys/mman.h> #include <assert.h> void *find_vdso_vaddr(void) { FILE *perl; char buf[32] = {}; perl = popen("perl -e 'open STDIN,qq|/proc/@{[getppid]}/maps|;" "/^(.*?)-.*vdso/ && print hex $1 while <>'", "r"); fread(buf, sizeof(buf), 1, perl); fclose(perl); return (void *)atol(buf); } #define PAGE_SIZE 4096 int main(void) { void *vdso = find_vdso_vaddr(); assert(vdso); // of course they should differ, and they do so far printf("vdso pages differ: %d\n", !!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE)); // split into 2 vma's assert(mprotect(vdso, PAGE_SIZE, PROT_READ) == 0); // force another fault on the next check assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0); // now they no longer differ, the 2nd vm_pgoff is wrong printf("vdso pages differ: %d\n", !!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE)); return 0; } Output: vdso pages differ: 1 vdso pages differ: 0 This is because split_vma() correctly updates ->vm_pgoff, but the logic in insert_vm_struct() and special_mapping_fault() is absolutely broken, so the fault at vdso + PAGE_SIZE return the 1st page. The same happens if you simply unmap the 1st page. special_mapping_fault() does: pgoff = vmf->pgoff - vma->vm_pgoff; and this is _only_ correct if vma->vm_start mmaps the first page from ->vm_private_data array. vdso or any other user of install_special_mapping() is not anonymous, it has the "backing storage" even if it is just the array of pages. So we actually need to make vm_pgoff work as an offset in this array. Note: this also allows to fix another problem: currently gdb can't access "[vvar]" memory because in this case special_mapping_fault() doesn't work. Now that we can use ->vm_pgoff we can implement ->access() and fix this. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctxAndrea Arcangeli
vma->vm_userfaultfd_ctx is yet another vma parameter that vma_merge must be aware about so that we can merge vmas back like they were originally before arming the userfaultfd on some memory range. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: zhang.zhanghailiang@huawei.com Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-10vfs: Commit to never having exectuables on proc and sysfs.Eric W. Biederman
Today proc and sysfs do not contain any executable files. Several applications today mount proc or sysfs without noexec and nosuid and then depend on there being no exectuables files on proc or sysfs. Having any executable files show on proc or sysfs would cause a user space visible regression, and most likely security problems. Therefore commit to never allowing executables on proc and sysfs by adding a new flag to mark them as filesystems without executables and enforce that flag. Test the flag where MNT_NOEXEC is tested today, so that the only user visible effect will be that exectuables will be treated as if the execute bit is cleared. The filesystems proc and sysfs do not currently incoporate any executable files so this does not result in any user visible effects. This makes it unnecessary to vet changes to proc and sysfs tightly for adding exectuable files or changes to chattr that would modify existing files, as no matter what the individual file say they will not be treated as exectuable files by the vfs. Not having to vet changes to closely is important as without this we are only one proc_create call (or another goof up in the implementation of notify_change) from having problematic executables on proc. Those mistakes are all too easy to make and would create a situation where there are security issues or the assumptions of some program having to be broken (and cause userspace regressions). Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-06-24mm/mmap.c: optimization of do_mmap_pgoff functionPiotr Kwapulinski
The simple check for zero length memory mapping may be performed earlier. So that in case of zero length memory mapping some unnecessary code is not executed at all. It does not make the code less readable and saves some CPU cycles. Signed-off-by: Piotr Kwapulinski <kwapulinski.piotr@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15mm/mmap.c: use while instead of if+gotoRasmus Villemoes
The creators of the C language gave us the while keyword. Let's use that instead of synthesizing it from if+goto. Made possible by 6597d783397a ("mm/mmap.c: replace find_vma_prepare() with clearer find_vma_links()"). [akpm@linux-foundation.org: fix 80-col overflows] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15mm: remove rest of ACCESS_ONCE() usagesJason Low
We converted some of the usages of ACCESS_ONCE to READ_ONCE in the mm/ tree since it doesn't work reliably on non-scalar types. This patch removes the rest of the usages of ACCESS_ONCE, and use the new READ_ONCE API for the read accesses. This makes things cleaner, instead of using separate/multiple sets of APIs. Signed-off-by: Jason Low <jason.low2@hp.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14mm: rename __mlock_vma_pages_range() to populate_vma_page_range()Kirill A. Shutemov
__mlock_vma_pages_range() doesn't necessarily mlock pages. It depends on vma flags. The same codepath is used for MAP_POPULATE. Let's rename __mlock_vma_pages_range() to populate_vma_page_range(). This patch also drops mlock_vma_pages_range() references from documentation. It has gone in cea10a19b797 ("mm: directly use __mlock_vma_pages_range() in find_extend_vma()"). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm: fix anon_vma->degree underflow in anon_vma endless growing preventionLeon Yu
I have constantly stumbled upon "kernel BUG at mm/rmap.c:399!" after upgrading to 3.19 and had no luck with 4.0-rc1 neither. So, after looking into new logic introduced by commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy"), I found chances are that unlink_anon_vmas() is called without incrementing dst->anon_vma->degree in anon_vma_clone() due to allocation failure. If dst->anon_vma is not NULL in error path, its degree will be incorrectly decremented in unlink_anon_vmas() and eventually underflow when exiting as a result of another call to unlink_anon_vmas(). That's how "kernel BUG at mm/rmap.c:399!" is triggered for me. This patch fixes the underflow by dropping dst->anon_vma when allocation fails. It's safe to do so regardless of original value of dst->anon_vma because dst->anon_vma doesn't have valid meaning if anon_vma_clone() fails. Besides, callers don't care dst->anon_vma in such case neither. Also suggested by Michal Hocko, we can clean up vma_adjust() a bit as anon_vma_clone() now does the work. [akpm@linux-foundation.org: tweak comment] Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy") Signed-off-by: Leon Yu <chianglungyu@gmail.com> Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()Roman Gushchin
I noticed, that "allowed" can easily overflow by falling below 0, because (total_vm / 32) can be larger than "allowed". The problem occurs in OVERCOMMIT_NONE mode. In this case, a huge allocation can success and overcommit the system (despite OVERCOMMIT_NONE mode). All subsequent allocations will fall (system-wide), so system become unusable. The problem was masked out by commit c9b1d0981fcc ("mm: limit growth of 3% hardcoded other user reserve"), but it's easy to reproduce it on older kernels: 1) set overcommit_memory sysctl to 2 2) mmap() large file multiple times (with VM_SHARED flag) 3) try to malloc() large amount of memory It also can be reproduced on newer kernels, but miss-configured sysctl_user_reserve_kbytes is required. Fix this issue by switching to signed arithmetic here. [akpm@linux-foundation.org: use min_t] Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Cc: Andrew Shewmaker <agshew@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11mm: fix false-positive warning on exit due mm_nr_pmds(mm)Kirill A. Shutemov
The problem is that we check nr_ptes/nr_pmds in exit_mmap() which happens *before* pgd_free(). And if an arch does pte/pmd allocation in pgd_alloc() and frees them in pgd_free() we see offset in counters by the time of the checks. We tried to workaround this by offsetting expected counter value according to FIRST_USER_ADDRESS for both nr_pte and nr_pmd in exit_mmap(). But it doesn't work in some cases: 1. ARM with LPAE enabled also has non-zero USER_PGTABLES_CEILING, but upper addresses occupied with huge pmd entries, so the trick with offsetting expected counter value will get really ugly: we will have to apply it nr_pmds, but not nr_ptes. 2. Metag has non-zero FIRST_USER_ADDRESS, but doesn't do allocation pte/pmd page tables allocation in pgd_alloc(), just setup a pgd entry which is allocated at boot and shared accross all processes. The proposal is to move the check to check_mm() which happens *after* pgd_free() and do proper accounting during pgd_alloc() and pgd_free() which would bring counters to zero if nothing leaked. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Tyler Baker <tyler.baker@linaro.org> Tested-by: Tyler Baker <tyler.baker@linaro.org> Tested-by: Nishanth Menon <nm@ti.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: James Hogan <james.hogan@imgtec.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11mm: account pmd page tables to the processKirill A. Shutemov
Dave noticed that unprivileged process can allocate significant amount of memory -- >500 MiB on x86_64 -- and stay unnoticed by oom-killer and memory cgroup. The trick is to allocate a lot of PMD page tables. Linux kernel doesn't account PMD tables to the process, only PTE. The use-cases below use few tricks to allocate a lot of PMD page tables while keeping VmRSS and VmPTE low. oom_score for the process will be 0. #include <errno.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> #include <sys/prctl.h> #define PUD_SIZE (1UL << 30) #define PMD_SIZE (1UL << 21) #define NR_PUD 130000 int main(void) { char *addr = NULL; unsigned long i; prctl(PR_SET_THP_DISABLE); for (i = 0; i < NR_PUD ; i++) { addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); if (addr == MAP_FAILED) { perror("mmap"); break; } *addr = 'x'; munmap(addr, PMD_SIZE); mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0); if (addr == MAP_FAILED) perror("re-mmap"), exit(1); } printf("PID %d consumed %lu KiB in PMD page tables\n", getpid(), i * 4096 >> 10); return pause(); } The patch addresses the issue by account PMD tables to the process the same way we account PTE. The main place where PMD tables is accounted is __pmd_alloc() and free_pmd_range(). But there're few corner cases: - HugeTLB can share PMD page tables. The patch handles by accounting the table to all processes who share it. - x86 PAE pre-allocates few PMD tables on fork. - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust sanity check on exit(2). Accounting only happens on configuration where PMD page table's level is present (PMD is not folded). As with nr_ptes we use per-mm counter. The counter value is used to calculate baseline for badness score by oom-killer. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: David Rientjes <rientjes@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10rmap: drop support of non-linear mappingsKirill A. Shutemov
We don't create non-linear mappings anymore. Let's drop code which handles them in rmap. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10mm: replace remap_file_pages() syscall with emulationKirill A. Shutemov
remap_file_pages(2) was invented to be able efficiently map parts of huge file into limited 32-bit virtual address space such as in database workloads. Nonlinear mappings are pain to support and it seems there's no legitimate use-cases nowadays since 64-bit systems are widely available. Let's drop it and get rid of all these special-cased code. The patch replaces the syscall with emulation which creates new VMA on each remap_file_pages(), unless they it can be merged with an adjacent one. I didn't find *any* real code that uses remap_file_pages(2) to test emulation impact on. I've checked Debian code search and source of all packages in ALT Linux. No real users: libc wrappers, mentions in strace, gdb, valgrind and this kind of stuff. There are few basic tests in LTP for the syscall. They work just fine with emulation. To test performance impact, I've written small test case which demonstrate pretty much worst case scenario: map 4G shmfs file, write to begin of every page pgoff of the page, remap pages in reverse order, read every page. The test creates 1 million of VMAs if emulation is in use, so I had to set vm.max_map_count to 1100000 to avoid -ENOMEM. Before: 23.3 ( +- 4.31% ) seconds After: 43.9 ( +- 0.85% ) seconds Slowdown: 1.88x I believe we can live with that. Test case: #define _GNU_SOURCE #include <assert.h> #include <stdlib.h> #include <stdio.h> #include <sys/mman.h> #define MB (1024UL * 1024) #define SIZE (4096 * MB) int main(int argc, char **argv) { unsigned long *p; long i, pass; for (pass = 0; pass < 10; pass++) { p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); if (p == MAP_FAILED) { perror("mmap"); return -1; } for (i = 0; i < SIZE / 4096; i++) p[i * 4096 / sizeof(*p)] = i; for (i = 0; i < SIZE / 4096; i++) { if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096, 0, (SIZE - 4096 * (i + 1)) >> 12, 0)) { perror("remap_file_pages"); return -1; } } for (i = SIZE / 4096 - 1; i >= 0; i--) assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1); munmap(p, SIZE); } return 0; } [akpm@linux-foundation.org: fix spello] [sasha.levin@oracle.com: initialize populate before usage] [sasha.levin@oracle.com: grab file ref to prevent race while mmaping] Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Armin Rigo <arigo@tunes.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>